I need help - I can help - GetPageText(4) returns cryptic characters

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1914
Printed Date: 20 May 26 at 7:06PM

Topic: GetPageText(4) returns cryptic characters

Posted By: christoph81
Subject: GetPageText(4) returns cryptic characters
Date Posted: 08 Aug 11 at 11:14AM

Hello everybody,

I'm trying out your SDK, and with most of my PDF files I get good results. Now I've tried another PDF file with embedded text, and the words come out as cryptic characters.

I then tried copying the text out of the PDF with the Foxit PDF viewer and got the same strange results.

One of our customers said it could be related to the fonts used in the PDF file.

Best regards

Christoph

Replies:

Posted By: Rowan
Date Posted: 08 Aug 11 at 2:38PM

Hi Christoph,

Is it possible that the text you're trying to extract contains Unicode characters and these aren't being decoded when you read the results?

The result is encoded using UTF-8 in the Delphi and DLL editions of the library.
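As a minimal sketch of what "not decoded" looks like, assuming the text arrives as raw UTF-8 bytes (simulated here with a hard-coded sample string; no Quick PDF Library call is made), this Python snippet compares decoding the bytes as UTF-8 with misreading them as Windows-1252, a common single-byte default on Windows:

```python
# Hypothetical sample standing in for the library's UTF-8 output.
raw = "Größe: 5 €".encode("utf-8")

# Correct: decode the bytes as UTF-8.
correct = raw.decode("utf-8")

# Incorrect: read the same bytes as Windows-1252. Each multi-byte
# UTF-8 sequence turns into two or three unrelated characters.
garbled = raw.decode("cp1252")

print(correct)  # prints "Größe: 5 €"
print(garbled)  # prints "GrÃ¶ÃŸe: 5 â‚¬"
```

If the extracted text shows patterns such as `Ã¶` where `ö` was expected, the bytes are most likely fine and only the decoding step on the caller's side is wrong.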

However, if you can't copy the text out of the PDF with Foxit PDF Viewer either, there's probably a problem with the font's cmap (character map), which is used to look up the Unicode value for each character code in the text. That would mean the PDF is somewhat corrupt, and you won't be able to extract text from it until it has been repaired with Acrobat or a similar tool that can repair PDFs.
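To illustrate why a bad or missing cmap breaks extraction in every viewer, here is a simplified Python sketch. It assumes single-byte character codes and handles only `bfchar` entries of a ToUnicode CMap (real CMaps also use `bfrange` and multi-byte codes), and the CMap text and character codes are made-up example data, not taken from Christoph's file. The fallback of emitting the raw code as a character is just one simple way to mimic what an extractor might output when no mapping exists.

```python
import re

# Hypothetical ToUnicode CMap fragment: each bfchar line maps a font
# character code to a UTF-16BE Unicode value.
CMAP = """
2 beginbfchar
<01> <0048>
<02> <0069>
endbfchar
"""

def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect <code> <unicode> pairs from every bfchar section."""
    mapping: dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode(codes: bytes, mapping: dict[int, str]) -> str:
    # Codes without an entry fall back to the raw code point,
    # which shows up as meaningless characters.
    return "".join(mapping.get(c, chr(c)) for c in codes)

mapping = parse_bfchar(CMAP)
page_codes = bytes([0x01, 0x02])  # character codes as stored in the page content

print(decode(page_codes, mapping))  # prints "Hi"
print(repr(decode(page_codes, {})))  # no cmap: only control characters
```

Because every viewer has to go through the same lookup, a damaged map produces the same garbage in Foxit as in any SDK, which matches what Christoph observed.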

Cheers,

- Rowan.