
GetPageText(4) returns cryptic characters

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1914


Topic: GetPageText(4) returns cryptic characters
Posted By: christoph81
Date Posted: 08 Aug 11 at 11:14AM
Hello everybody,
I'm trying your SDK, and with most of my PDF files I get good results. Now I tried another PDF file with embedded text, and I get cryptic results for the words.

After that I tried to copy the text out of the PDF with the Foxit PDF viewer, and I get the same strange results.
A customer of ours said it could be related to the fonts used in the PDF file.

Best regards
Christoph 




Replies:
Posted By: Rowan
Date Posted: 08 Aug 11 at 2:38PM
Hi Christoph,

There's a small chance that the text you're trying to extract contains Unicode characters and these aren't being decoded when you get the results.

The result is encoded using UTF-8 in the Delphi and DLL editions of the library.
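To illustrate why undecoded UTF-8 can look cryptic, here is a minimal Python sketch. It does not call the library at all: the `raw` bytes are made-up sample data standing in for whatever GetPageText returns, and `decode_extracted` is a hypothetical helper name. It only shows what happens when UTF-8 bytes are read with a Windows ANSI code page instead.

```python
def decode_extracted(raw: bytes) -> str:
    """Decode bytes that are assumed to be UTF-8 encoded text."""
    return raw.decode("utf-8")


# Hypothetical sample: UTF-8 bytes standing in for an extraction result.
raw = "Größe: 5 €".encode("utf-8")

# Reading the same bytes as Windows-1252 turns every non-ASCII
# character into two or three unrelated characters.
wrong = raw.decode("cp1252")

# Decoding as UTF-8 recovers the intended text.
right = decode_extracted(raw)

# If a string was already misread this way, reversing the wrong step
# and decoding again as UTF-8 often restores it.
repaired = wrong.encode("cp1252").decode("utf-8")
```

If the cryptic output consists mostly of sequences like "Ã¶" where accented letters are expected, a decoding mix-up of this kind is a likely cause. If it looks like random symbols throughout, the explanation below is more probable.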

However, if you cannot copy the text out of the PDF using Foxit PDF Viewer either, then there's probably an issue with the CMap (the character lookup map that translates the font's character codes to Unicode). That means the PDF is somewhat corrupt, and you won't be able to extract text from it until it is repaired (using Acrobat or a similar tool that can repair PDFs).
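When processing many files, it can help to flag pages whose extracted text is probably unusable. The following is only a rough heuristic sketch in Python, not part of the library: the function names and the 0.3 threshold are assumptions chosen for illustration. It counts characters that rarely appear in real text (private-use, unassigned, surrogate, and control characters, plus the U+FFFD replacement character).

```python
import unicodedata

# Unicode general categories that rarely occur in genuine text:
# Co = private use, Cn = unassigned, Cs = surrogate.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cs"}


def suspicious_ratio(text: str) -> float:
    """Return the fraction of characters that look like decoding debris."""
    if not text:
        return 0.0
    bad = 0
    for ch in text:
        category = unicodedata.category(ch)
        if ch == "\ufffd":
            # Replacement character inserted by a failed decode.
            bad += 1
        elif category in SUSPECT_CATEGORIES:
            bad += 1
        elif category == "Cc" and ch not in "\t\n\r":
            # Control characters other than ordinary whitespace.
            bad += 1
    return bad / len(text)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Guess whether extracted text is garbage (threshold is an assumption)."""
    return suspicious_ratio(text) > threshold
```

A bad character map can also produce ordinary-looking but wrong letters, which this check will not catch, so comparing against a copy and paste from a PDF viewer remains the more reliable test.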

Cheers,
- Rowan.


