
Extracting text from different charsets

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2271
Printed Date: 13 Aug 25 at 5:25PM


Topic: Extracting text from different charsets
Posted By: miguele
Date Posted: 21 May 12 at 11:20AM
Hi. I'm using v7.26 under Delphi XE to extract text from PDF files that use different charsets (usually with accented characters). "ExtractFilePageText" with option 0 usually reads the text accurately, but for some older PDF files the accented characters are extracted incorrectly. Is there a way to prevent this? Can you provide some sample code?

Thanks!



Replies:
Posted By: AndrewC
Date Posted: 22 May 12 at 12:21PM
Can you try using ExtractPageText(3) or ExtractPageText(7) to see whether the text is extracted correctly? Option 0 is a very fast extraction method, but it is not aware of font encodings.
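
To illustrate why an extraction method that ignores font encodings can garble accented characters, here is a minimal sketch in plain Python. It is not the library's implementation and does not call the library. The two encodings (Mac Roman for the font, Windows-1252 for the naive reader) and all function names are assumptions chosen only for the demonstration.

```python
# Illustration only: this does not reflect how Debenu Quick PDF Library
# works internally. Both encodings below are assumptions for the demo.

FONT_ENCODING = "mac_roman"  # assumed encoding declared by a font in an older PDF
NAIVE_ENCODING = "cp1252"    # assumed fixed code page used by an encoding-unaware reader


def raw_codes(text: str) -> bytes:
    """Return the character codes as they might be stored for this font."""
    return text.encode(FONT_ENCODING)


def naive_extract(codes: bytes) -> str:
    """Ignore the font's encoding and assume one fixed code page."""
    return codes.decode(NAIVE_ENCODING, errors="replace")


def encoding_aware_extract(codes: bytes, font_encoding: str) -> str:
    """Map the codes back to text using the encoding the font declares."""
    return codes.decode(font_encoding)


codes = raw_codes("café")
print(naive_extract(codes))                          # accented letter comes out wrong
print(encoding_aware_extract(codes, FONT_ENCODING))  # original text is recovered
```

The unaccented letters survive in both cases because the two encodings agree on the ASCII range; only the codes above 127 differ, which matches the symptom of accented characters alone being extracted wrongly.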

In QPL 8.xx we have added option 8, which outputs the text in the same format as option 0 but can handle various font encodings and mappings.

Andrew.


