Extracting text from different charsets

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2271
Printed Date: 13 Aug 25 at 5:25PM

Topic: Extracting text from different charsets

Posted By: miguele
Subject: Extracting text from different charsets
Date Posted: 21 May 12 at 11:20AM

Hi. I'm using v7.26 under Delphi XE to extract text from PDF files in different charsets (usually containing accented characters). "ExtractFilePageText" with option 0 usually reads them accurately, but for some older PDF files the accented characters are extracted incorrectly. Is there a way to prevent this? Can you provide some sample code?

Thanks!

Replies:

Posted By: AndrewC
Date Posted: 22 May 12 at 12:21PM

Can you try using ExtractPageText(3) or ExtractPageText(7) to see if the text is extracted correctly? Option 0 is a very fast extraction method, but it is not aware of font encodings.

In QPL 8.xx we have added option 8, which outputs the text in the same format as option 0 but can handle various font encodings and mappings.

Andrew.
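
Editor's illustration (not part of the original replies): the reply above says option 0 is not aware of font encodings. The sketch below shows, in general terms, how ignoring a font's encoding can garble accented characters. It does not call Debenu Quick PDF Library and is not a description of how the library works internally. It assumes, purely for illustration, that an older PDF stores its text with a Mac-style font encoding, and it uses Python's "mac_roman" and "cp1252" codecs as stand-ins for the PDF MacRomanEncoding and WinAnsiEncoding tables (close to them, not identical). The helper name decode_with is hypothetical.

```python
# Illustration only: this does not call Debenu Quick PDF Library.
# "mac_roman" and "cp1252" are stand-ins (an assumption) for the PDF
# MacRomanEncoding and WinAnsiEncoding font encodings.


def decode_with(raw: bytes, codec: str) -> str:
    """Decode raw single-byte string data using the named codec."""
    return raw.decode(codec)


# The bytes a page might hold for "café" when the font uses a
# Mac-style encoding: "é" is stored as the single byte 0x8E.
raw = "café".encode("mac_roman")

# Encoding-unaware reading: treat every font as a Windows code page.
naive = decode_with(raw, "cp1252")

# Encoding-aware reading: use the encoding the font actually declares.
aware = decode_with(raw, "mac_roman")

print(repr(naive))  # the accented letter comes out as a different character
print(repr(aware))  # the accented letter survives
```

Plain ASCII letters decode identically under both codecs, which would be consistent with only the accented characters coming out wrong in some files while the rest of the text looks fine.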