Extracting text from different charsets

miguele (Beginner; Joined: 21 May 12; Location: Brazil) Posted: 21 May 12 at 11:20AM
	Hi. I'm using v.7.26 under Delphi XE to extract text from PDF files in different charsets (usually with accented characters). "ExtractFilePageText" with option 0 usually reads them accurately, but for some older PDF files the accented characters are extracted incorrectly. Is there a way to prevent this? Can you provide some sample code? Thanks!

AndrewC (Moderator Group; Joined: 08 Dec 10; Location: Geelong, Aust) Posted: 22 May 12 at 12:21PM
	Can you try using ExtractPageText(3) or ExtractPageText(7) to see if the text is extracted correctly? Option 0 is a very fast extraction method, but it is not aware of font encodings. In QPL 8.xx we have added option 8, which outputs the text in the same format as option 0 but can handle various font encodings and mappings. Andrew.
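
	Since sample code was requested, here is a minimal sketch of the suggestion above: load the file, select a page, and call ExtractPageText with option 3 instead of option 0. The unit name QuickPDF0726, the class name TQuickPDF0726, the single-argument LoadFromFile, the licence key string and the file name input.pdf are assumptions for a 7.26 Delphi install and are not confirmed in this thread, so check them against the reference for your version. Options 3 and 7 may also return the text in a different layout than option 0, so the result might need some post-processing.

```delphi
program ExtractAccentedText;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  QuickPDF0726; // assumption: unit name of the 7.26 Delphi edition

var
  QP: TQuickPDF0726; // assumption: class name follows the library version
  PageText: WideString;
begin
  QP := TQuickPDF0726.Create;
  try
    // Placeholder key; replace with your own licence key.
    if QP.UnlockKey('your-license-key') <> 1 then
    begin
      WriteLn('Could not unlock the library.');
      Exit;
    end;

    // Hypothetical input file. Assumption: LoadFromFile takes only the
    // file name in 7.26 and returns 1 on success.
    if QP.LoadFromFile('input.pdf') <> 1 then
    begin
      WriteLn('Could not load the PDF.');
      Exit;
    end;

    QP.SelectPage(1);

    // Option 0 is fast but ignores font encodings; try 3 (or 7) instead.
    PageText := QP.ExtractPageText(3);

    // Console output may not show every accented character correctly;
    // inspect PageText in the debugger or save it to a Unicode file.
    WriteLn(PageText);
  finally
    QP.Free;
  end;
end.
```

	If option 3 gives the right characters for the older files, the same call with 7 is worth comparing; on QPL 8.xx, option 8 is described above as the encoding-aware equivalent of option 0.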
