
Extract text from PDF problem with "fi" and "fl"

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2574
Printed Date: 04 Apr 26 at 7:52PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Extract text from PDF problem with "fi" and "fl"
Posted By: miguele
Subject: Extract text from PDF problem with "fi" and "fl"
Date Posted: 21 Mar 13 at 3:29PM
Hi,

I'm extracting text from a PDF with Delphi XE and QuickPDF v.7. I've tried the v9.13 demo and the problem is the same. The PDF encoding is UTF-8. The language is Portuguese.

For the word:
 Identifique

I get:
 Identifi que 

I've checked the Ord of each char (shown in parentheses) and it looks like this:
(32)I(73)d(100)e(101)n(110)t(116)i(105)fi(64257) (32)q(113)u(117)e(101) (32)
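
A dump of this kind can be produced with a few lines of Delphi. The following is only a minimal sketch: DumpOrds is a hypothetical helper name, not something from the original code or from QuickPDF, and the sample string is built by hand to mimic the extracted text.

```pascal
program DumpOrdsDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: renders each UTF-16 code unit of S followed by its
// ordinal value in parentheses, e.g. 'q(113)'.
function DumpOrds(const S: WideString): WideString;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    Result := Result + S[i] + '(' + IntToStr(Ord(S[i])) + ')';
end;

begin
  // WideChar($FB01) is U+FB01 LATIN SMALL LIGATURE FI, decimal 64257.
  // The console may not be able to display the ligature glyph itself,
  // but the ordinal values are printed as plain digits.
  Writeln(DumpOrds('Identi' + WideChar($FB01) + ' que'));
end.
```

A value of 64257 in the dump means the library returned the single ligature character U+FB01 rather than the two letters "f" and "i".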

This is coming out directly from QuickPDF library as follows:

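            // Load the document to find out how many pages it has.
            QP.LoadFromFile(OpenDialog1.FileName,'');
            PageCount := QP.PageCount();
            // Pages are numbered from 1 to PageCount, so the loop must not run to PageCount + 1.
            for i:= 1 to PageCount do
            begin
              // Extract the text of page i (extraction option 8) and append it after a line feed.
              TextOutput:= TextOutput + ' ' + #10 + QP.ExtractFilePageText(OpenDialog1.FileName, '', i, 8);
            end;
            sl.Text:= TextOutput;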


TextOutput is a WideString.

Am I doing something wrong? This happens only for "fi" and "fl".

Thanks,
Miguel





Replies:
Posted By: miguele
Date Posted: 21 Mar 13 at 6:24PM
I've found that this issue only happens with PDF files generated by Adobe InDesign. It seems to be related to ligatures, as this forum post describes:

http://forums.adobe.com/thread/792075

Any workaround for this?

Thanks


Posted By: AndrewC
Date Posted: 25 Mar 13 at 11:16PM
QPL 8.xx has support for extracting ligatures. We would need to see the problem PDF to find out why this is happening.

Andrew.
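
Until the cause is known, one possible stopgap is to post-process the extracted text. The sketch below is not a QuickPDF feature and not a confirmed fix: ExpandLigatures and its DropSpaceAfter parameter are hypothetical names, and dropping the space after a ligature is only a heuristic based on the "Identifi que" output reported above.

```pascal
program ExpandLigaturesDemo;
{$APPTYPE CONSOLE}

// Hypothetical post-processing helper, not part of the QuickPDF API.
// Replaces U+FB01 (fi) and U+FB02 (fl) with their two-letter expansions.
// Heuristic: when DropSpaceAfter is True, a single space directly after a
// ligature is removed if a non-blank character follows it. This can wrongly
// join two words when the first one genuinely ends in "fi" or "fl".
function ExpandLigatures(const S: WideString; DropSpaceAfter: Boolean): WideString;
var
  i, n: Integer;
  Expansion: WideString;
begin
  Result := '';
  n := Length(S);
  i := 1;
  while i <= n do
  begin
    case Ord(S[i]) of
      $FB01: Expansion := 'fi';
      $FB02: Expansion := 'fl';
    else
      Expansion := '';
    end;
    if Expansion = '' then
      Result := Result + S[i]  // ordinary character, copy unchanged
    else
    begin
      Result := Result + Expansion;
      // Skip one space after the ligature, but only when the character
      // after that space exists and is neither a space nor a control char.
      if DropSpaceAfter and (i + 1 < n) and (S[i + 1] = ' ') and (S[i + 2] > ' ') then
        Inc(i);
    end;
    Inc(i);
  end;
end;

begin
  // Mimics the extracted text from the first post: ligature, then a space.
  Writeln(ExpandLigatures('Identi' + WideChar($FB01) + ' que', True)); // Identifique
end.
```

Passing False for DropSpaceAfter only expands the ligatures and leaves all spacing as extracted, which is the safer choice when the text contains real words ending in "fi" or "fl".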


