Extract text from PDF problem with "fi" and "fl"

miguele (Beginner, Brazil), posted 21 Mar 13 at 3:29PM
Hi,

I'm extracting text from a PDF with Delphi XE and QuickPDF v.7. I've tried the v9.13 demo and the problem is the same. The PDF encoding is UTF-8. The language is Portuguese.

For the word:
 Identifique

I get:
 Identifi que 

I've checked the Ord of each char (shown in parentheses) and it looks like this:
 (32)I(73)d(100)e(101)n(110)t(116)i(105)ﬁ(64257) (32)q(113)u(117)e(101) (32)
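
A dump in this format could be produced with a small helper along the following lines. This is only a sketch: DumpOrds is a hypothetical name and was not part of the original post. It appends the ordinal of each UTF-16 code unit after the character itself, so a value of 64257 (hex FB01) stands out as the single-character "fi" ligature rather than the two letters f and i.

```delphi
program DumpOrdsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns S with "(code)" appended after each
// UTF-16 code unit, e.g. 'I' becomes 'I(73)'.
function DumpOrds(const S: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    Result := Result + S[i] + '(' + IntToStr(Ord(S[i])) + ')';
end;

begin
  // ASCII-only input, so the console output does not depend on the code page.
  // Two separate letters give two entries: f(102)i(105)
  Writeln(DumpOrds('fi'));
end.
```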

This is coming out directly from QuickPDF library as follows:

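            // Load the selected file so that PageCount reflects it
            QP.LoadFromFile(OpenDialog1.FileName, '');
            PageCount := QP.PageCount();
            // Pages are numbered 1..PageCount, so stop at PageCount (not PageCount + 1)
            for i := 1 to PageCount do
            begin
              // Extraction option 8, as used in the original test
              TextOutput := TextOutput + ' ' + #10 + QP.ExtractFilePageText(OpenDialog1.FileName, '', i, 8);
            end;
            sl.Text := TextOutput;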


TextOutput is a WideString.

Am I doing something wrong? This happens only for "fi" and "fl".

Thanks,
Miguel


miguele (Beginner, Brazil), posted 21 Mar 13 at 6:24PM
I've found that this issue only happens with PDF files generated by Adobe InDesign. It seems to be related to ligatures, as this forum post describes:


Any workaround for this?

Thanks

AndrewC (Moderator Group, Geelong, Aust), posted 25 Mar 13 at 11:16PM
QPL 8.xx has support for extracting ligatures. We would need to see the problem PDF to work out why this is happening.

Andrew.
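
Until the cause in the file is identified, one possible workaround is to expand the ligature code points after extraction. The sketch below is not a Quick PDF Library feature: ExpandLigatures and its DropSpaceAfter parameter are hypothetical names introduced here for illustration. It replaces U+FB01 (64257, "fi") and U+FB02 (64258, "fl") with plain letters. Dropping the space that follows a ligature is a heuristic based only on the "Identifi que" output reported above, and it would wrongly join words if a ligature genuinely ends a word.

```delphi
program ExpandLigaturesDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of Quick PDF Library): expands the
// fi/fl ligature characters into their plain two-letter forms.
function ExpandLigatures(const S: string; DropSpaceAfter: Boolean): string;
begin
  Result := S;
  if DropSpaceAfter then
  begin
    // Assumption: a space directly after a ligature is spurious.
    Result := StringReplace(Result, #$FB01 + ' ', 'fi', [rfReplaceAll]);
    Result := StringReplace(Result, #$FB02 + ' ', 'fl', [rfReplaceAll]);
  end;
  // Expand any remaining ligatures, leaving surrounding text untouched.
  Result := StringReplace(Result, #$FB01, 'fi', [rfReplaceAll]);
  Result := StringReplace(Result, #$FB02, 'fl', [rfReplaceAll]);
end;

begin
  // Mirrors the extracted text from the first post: 'Identi' + U+FB01 + ' que'
  if ExpandLigatures('Identi' + #$FB01 + ' que', True) = 'Identifique' then
    Writeln('ok');
end.
```

In the extraction loop from the first post, the helper would be applied to TextOutput before assigning it to sl.Text. Passing False for DropSpaceAfter keeps all spaces and only expands the ligature characters.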