GetPageText with cid fonts doesn't work?

Ingo (Moderator Group), posted 16 May 21 at 1:40PM:
Hi :)


Today I have a problem myself... ;-)
For features like full-text search I use GetPageText as the first preparation step.
Now I've had bad experiences with GetPageText on a few files.
The text extraction hangs: my app stops responding, I lose control, and in the end I have to kill the process in Task Manager.
Looking more closely at the affected documents, I found that each one has many (more than 10) embedded fonts, and these are always CID fonts.

I've tried the repair functionality offered by QuickPDF to transform the PDF before extracting, but nothing helps. It also doesn't matter which extraction option (0 to 8) I use.

Has anybody here had the same experience, and perhaps found a solution?
Here are two of my samples to test the extraction.
Thanks in advance:
https://www.is-soft.de/vx800/prob1.pdf
https://www.is-soft.de/vx800/prob2.pdf


Cheers,
Ingo

tfrost (Senior Member), posted 16 May 21 at 8:58PM:
I can't help with a solution, but I tried analysing both files with PDF Analyzer Pro 5.0 (which normally gives a good report of issues). This also hung during text analysis.

Sopracenery (Team Player), posted 27 May 21 at 12:15AM:
Hi Ingo,

Can you reduce one of your files to a minimum? I mean the smallest number of pages or words at which the error still occurs. That would make it easier to find the cause.

In another project I remember a bug that occurred when a special character sat at a particular position, byte 1024, in a rich text file. It was a strange one, and it was found by reducing the file step by step.
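
When a whole document hangs, one way to narrow it down is to log each page number before calling GetPageText, so that after killing the process the last log line names the page that never finished. This is only a minimal sketch under assumptions: the unit name DebenuPDFLibrary1811, the log file name extract.log and the helpers LogLine and ExtractWithLog are made up for illustration, and the license key still has to be set with UnlockKey before loading.

```pascal
program FindHangingPage;
{$APPTYPE CONSOLE}

uses
  SysUtils,
  DebenuPDFLibrary1811; // assumed unit name of the Delphi edition

// Hypothetical helper: append one line and close the file again, so the
// line is already written out if the process has to be killed afterwards.
procedure LogLine(const LogName, Msg: string);
var
  F: TextFile;
begin
  AssignFile(F, LogName);
  if FileExists(LogName) then
    Append(F)
  else
    Rewrite(F);
  try
    Writeln(F, Msg);
  finally
    CloseFile(F);
  end;
end;

// Hypothetical helper: extract every page and log before and after each call.
procedure ExtractWithLog(const PdfName, LogName: string);
var
  QP: TDebenuPDFLibrary1811;
  i, PageTotal: Integer;
  Txt: WideString;
begin
  QP := TDebenuPDFLibrary1811.Create;
  try
    // Call QP.UnlockKey with your license key here.
    if QP.LoadFromFile(PdfName, '') = 0 then
    begin
      LogLine(LogName, 'could not load ' + PdfName);
      Exit;
    end;
    PageTotal := QP.PageCount;
    for i := 1 to PageTotal do
    begin
      LogLine(LogName, 'start page ' + IntToStr(i));
      QP.SelectPage(i);
      Txt := QP.GetPageText(7);
      LogLine(LogName, 'done page ' + IntToStr(i) + ', ' +
        IntToStr(Length(Txt)) + ' characters');
    end;
  finally
    QP.Free;
  end;
end;

begin
  ExtractWithLog(ParamStr(1), 'extract.log');
end.
```

A "start page N" line without a matching "done page N" line points at the page to cut the file down to.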

Martin

Ingo (Moderator Group), posted 28 May 21 at 11:00PM:
Hi Martin,

I think it's a common problem: too many CID fonts break some rendering functionality.
Here are the same documents, one reduced to a single page and the other to two pages. The problems are the same:
https://www.is-soft.de/vx800/prob1_klein.pdf
https://www.is-soft.de/vx800/prob2_klein.pdf
Cheers,
Ingo

Sopracenery (Team Player), posted 29 May 21 at 9:05AM:
Hi Ingo,

I opened your sample prob2_klein.pdf with 18.11 and there is no issue with GetPageText.
I tested options 0 and 7. Both work perfectly. Pages 1 and 2 are OK.

How can I help you?
Martin

Ingo (Moderator Group), posted 30 May 21 at 10:19PM:
Hi Martin,

Thanks a lot for trying it yourself.
I'm working with the same release.
prob2_klein.pdf works for rendering (I have a small preview function) and I can see the text content, but when I use GetPageText with option 7 I get an empty text file.
prob1_klein.pdf doesn't work at all.

Here's a bit of code. Perhaps you can see what I don't see ;-)
Thanks in advance.

  QP := TDebenuPDFLibrary1811.Create;
  sl := TStringList.Create;
  try
    // LoadFromFile returns 0 on failure; stop here instead of writing an empty file
    if QP.LoadFromFile(Edit1.Text, '') = 0 then
      raise Exception.Create('Cannot load ' + Edit1.Text);
    if QP.EncryptionStatus > 0 then
      QP.Decrypt;
    X := QP.PageCount;

//  . . .

    filetxt := ChangeFileExt(ExtractFileName(Edit1.Text), '.txt');
    verztxt := tpath + '__' + filetxt;

    for i := 1 to X do
    begin
      QP.SelectPage(i);
      QP.SetOrigin(1);
      QP.CombineContentStreams;  // merge the page's content streams before extracting

      // page separator first, then the extracted text (option 7)
      UNI2 := WideString(' ' + #13#10 + ' --- page ' + IntToStr(i) +
        ' from ' + IntToStr(X) + ' --- ' + #13#10);
      UNI := UNI2 + QP.GetPageText(7);
      sl.Add(UNI + #13#10);
    end;

    // save only after every page has been processed
    sl.SaveToFile(verztxt, TEncoding.Unicode);
  finally
    sl.Free;
    QP.Free;
    Screen.Cursor := Save_Cursor; { Always restore to normal }
  end;


Cheers,
Ingo

Sopracenery (Team Player), posted 31 May 21 at 8:42AM:
I tried your sequence as follows:

  QP.LoadFromFile(Edit1.Text, '');
  X := QP.PageCount;
  QP.CombineContentStreams;
  QP.SelectPage(i);
  QP.SetOrigin(1);
  QP.CombineContentStreams;
  QP.GetPageText(7);

and I see no issue.
But I do not have QP.Free in my library, so we are not working with the same binary.
I use DebenuPDFLibraryAX1811.dll on 32-bit Windows.

Please check your output in the debugger directly after
QP.GetPageText(7);
Is there really nothing coming out?
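
A check like the one suggested above could look roughly like this: call GetPageText for one page and print the length of the returned string before anything is written to a file, so that an empty result can be told apart from a problem in the saving step. This is a sketch under assumptions, not a verified fix: the unit name DebenuPDFLibrary1811 and the helper ReportPageText are made up for illustration, the class is the Delphi one from the code posted earlier (not the ActiveX DLL), and the license key still has to be set with UnlockKey.

```pascal
program CheckPageText;
{$APPTYPE CONSOLE}

uses
  SysUtils,
  DebenuPDFLibrary1811; // assumed unit name of the Delphi edition

// Hypothetical helper: print how many characters each extract option
// (0 to 8) returns for a single page.
procedure ReportPageText(const PdfName: string; PageNo: Integer);
var
  QP: TDebenuPDFLibrary1811;
  Opt: Integer;
  Txt: WideString;
begin
  QP := TDebenuPDFLibrary1811.Create;
  try
    // Call QP.UnlockKey with your license key here.
    if QP.LoadFromFile(PdfName, '') = 0 then
    begin
      Writeln('LoadFromFile failed, LastErrorCode = ', QP.LastErrorCode);
      Exit;
    end;
    QP.SelectPage(PageNo);
    for Opt := 0 to 8 do
    begin
      Txt := QP.GetPageText(Opt);
      Writeln('option ', Opt, ': ', Length(Txt), ' characters');
    end;
  finally
    QP.Free;
  end;
end;

begin
  ReportPageText(ParamStr(1), 1);
end.
```

If every option reports 0 characters here, the problem is in the extraction itself. If the counts are non-zero, the empty text file is more likely caused by the code that collects and saves the strings.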