GetPageText with CID fonts doesn't work?
Ingo, Moderator Group, Joined: 29 Oct 05, Status: Offline, Points: 3529
Posted: 16 May 21 at 1:40PM
Hi :)

Today I have a problem myself ;-) For features like full-text search I use GetPageText as the first preparation step. With a few files, GetPageText now fails badly: the text extraction hangs, my app stops responding, and in the end I have to kill the process with the Task Manager.

After a closer look at the affected documents, I found that each of them has many (more than 10) embedded fonts, and that these fonts are always CID fonts. I tried some of the repair functionality offered by QuickPDF to transform the PDF before extracting, but nothing helps. It also doesn't matter which extract option (0 to 8) I use.

Has anybody here had the same experience, and perhaps found a solution? Here are two of my samples for testing the extraction. Thanks in advance:

https://www.is-soft.de/vx800/prob1.pdf
https://www.is-soft.de/vx800/prob2.pdf

Cheers,
Ingo
tfrost, Senior Member, Joined: 06 Sep 10, Location: UK, Status: Offline, Points: 437
I can't help with a solution, but I tried analysing both files with PDF Analyzer Pro 5.0 (which normally gives a good report of issues). This also hung during text analysis.
Sopracenery, Team Player, Joined: 31 Aug 20, Location: Germany, Status: Offline, Points: 29
Hi Ingo,

can you reduce one of your files to a minimum? I mean a minimum of pages or words at which the error still occurs. That would make it easier to find the cause. In another project I remember a bug that appeared when a particular character sat at a particular position (byte 1024) in a rich-text file. It was a strange thing, and it was only found by reducing the file step by step.

Martin
Ingo
Hi Martin,

I think it's a general problem: too many CID fonts seem to break some of the rendering functionality. Here are the same documents again, one reduced to a single page and the other to two pages. The problems are the same:

https://www.is-soft.de/vx800/prob1_klein.pdf
https://www.is-soft.de/vx800/prob2_klein.pdf

Cheers,
Ingo
Sopracenery
Hi Ingo,

I opened your sample prob2_klein.pdf with version 18.11 and there is no issue with GetPageText. I tested options 0 and 7, and both work perfectly. Page 1 and page 2 are OK. How can I help you?

Martin
Ingo
Hi Martin,

thanks a lot for trying it yourself. I'm working with the same release. prob2_klein.pdf works for rendering (I have a small preview function) and I can see the text content, but when I use GetPageText with option 7 I get an empty text file. prob1_klein.pdf doesn't work at all. Here's a bit of code; perhaps you can see what I don't see ;-) Thanks in advance.

    QP := TDebenuPDFLibrary1811.Create;
    try
      QP.LoadFromFile(Edit1.Text, '');
      If (QP.EncryptionStatus > 0) Then QP.Decrypt;
      X := QP.PageCount;
      QP.CombineContentStreams;
      UNI := '';
      // . . .
      filetxt := ChangeFileExt(ExtractFileName(Edit1.Text), '.txt');
      verztxt := tpath + '__' + filetxt;
      sl := TStringList.Create;
      sl2 := TStringList.Create;
      for i := 1 to X Do
      begin
        QP.SelectPage(i);
        QP.SetOrigin(1);
        QP.CombineContentStreams;
        UNI := WideString('');
        UNI2 := WideString(' ' + #13#10);
        UNI2 := UNI2 + WideString(' --- page ' + IntToStr(i) + ' from ' + IntToStr(X) + ' ---');
        UNI2 := UNI2 + WideString(' ' + #13#10);
        UNI := UNI2 + QP.GetPageText(7);
        UNI := UNI + #13#10;
        sl.Add(UNI);
      end;
    finally
      QP.Free;
      UNI := WideString('');
      UNI2 := WideString('');
      sl.SaveToFile(verztxt, TEncoding.Unicode);
      Screen.Cursor := Save_Cursor; { Always restore to normal }
    end;

Cheers,
Ingo
Sopracenery
I tried your sequence as follows:

    QP.LoadFromFile(Edit1.Text, '');
    X := QP.PageCount;
    QP.CombineContentStreams;
    QP.SelectPage(i);
    QP.SetOrigin(1);
    QP.CombineContentStreams;
    QP.GetPageText(7);

and I see no issue. But I do not have QP.Free in my library, so we are not working with the same binary. I use DebenuPDFLibraryAX1811.dll on 32-bit Windows. Please check your output in the debugger directly after QP.GetPageText(7). Is there really nothing coming out?
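
The check suggested above (looking at what GetPageText returns before anything is written to a file) could be sketched as a small console program. This is only a sketch under assumptions: it assumes the Delphi edition of the library, a unit named DebenuPDFLibrary1811, and that LoadFromFile returns 1 on success. License unlocking is not shown in the thread and is left out here as well. The program name and the output format are made up for illustration.

```pascal
program CheckPageText;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  DebenuPDFLibrary1811; // assumption: unit name of the Delphi edition 18.11

var
  QP: TDebenuPDFLibrary1811;
  PageText: WideString;
  i, PageTotal: Integer;
begin
  if ParamCount < 1 then
    Writeln('Usage: CheckPageText <file.pdf>')
  else
  begin
    QP := TDebenuPDFLibrary1811.Create;
    try
      // License unlocking is omitted, as it is in the code posted in the thread.
      // Assumption: a return value of 1 means the file was loaded.
      if QP.LoadFromFile(ParamStr(1), '') <> 1 then
        Writeln('Could not load ', ParamStr(1))
      else
      begin
        PageTotal := QP.PageCount;
        for i := 1 to PageTotal do
        begin
          // Announce the page first, so a hang can be tied to a page number.
          Writeln(Format('extracting page %d of %d ...', [i, PageTotal]));
          QP.SelectPage(i);
          PageText := QP.GetPageText(7);
          // Report the raw length before any string handling or file output.
          Writeln(Format('  got %d characters', [Length(PageText)]));
        end;
      end;
    finally
      QP.Free;
    end;
  end;
end.
```

If the reported length is already 0 for prob2_klein.pdf, the empty text file comes from GetPageText itself. If it is greater than 0, the text is lost later, for example when the string list is built or saved.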