
Debenu Quick PDF Library - PDF SDK Community Forum

ExtractFilePageText hangs on many PDFs

Franciscus (Beginner, joined 17 Dec 19)
Posted: 17 Dec 19 at 10:45AM
I am batch-processing 24 TB of PDFs (1.6 million files).

Using version 16.12.

The problem is that all the functions I have used so far sometimes hang indefinitely on legitimate PDFs, which makes Quick PDF Library unusable. I purchased a Delphi source license, but no sources were supplied, so I am unable to pinpoint the bug or fix it myself.

ExtractFilePageText hangs on many PDFs, including this one: http://fdg.am/UNTITLED1029.pdf

So this hangs forever:

s := PDFLibrary.ExtractFilePageText('UNTITLED1029.pdf', '', 1, 0);

I don't expect anyone to have a solution, but who knows. Thanks for your suggestions (a workaround would be greatly appreciated). Also, what does the Delphi source code license include? The actual Delphi source for this function?
Franciscus (Beginner, joined 17 Dec 19)
Posted: 17 Dec 19 at 11:06AM
FIXED!

The problem was that the PDFs were encrypted.

if PDFLibrary.EncryptionStatus > 0 then
  PDFLibrary.Decrypt;

Of course, the function should do this by itself, automatically, instead of hanging...
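For anyone hitting the same hang, the workaround above can be wrapped into one routine. This is a minimal sketch, not code from this thread: it assumes the native Delphi edition, where LoadFromFile, SelectPage and GetPageText operate on a loaded document alongside the EncryptionStatus and Decrypt calls shown above. The unit name, the TPDFLibrary alias and the helper name ExtractPageTextDecrypted are placeholders, and the return codes checked (1 = success) are assumptions to verify against the function reference.

```pascal
unit PdfDecryptHelper; // hypothetical helper unit

interface

uses
  YourQuickPDFImportUnit; // PLACEHOLDER: the import unit shipped with your version

type
  // PLACEHOLDER alias: point this at the class that PDFLibrary is an instance of.
  TPDFLibrary = TYourQuickPDFClass;

function ExtractPageTextDecrypted(Lib: TPDFLibrary; const FileName: string;
  PageNumber: Integer): string;

implementation

// Loads FileName, decrypts it if it is encrypted, and returns the text of
// page PageNumber. Returns '' if the file cannot be loaded without a
// password, if decryption fails, or if the page does not exist.
// ASSUMPTION: these functions return 1 on success.
function ExtractPageTextDecrypted(Lib: TPDFLibrary; const FileName: string;
  PageNumber: Integer): string;
begin
  Result := '';
  if Lib.LoadFromFile(FileName, '') <> 1 then // '' = no open (user) password
    Exit;
  if Lib.EncryptionStatus > 0 then            // the check from the post above
    if Lib.Decrypt <> 1 then
      Exit;
  if Lib.SelectPage(PageNumber) <> 1 then
    Exit;
  Result := Lib.GetPageText(0); // same options value (0) as the original ExtractFilePageText call
end;

end.
```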


Edited by Franciscus - 17 Dec 19 at 11:32AM
Ingo (Moderator Group, joined 29 Oct 05)
Posted: 17 Dec 19 at 9:36PM
Hi Franciscus,


If the real owner of a PDF encrypts it, they want it to stay encrypted and untouched.
So it's good to have decryption as a separate step.
BTW: if you had looked at the samples first, you would have known ;-)
With so many PDFs to check, you should consider using the DA (direct access) functions...
Here's the developer guide:
https://www.debenu.com/products/development/debenu-pdf-library/help/developer-guide/

Cheers and welcome here,
Ingo


Franciscus (Beginner, joined 17 Dec 19)
Posted: 18 Dec 19 at 7:28AM
Hi Ingo,

Thanks. I am amazed by the library's ability to hack encryption, especially on the fly! How does it guess the password used to encrypt the PDF? Wow. Chapeau. Chinese hackers are good, obviously. Alternatively, I do not understand what PDF encryption means.

Yet I disagree: the library works for me, the buyer of the library, not for the author of the PDF. When I ask the library to get me the text of a PDF and it is able to do so, it should do it, rather than fail silently when it sees that it first needs to "decrypt" (whatever that means here). Anyway, that is not the most important thing here.

More serious is that many issues with the library become apparent when processing 1,000,000+ PDFs made over the past decades by tens of thousands of people using dozens to hundreds of different PDF generators, so I may not be able to use it for my purposes. They should give me the Delphi source code, which I paid for but never received. A memory leak that grew to half a TB was fixable by instantiating and freeing the library for every PDF (fortunately that is quick), but AVs (access violations) and hangs on malformed and also on valid PDFs are causing serious delays and countless restarts of hung processes, so I'm afraid I'll end up writing code to extract what I need myself... Again a reminder why I strongly adhere to the "not invented here" principle, whenever possible and reasonable. I'd love to have the sources. Do you know whether the library was written in Delphi?
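The restarts of hung processes mentioned above can be automated. The sketch below is not a Quick PDF Library feature but a general Windows technique: run each PDF through a separate worker process and kill the worker if it exceeds a time limit. A hang then costs one timeout instead of a manual restart, and any leaked memory is returned to the OS when the worker exits. The helper name RunWorkerWithTimeout is hypothetical, and so is the worker: ExtractWorker.exe stands for a small console program you would write that creates the library object, extracts one file and exits with code 0 on success.

```pascal
unit PdfWorkerWatchdog; // hypothetical helper unit

interface

type
  TWorkerResult = (wrSucceeded, wrFailed, wrTimedOut, wrNotStarted);

function RunWorkerWithTimeout(const CommandLine: string;
  TimeoutMs: Cardinal): TWorkerResult;

implementation

uses
  Winapi.Windows;

// Starts CommandLine as a child process and waits at most TimeoutMs
// milliseconds; a worker that is still running after that is terminated.
function RunWorkerWithTimeout(const CommandLine: string;
  TimeoutMs: Cardinal): TWorkerResult;
var
  SI: TStartupInfo;
  PI: TProcessInformation;
  CmdBuf: string;
  WorkerExitCode: DWORD;
begin
  FillChar(SI, SizeOf(SI), 0);
  SI.cb := SizeOf(SI);
  FillChar(PI, SizeOf(PI), 0);
  CmdBuf := CommandLine;
  UniqueString(CmdBuf); // CreateProcessW may write to the command-line buffer
  if not CreateProcess(nil, PChar(CmdBuf), nil, nil, False, 0, nil, nil,
    SI, PI) then
    Exit(wrNotStarted);
  try
    case WaitForSingleObject(PI.hProcess, TimeoutMs) of
      WAIT_OBJECT_0:
        begin
          // Worker finished in time: exit code 0 means success.
          if GetExitCodeProcess(PI.hProcess, WorkerExitCode) and
            (WorkerExitCode = 0) then
            Result := wrSucceeded
          else
            Result := wrFailed;
        end;
      WAIT_TIMEOUT:
        begin
          TerminateProcess(PI.hProcess, 1);           // kill the hung worker
          WaitForSingleObject(PI.hProcess, INFINITE); // wait until it is gone
          Result := wrTimedOut;
        end;
    else
      Result := wrFailed;
    end;
  finally
    CloseHandle(PI.hThread);
    CloseHandle(PI.hProcess);
  end;
end;

end.
```

A batch loop would then call something like RunWorkerWithTimeout('ExtractWorker.exe "' + FileName + '"', 60000) per file and log the files that come back as wrTimedOut or wrFailed for later inspection; the 60-second limit is only an example value.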

I have saved the offending PDFs for Foxit, so I'll send them to Foxit.


Edited by Franciscus - 18 Dec 19 at 7:47AM
Ingo (Moderator Group, joined 29 Oct 05)
Posted: 18 Dec 19 at 8:17AM
Hi,

Why not check the PDFs before processing?
Directly after loading, you can check "LastErrorCode" to see whether the load succeeded.
Additionally, you can use "HasFontResources" to check whether there is text content.
You can also check whether a user password is set; ebooks can cause problems as well.
All of these things can be checked before processing.
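The checks suggested above could be combined into a classification step that runs before extraction. This is a minimal sketch, not library code: CheckPdfBeforeProcessing and TPdfCheck are hypothetical names, the unit and class names are placeholders, and the meanings assumed for the return values (1 = loaded, EncryptionStatus > 0 = encrypted, HasFontResources = 0 = no fonts) should be confirmed in the function reference. A file that needs a user password is expected to fail at the load step, where LastErrorCode indicates why.

```pascal
unit PdfPreCheck; // hypothetical helper unit

interface

uses
  YourQuickPDFImportUnit; // PLACEHOLDER: the import unit shipped with your version

type
  TPDFLibrary = TYourQuickPDFClass; // PLACEHOLDER alias for your library class
  TPdfCheck = (pcOk, pcLoadFailed, pcEncrypted, pcNoFonts);

function CheckPdfBeforeProcessing(Lib: TPDFLibrary; const FileName: string;
  out ErrorCode: Integer): TPdfCheck;

implementation

// Classifies a file before the potentially slow text extraction.
function CheckPdfBeforeProcessing(Lib: TPDFLibrary; const FileName: string;
  out ErrorCode: Integer): TPdfCheck;
begin
  ErrorCode := 0;
  if Lib.LoadFromFile(FileName, '') <> 1 then // ASSUMPTION: 1 = loaded
  begin
    ErrorCode := Lib.LastErrorCode; // inspect this to see why the load failed
    Exit(pcLoadFailed);
  end;
  if Lib.EncryptionStatus > 0 then
    Exit(pcEncrypted);   // call Decrypt (or skip the file) before extracting
  if Lib.HasFontResources = 0 then // ASSUMPTION: 0 = no font resources
    Exit(pcNoFonts);     // probably image-only, so no text to extract
  Result := pcOk;
end;

end.
```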
If there are performance issues during processing, you should change your code to use the DA functions.
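A minimal sketch of whole-document text extraction with the DA (direct access) functions mentioned here. It assumes DAOpenFile returns 0 on failure and that DAExtractPageText accepts the same options value (0) as the original ExtractFilePageText call; the unit and class names are placeholders and ExtractAllTextDA is a hypothetical helper.

```pascal
unit PdfDAText; // hypothetical helper unit

interface

uses
  YourQuickPDFImportUnit; // PLACEHOLDER: the import unit shipped with your version

type
  TPDFLibrary = TYourQuickPDFClass; // PLACEHOLDER alias for your library class

function ExtractAllTextDA(Lib: TPDFLibrary; const FileName: string): string;

implementation

// Extracts the text of every page through the DA functions and joins the
// pages with line breaks. Returns '' if the file cannot be opened.
function ExtractAllTextDA(Lib: TPDFLibrary; const FileName: string): string;
var
  FileHandle, Page, PageRef: Integer;
begin
  Result := '';
  FileHandle := Lib.DAOpenFile(FileName, ''); // '' = no password; ASSUMPTION: 0 = failure
  if FileHandle = 0 then
    Exit;
  try
    for Page := 1 to Lib.DAGetPageCount(FileHandle) do
    begin
      PageRef := Lib.DAFindPage(FileHandle, Page); // page number -> page reference
      if PageRef <> 0 then
        Result := Result + Lib.DAExtractPageText(FileHandle, PageRef, 0) + sLineBreak;
    end;
  finally
    Lib.DACloseFile(FileHandle); // always release the file handle
  end;
end;

end.
```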
The library source is written in pure Delphi.

Cheers,
Ingo

Franciscus (Beginner, joined 17 Dec 19)
Posted: 18 Dec 19 at 8:23AM
Thank you!

Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.