
ExtractFilePageText hangs on many PDF's

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3770
Printed Date: 25 Apr 24 at 11:24PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: ExtractFilePageText hangs on many PDF's
Posted By: Franciscus
Subject: ExtractFilePageText hangs on many PDF's
Date Posted: 17 Dec 19 at 10:45AM
I am batch-processing 24 TB of PDFs (1.6 million files).

Using v. 16.12.

The problem is that all the functions I have used so far sometimes hang indefinitely on legitimate PDFs, making Quick PDF Library unusable. I have purchased a Delphi Source license but no sources were supplied, so I am unable to pinpoint the bug or fix it myself.

ExtractFilePageText hangs on many PDFs, including this one: http://fdg.am/UNTITLED1029.pdf

So this hangs forever:

s := PDFLibrary.ExtractFilePageText('UNTITLED1029.pdf', '', 1, 0);

I don't expect anyone to have a solution, but who knows. Thanks for your suggestions (a workaround would be greatly appreciated). Also, what does the Delphi source code license entail? The actual sources for this function, in Delphi?



Replies:
Posted By: Franciscus
Date Posted: 17 Dec 19 at 11:06AM
FIXED!

The problem was that the PDFs were encrypted.

if PDFLibrary.EncryptionStatus > 0 then
  PDFLibrary.Decrypt;

Of course, the function should do this by itself, automatically instead of hanging...
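For anyone hitting the same hang, here is a minimal sketch of where the check above might sit in a complete program. Several things are assumptions, not confirmed by this thread: the unit and class name `DebenuPDFLibrary1612` / `TDebenuPDFLibrary1612` (guessed from the version mentioned), the placeholder license key, and the choice to load the file with `LoadFromFile` and read the page with `SelectPage` / `GetPageText` instead of calling `ExtractFilePageText` directly.

```delphi
program ExtractEncryptedText;
{$APPTYPE CONSOLE}

uses
  SysUtils,
  DebenuPDFLibrary1612; // assumed unit name for v16.12 of the Delphi edition

var
  PDFLibrary: TDebenuPDFLibrary1612; // assumed class name
  s: WideString;
begin
  PDFLibrary := TDebenuPDFLibrary1612.Create;
  try
    PDFLibrary.UnlockKey('your-license-key'); // placeholder, not a real key
    // Load with an empty password, as in the original call.
    if PDFLibrary.LoadFromFile('UNTITLED1029.pdf', '') = 1 then
    begin
      // The fix from this post: decrypt before touching page content.
      if PDFLibrary.EncryptionStatus > 0 then
        PDFLibrary.Decrypt;
      PDFLibrary.SelectPage(1);
      s := PDFLibrary.GetPageText(0); // 0 = same options value as the original call
      WriteLn(s);
    end
    else
      WriteLn('Load failed, LastErrorCode = ', PDFLibrary.LastErrorCode);
  finally
    PDFLibrary.Free;
  end;
end.
```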


Posted By: Ingo
Date Posted: 17 Dec 19 at 9:36PM
Hi Franciscus,


If the owner of a PDF encrypts it, then he wants it to stay encrypted and untouched.
So it's good to have this functionality separated.
BTW: If you had looked at the samples beforehand, you would have known ;-)
If you have that many PDFs to check, you should think about using the DA (direct access) functions...
Here's the developer-guide:
https://www.debenu.com/products/development/debenu-pdf-library/help/developer-guide/

Cheers and welcome here,
Ingo



-------------
Cheers,
Ingo



Posted By: Franciscus
Date Posted: 18 Dec 19 at 7:28AM
Hi Ingo,

Thanks. I am amazed by the ability of the library to hack encryption - esp. on-the-fly! How does it guess the password used to encrypt the PDF? Wow. Chapeau. Chinese hackers are good, obviously. Alternatively, I do not understand what PDF encryption means.

Yet I disagree - the Library works for me, the buyer of the library. Not for the author of the PDF. When I ask the lib to get me text from a PDF and the lib is able to do it, it should do it and not fail silently when it sees it needs to "decrypt" - whatever that means here - first. Anyway - not the most important thing here.

More serious is that there are many issues with the library that become apparent when processing 1,000,000+ PDFs made over the past decades by tens of thousands of people and dozens to hundreds of PDF generators, so perhaps I'll not be able to use it for my purposes. They should give me the Delphi source code, which I paid for but never received. A memory leak that grew all the way to half a TB was fixable by instantiating and freeing the lib for every PDF (fortunately that is quick), but AVs (access violations) and hangs on malformed and also proper PDFs are causing serious delays and countless restarts of hung processes, so I'll end up writing code to extract what I need myself, I'm afraid... Again a reminder why I strongly adhere to the "not invented here" principle, whenever possible and reasonable. I'd love to have the sources - do you know whether the lib was written in Delphi?
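The instantiate-and-free-per-PDF workaround described above could look roughly like the sketch below. The unit and class name `DebenuPDFLibrary1612` / `TDebenuPDFLibrary1612`, the helper `ProcessOnePDF`, and the placeholder license key are assumptions made for illustration. The `except` block only keeps the batch running after an exception such as an access violation; it does nothing about a call that never returns, which would still need a timeout outside the process.

```delphi
program BatchExtract;
{$APPTYPE CONSOLE}

uses
  SysUtils,
  DebenuPDFLibrary1612; // assumed unit name for v16.12 of the Delphi edition

// Hypothetical helper: one fresh library instance per PDF, so memory
// held by the instance is released after every file.
procedure ProcessOnePDF(const FileName: string);
var
  Lib: TDebenuPDFLibrary1612; // assumed class name
  PageText: WideString;
begin
  Lib := TDebenuPDFLibrary1612.Create;
  try
    try
      Lib.UnlockKey('your-license-key'); // placeholder, not a real key
      PageText := Lib.ExtractFilePageText(FileName, '', 1, 0);
      WriteLn(FileName, ': ', Length(PageText), ' characters on page 1');
    except
      // Log and move on to the next file instead of aborting the batch.
      on E: Exception do
        WriteLn(FileName, ': skipped, ', E.ClassName, ': ', E.Message);
    end;
  finally
    Lib.Free;
  end;
end;

var
  i: Integer;
begin
  // File names are taken from the command line for this sketch.
  for i := 1 to ParamCount do
    ProcessOnePDF(ParamStr(i));
end.
```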

I have saved the offending PDFs for Foxit, so I'll send them over.


Posted By: Ingo
Date Posted: 18 Dec 19 at 8:17AM
Hi,

Why not check the PDFs before processing?
Directly after loading, you can read "LastErrorCode" to check whether the load was okay.
Additionally, you can check whether there is text content with "HasFontResources".
Additionally, you can check whether a user password is set... ebooks can cause problems as well.
All these things you can check before processing.
If there are performance issues while processing you should change your code using the DA-functions.
The library source is written in pure Delphi.
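A rough sketch of the checks and the DA route suggested above follows. The unit and class name `DebenuPDFLibrary1612` / `TDebenuPDFLibrary1612` and the placeholder license key are assumptions. Loading the file once for the pre-checks and then reopening it through the DA functions is only one possible arrangement, and the extra load gives back part of what the DA functions save.

```delphi
program PrecheckAndExtract;
{$APPTYPE CONSOLE}

uses
  SysUtils,
  DebenuPDFLibrary1612; // assumed unit name for v16.12 of the Delphi edition

var
  Lib: TDebenuPDFLibrary1612; // assumed class name
  FileHandle, PageRef: Integer;
begin
  Lib := TDebenuPDFLibrary1612.Create;
  try
    Lib.UnlockKey('your-license-key'); // placeholder, not a real key

    // Pre-check 1: did the file load at all?
    if Lib.LoadFromFile(ParamStr(1), '') = 0 then
    begin
      WriteLn('Load failed, LastErrorCode = ', Lib.LastErrorCode);
      Exit;
    end;

    // Pre-check 2: report encryption so it can be handled separately.
    if Lib.EncryptionStatus > 0 then
      WriteLn('Note: file is encrypted');

    // Pre-check 3: a page without font resources is unlikely to hold text.
    Lib.SelectPage(1);
    if Lib.HasFontResources = 0 then
    begin
      WriteLn('Page 1 has no font resources, skipping');
      Exit;
    end;

    // Extraction through the direct access (DA) functions.
    FileHandle := Lib.DAOpenFileReadOnly(ParamStr(1), '');
    if FileHandle = 0 then
      Exit;
    try
      PageRef := Lib.DAFindPage(FileHandle, 1);
      if PageRef <> 0 then
        WriteLn(Lib.DAExtractPageText(FileHandle, PageRef, 0));
    finally
      Lib.DACloseFile(FileHandle);
    end;
  finally
    Lib.Free;
  end;
end.
```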



-------------
Cheers,
Ingo



Posted By: Franciscus
Date Posted: 18 Dec 19 at 8:23AM
Thank you!


