Get PDF box data from large PDFs

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1285
Printed Date: 25 Jun 25 at 1:08PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Get PDF box data from large PDFs
Posted By: Peter
Subject: Get PDF box data from large PDFs
Date Posted: 30 Nov 09 at 6:55AM
Hi,
 
I use the iSED.dll to read the PDF page boxes from the first page of the PDF (MediaBox, CropBox, BleedBox, TrimBox and ArtBox).
 
First of all, I load the file with LoadFromFile().
Then I read the box information with GetPageBox().
 
Everything works fine with PDFs smaller than 500 MB.
With larger PDFs, LoadFromFile raises an out-of-memory exception (although the PC has 4 GB of RAM).
 
I tried loading the file with the DA (direct access) functions instead. This works without an exception, but GetPageBox() is not available for a file opened with the DA functions.
 
I get the same results with the newest QuickPDFDLL0717.dll.
Splitting the original PDF into smaller multipage PDFs is not an option.
 
Can anyone help me read the page boxes from large PDFs?
 
Thanks, Peter
 



Replies:
Posted By: Ingo
Date Posted: 01 Dec 09 at 6:27AM
Hi Peter!

I haven't run into this with the newer versions of QuickPDF, but two years ago I posted code showing how to handle these memory problems. In short: call Free from time to time, remember the current page, and continue with a new instance.

Cheers, Ingo

- - -

Memory and time are often a problem when working on big PDF files with QuickPDF. It seems to me that in many parts of the code memory isn't released...

I'm using something like this:

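   QP := TiSEDQuickPDF.Create;
   try
       QP.UnlockKey('mycode');
       // edit holds the PDF file name; open it in direct-access (DA) mode
       dafh := QP.DAOpenFile(edit,'');
       x    := QP.DAGetPageCount(dafh);
       STR  := '';

       // the extracted text is written to <pdf file name>.txt
       verztxt := edit + '.txt';
       AssignFile(cf,verztxt);
       Rewrite(cf);
       pc := 0;   // pages processed since the last restart
       for i := 1 to x Do
       begin
          dapr := QP.DAFindPage(dafh,i);
          QP.CombineLayers;
          STR  := QP.DAExtractPageText(dafh,dapr,3);

          WriteLn(cf,Trim(STR));

          pc := pc + 1;
          if ( pc = 10 ) Then
             begin
                // every 10 pages: close the file, free the instance and
                // reopen with a fresh one to release the memory it holds
                pc := 0;
                QP.DACloseFile(dafh);
                QP.Free;
                QP := TiSEDQuickPDF.Create;
                QP.UnlockKey('mycode');
                dafh := QP.DAOpenFile(edit,'');
             end;
// . . .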

This works fine for text extraction from big files; I have had no problems since. After a few pages I call Free and then create a new instance. This costs only a little time and avoids the out-of-memory exceptions for me.
 


Posted By: Peter
Date Posted: 01 Dec 09 at 9:57AM
Hi Ingo
 
Thanks for your reply!
The main problem still remains: I have no access to the GetPageBox() function when I open a file with DAOpenFile().
Any idea how to get the page box information (TrimBox, MediaBox, CropBox, ...) with the DA functions?
 
 


Posted By: Ingo
Date Posted: 01 Dec 09 at 12:01PM
Hi Peter!

So why not use the functionality your IDE offers?
You can do this job without any PDF SDK.
Open the PDF in a text editor and you'll see what I mean.
You'll find things like this:
451 0 obj
<</CropBox[0.0 0.0 595.276 841.89]/Parent 587 0 R/StructParents 27/Contents 453 0 R/Rotate 0/BleedBox[0 0 595.276 841.89]/MediaBox[0.0 0.0 595.276 841.89]/TrimBox[0 0 595.276 841.89]/Resources 452 0 R/Type/Page>>
endobj
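In a dictionary like this, each box is an array [llx lly urx ury]: the lower-left and upper-right corners of the rectangle in default user-space units of 1/72 inch (points), assuming the page has no /UserUnit entry. As a worked example, the MediaBox above describes an A4 page:

```latex
\begin{aligned}
w &= urx - llx = 595.276 - 0 = 595.276\ \text{pt} \;\Rightarrow\; 595.276 \cdot \tfrac{25.4}{72} \approx 210.0\ \text{mm}\\
h &= ury - lly = 841.89 - 0 = 841.89\ \text{pt} \;\Rightarrow\; 841.89 \cdot \tfrac{25.4}{72} \approx 297.0\ \text{mm}
\end{aligned}
```

The CropBox, BleedBox and TrimBox entries in this example all hold the same four numbers, so they convert to the same size.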

Cheers, Ingo




Posted By: Peter
Date Posted: 01 Dec 09 at 12:34PM
Hi Ingo
 
My application is written in Delphi 7. Its purpose is to let the user collect and enter job information, part of which is the PDF box information.
To help fill in the job mask automatically, the box information is read from the PDF after the user selects it in the file system. That's why I use iSED.dll.
The user could read and enter the box information manually, but would then need e.g. Adobe Acrobat Professional to see it.
The PDF box information has to be read automatically by my application, without opening another editor. 
 
 


Posted By: Ingo
Date Posted: 01 Dec 09 at 2:37PM
You don't want to understand me ;-)
The editor was only an example to show you that all of this is visible without any decryption routines.
Read the file content directly into a string or stream and search for "your boxes" ;-)
It's easy!

Cheers, Ingo
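Ingo's suggestion might be sketched in Delphi roughly as follows. This is a minimal, hypothetical sketch, not QuickPDF functionality: ParseBox and ScanFileForBox are illustrative names, and it assumes Delphi 7 or Free Pascal, where string is an 8-bit AnsiString so each file byte maps to one character. It reads the file in 1 MB chunks rather than into one huge string, so memory use stays bounded for the 500 MB+ files discussed here. Caveats: it returns the first literal box in file order, which is not necessarily the first page; page dictionaries stored in compressed object streams (PDF 1.5 and later) are not visible as plain text; MediaBox and CropBox can be inherited from a parent /Pages node; and a missing CropBox defaults to the MediaBox, while missing BleedBox, TrimBox and ArtBox entries default to the CropBox.

```pascal
program PdfBoxScan;
{ Hypothetical sketch: scans the raw PDF bytes for page box entries.
  Not part of Quick PDF Library; see the caveats in the text above. }
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses SysUtils, Classes, StrUtils;

type TPdfBox = array[0..3] of Double;  // llx, lly, urx, ury in PDF units

const
  PdfWhite = [#0, #9, #10, #12, #13, ' '];  // PDF whitespace characters
  BoxNames: array[0..4] of string = ('MediaBox', 'CropBox', 'BleedBox', 'TrimBox', 'ArtBox');

{ Tries each "/Name" occurrence in Src and returns the first one followed by a literal
  array of four numbers, e.g. "/MediaBox[0.0 0.0 595.276 841.89]". Indirect references
  ("/MediaBox 12 0 R") and entries cut off at the end of Src are skipped. }
function ParseBox(const Src, Name: string; out Box: TPdfBox): Boolean;
var
  Start, P, N, Code: Integer;
  Token: string;
begin
  Result := False;
  Start := PosEx('/' + Name, Src, 1);
  while Start > 0 do
  begin
    P := Start + Length(Name) + 1;  // first character after "/Name"
    while (P <= Length(Src)) and (Src[P] in PdfWhite) do Inc(P);
    if (P <= Length(Src)) and (Src[P] = '[') then
    begin
      Inc(P);
      N := 0; Code := 0;
      while (N < 4) and (Code = 0) do
      begin
        while (P <= Length(Src)) and (Src[P] in PdfWhite) do Inc(P);
        Token := '';
        while (P <= Length(Src)) and not (Src[P] in PdfWhite + [']', '/']) do
        begin
          Token := Token + Src[P];
          Inc(P);
        end;
        // Val() always uses '.' as decimal separator; add a leading zero for ".5" / "-.5"
        if Copy(Token, 1, 1) = '.' then Token := '0' + Token
        else if Copy(Token, 1, 2) = '-.' then Token := '-0' + Copy(Token, 2, MaxInt);
        if Token = '' then Code := 1 else Val(Token, Box[N], Code);
        if Code = 0 then Inc(N);
      end;
      while (P <= Length(Src)) and (Src[P] in PdfWhite) do Inc(P);
      // Demand the closing ']' so a number truncated at the end of Src is rejected
      Result := (N = 4) and (P <= Length(Src)) and (Src[P] = ']');
      if Result then Exit;
    end;
    Start := PosEx('/' + Name, Src, Start + 1);
  end;
end;

{ Reads the file in 1 MB chunks; the tail of each window is carried into the next one,
  so an entry split across two chunks is still found. }
function ScanFileForBox(const FileName, Name: string; out Box: TPdfBox): Boolean;
const
  ChunkSize = 1024 * 1024;
  Overlap = 256;  // longer than any plausible "/Name[a b c d]" entry
var
  FS: TFileStream;
  Buf, Window: string;
  Got: Integer;
begin
  Result := False;
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buf, ChunkSize);
    Window := '';
    repeat
      Got := FS.Read(Buf[1], ChunkSize);
      Window := Window + Copy(Buf, 1, Got);
      Result := ParseBox(Window, Name, Box);
      if Result then Exit;  // the finally block still frees the stream
      if Length(Window) > Overlap then
        Window := Copy(Window, Length(Window) - Overlap + 1, Overlap);
    until Got < ChunkSize;
  finally
    FS.Free;
  end;
end;

var
  I: Integer;
  B: TPdfBox;
begin
  if ParamCount < 1 then
    Halt(1);  // expects the PDF file name as the first argument
  for I := Low(BoxNames) to High(BoxNames) do
    if ScanFileForBox(ParamStr(1), BoxNames[I], B) then
      WriteLn(BoxNames[I], ': ', B[0]:0:3, ' ', B[1]:0:3, ' ', B[2]:0:3, ' ', B[3]:0:3)
    else
      WriteLn(BoxNames[I], ': no literal array found');
end.
```

Run it as `PdfBoxScan file.pdf`. Each box name triggers a separate pass over the file, which keeps the sketch short; for very large files, a single pass that looks for all five names at once would be faster.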


Posted By: Peter
Date Posted: 01 Dec 09 at 3:15PM
OK, now I know what you mean. I'll try that.
Thanks a lot for your help!
 
Regards, Peter


