
Access violation while using ExtractFilePageText

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1065
Printed Date: 25 Jun 25 at 5:55PM


Topic: Access violation while using ExtractFilePageText
Posted By: Fred
Subject: Access violation while using ExtractFilePageText
Date Posted: 18 Apr 09 at 6:09PM
Below is my code, in Delphi 7 with Quick PDF v7.12.
The PDF file is 1 page long. I tried both 0 and 1 as the page number; neither works.
The library doesn't return any error; I just get an ugly access violation when calling ExtractFilePageText. I think this is a pretty straightforward process, so I must be missing something obvious, but I don't see what.
 
A possible clue: RenderPageToStream behaves the same way (i.e. it doesn't work), whereas SecurityInfo works. It looks like accessing the page content is causing the problem...
 
procedure Debug;
var qp: TQuickPDF0712 ;
    s: string;
begin
  qp := TQuickPDF0712.Create ;
  try
    if qp.Unlocked = 0 then qp.UnlockKey(Wink);
    try
      s := qp.ExtractFilePageText('D:\PDF\Test\BillCingularFeb09.pdf','', 0, 0);
      ShowMessage(s) ;
    except on e: exception do
      begin
        if qp.LastErrorCode > 0 then
          MessageDlg('Error: '+ e.Message + #13#10 + IntToStr(qp.LastErrorCode)+': '+qp.LastRenderError, mtWarning,[mbOK],0)
        else
          MessageDlg('Error: '+ e.Message + #13#10 + 'No error message returned by the library', mtError,[mbOK],0)
      end;
    end;
  finally
    qp.Free ;
  end;
end;
 
The PDF file info is as follows:

User Password: No
Master Password: No
Printing: Fully Allowed
Changing the Document: Allowed
Content Copying or Extraction: Allowed
Authoring Comments and Form Fields: Allowed
Form Field Fill-in or Signing: Allowed
Content Accessibility Enabled: Allowed
Document Assembly: Allowed
Encryption Level: Blank



Replies:
Posted By: Ingo
Date Posted: 19 Apr 09 at 7:42AM
Hi Fred!

Before extracting, you have to call
SelectPage or DAFindPage (if you're using the DA functions).
The best approach is to call PageCount first and then put the extraction in a for-loop.

Cheers, Ingo


Posted By: Fred
Date Posted: 20 Apr 09 at 11:06AM
Hi Ingo,
Thanks for your reply.
I had tried that before, and tried it again (qp.SelectPage(1);), but the call to ExtractFilePageText still failed.
Maybe it's an issue with the Delphi project's compiler options?


Posted By: Ingo
Date Posted: 20 Apr 09 at 3:32PM
Hi Fred!

I think it's a problem with the PDF.
What's the return value of LoadFromFile?
Do all your PDF files have the same problem with the routine?
Do the other extract options (mainly 3 and 4) work?

Oh... Now I've seen it... It's your ExtractFilePageText!
That's a standalone function... You shouldn't mix standalone and the "normal" functions! Try the "normal" GetPageText and it will work.

This structure should work:
   QP := TQuickPDF0712.Create;
   try
      QP.UnlockKey('MyKey');
      QP.LoadFromFile(the_pdf_file);
      x := QP.PageCount;
      QP.SetOrigin(1);
      QP.CombineLayers;
      for i := 1 to x do
      begin
         QP.SelectPage(i);
         STR := STR + QP.GetPageText(0);
      end;
   finally
      QP.Free;
   end;
//  At this moment the text from all pages
//  should be in string STR.

Cheers, Ingo




Posted By: Fred
Date Posted: 20 Apr 09 at 11:01PM
Hi Ingo,
 
I use SelectPage together with GetPageText, or otherwise ExtractFilePageText on its own. Anyway, I tried your code and it still doesn't work.
 
I've tested a number of PDFs generated by tools from various vendors, and here are my results:
  • Acrobat Distiller (Windows and Mac versions): no problem at all
  • Ghost Script (all Windows versions): no problem at all
  • Target Stream: RenderPageToStream works, but ExtractFilePageText raises the infamous access violation
  • Amyuni Pdf Converter: ExtractFilePageText returns empty strings with valid positions and other properties. The doc is mainly tables.
  • Pdf Lib: same issue as above, but this doc is mainly plain text
  • iText: this is the worst; nothing works except SecurityInfo. This is the one that made me post in this forum.

I'll send all these PDF files to QP support if you want to take a look. Isn't the PDF format supposed to conform to strict rules (ISO 32000), so that one knows where to retrieve information?

I also have a PDF with an embedded image, but short of doing OCR, I don't expect to extract anything from that, of course.
 
By the way, is there a Quick PDF function for reading tables?
Thanks for your time
Frederic


Posted By: Ingo
Date Posted: 21 Apr 09 at 1:35AM
Hi Fred!

If your samples are not too big, you can send them to me directly at
ingo -dot- schmoekel -at- ewetel -dot- net

I'll check them and write up a complete error description, and if I can't help you I'll send everything to Debenu support. Or you can do this yourself right away:
technical support form:
http://www.quickpdflibrary.com/support/support-query.php
report a bug:
http://www.quickpdflibrary.com/support/report-bug.php

Cheers, Ingo



