Having difficulty with steps to extract text

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3641
Printed Date: 29 Mar 24 at 4:54AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Having difficulty with steps to extract text
Posted By: sdumont
Subject: Having difficulty with steps to extract text
Date Posted: 29 Nov 18 at 4:49PM

I am trying to extract the text from a sample document I created with just a couple of plainly typed sentences, but every time I run my code I get an empty string.

I think I'm missing some step, but it's difficult to tell exactly what order the steps should be completed in, and much of the documentation refers to functions which don't exist in my version.
 
I am using the .dll file loaded into a VB.NET desktop application, and the version seems to be 7.xx. Here's the code I'm using (when I create a new PDFBuilder, it creates a PDFLibrary and unlocks it):
 

Dim pdf As String = "C:\Users\sdumont\Desktop\testpdf.pdf"
Dim pdftester As New PDFBuilder()
Dim result As String = ""

Try
    Select Case pdftester.QPD.LoadFromFile(pdf)
        Case 1
            MsgBox("The file was loaded successfully!")
        Case 0
            MsgBox("The file could not be read or processed")
            MsgBox(pdftester.QPD.LastErrorCode)
    End Select

    Select Case pdftester.QPD.SelectPage(1)
        Case 1
            MsgBox("The page was selected successfully!")
        Case 0
            MsgBox("The page could not be found")
            MsgBox(pdftester.QPD.LastErrorCode)
    End Select

    pdftester.QPD.SetOrigin(1)
    MsgBox(pdftester.QPD.GetPageText(7))
Catch ex As Exception
    MsgBox(ex.Message)
End Try




Replies:
Posted By: Ingo
Date Posted: 29 Nov 18 at 7:26PM
Hi S,

In my old archive I've found version 7.26... ;-)
In this old version, LoadFromFile works without entering a password... okay.
In this old version, GetPageText doesn't offer option 7; option 6 is the last one... this could be the problem.
A LastErrorCode after GetPageText would help ;-)
A Decrypt after LoadFromFile will help, too.
You do know that your version of the library doesn't support the current PDF specifications?

Cheers and welcome here,
Ingo



-------------
Cheers,
Ingo
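
For readers following along, the two suggestions above (Decrypt after LoadFromFile, LastErrorCode after GetPageText) could be sketched roughly as follows. This is only a sketch under assumptions: ExtractPageText and the ExtractSketch module are hypothetical names, the library object is passed late-bound as Object, Decrypt is assumed to take no arguments, and LoadFromFile is called with a single argument as in the code posted above.

```vbnet
Option Strict Off ' late binding: the library object is passed as Object
Imports System

Module ExtractSketch
    ' Hypothetical helper: loads a file, selects a page and returns its text.
    ' lastError receives LastErrorCode as read right after GetPageText.
    Function ExtractPageText(qpd As Object, path As String,
                             pageNumber As Integer, extractOption As Integer,
                             ByRef lastError As Integer) As String
        If qpd.LoadFromFile(path) <> 1 Then
            Throw New InvalidOperationException(
                "LoadFromFile failed, LastErrorCode=" & CStr(qpd.LastErrorCode))
        End If

        ' Assumption: Decrypt takes no arguments. Its return value is ignored
        ' here because its meaning in 7.x is not confirmed in this thread.
        qpd.Decrypt()

        If qpd.SelectPage(pageNumber) <> 1 Then
            Throw New InvalidOperationException(
                "SelectPage failed, LastErrorCode=" & CStr(qpd.LastErrorCode))
        End If

        Dim pageText As String = CStr(qpd.GetPageText(extractOption))
        lastError = CInt(qpd.LastErrorCode) ' read immediately after GetPageText
        Return pageText
    End Function
End Module
```

With the code from the first post, a call might look like ExtractPageText(pdftester.QPD, pdf, 1, 0, errorCode), followed by showing both the returned text and errorCode.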



Posted By: sdumont
Date Posted: 29 Nov 18 at 7:49PM
Thanks for the welcome, I'm glad to see there is documentation and a community for this code even all these years later.

I've been spending some more time on this since I posted and have a few updates to report.

The version I have reports that it is 7.25.
I tried GetPageText with option 0 and it worked! At least for basic text on a plain white background. It doesn't produce any results for a more complicated PDF, however.
I had put a LastErrorCode check after GetPageText, but it came back as 0, which isn't in the LastErrorCode documentation; I assumed it meant no error.
 
Could you clarify what you meant with your last point? Does this version not offer some features for text extraction which later versions offer? I noticed some functions have been added since then, but I wonder if there are any missing features which are essential to this task?


Posted By: Ingo
Date Posted: 29 Nov 18 at 8:10PM
Hi S,

If you're working with newer PDF documents, they can be encrypted with AES-256, and this standard isn't supported by library version 7.25.
For YOUR GetPageText you can use options 0 through 6.
If you're missing a developer guide and reference guide, I can send you the ones belonging to version 7.26.
If you want them, I can send the PDFs to the email address you've entered here in the forum.



-------------
Cheers,
Ingo
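
Since options 0 through 6 are the ones reported as valid for this version, one way to see which of them returns anything for a given page is to try each in turn. The sketch below is an illustration only: ProbeOptions and the OptionProbe module are hypothetical names, and the library call is passed in as a delegate so the helper itself does not depend on the DLL.

```vbnet
Imports System
Imports System.Collections.Generic

Module OptionProbe
    ' Hypothetical helper: calls getPageText once for each option in
    ' 0..maxOption and records the length of the text each option returned.
    Function ProbeOptions(getPageText As Func(Of Integer, String),
                          Optional maxOption As Integer = 6) As Dictionary(Of Integer, Integer)
        Dim lengths As New Dictionary(Of Integer, Integer)
        For opt As Integer = 0 To maxOption
            Dim pageText As String = getPageText(opt)
            ' Treat a missing result the same as an empty string.
            lengths(opt) = If(pageText Is Nothing, 0, pageText.Length)
        Next
        Return lengths
    End Function
End Module
```

After a successful LoadFromFile and SelectPage, it could be called as ProbeOptions(Function(opt) CStr(pdftester.QPD.GetPageText(opt))). Options whose recorded length is 0 returned nothing for that page.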



Posted By: sdumont
Date Posted: 29 Nov 18 at 8:42PM

Sure, I'll take copies of those documents; there's always a chance that they will help.

Thanks for your help with troubleshooting this, I think I understand what I can/can't do at this point.


Posted By: Ingo
Date Posted: 29 Nov 18 at 11:06PM
you've got it... now ;-)



-------------
Cheers,
Ingo



