Debenu Quick PDF Library - PDF SDK Community Forum Homepage

Strange characters when using GetPageText

smithmarkduane (Beginner) - Posted: 19 Feb 10 at 3:39PM
I have several PDF documents from the same source whose contents I need to scan to extract some basic text information.  In some of the documents (not all) I get strange characters.  For example (see the value that follows Test Date):

Test Date: 1�/3/2��9
 
where Test Date should be 10/3/2009.  In fact, in every case where I see a problem, it is where the 0 character should appear.  Any ideas on how to resolve this?
 
Thanks,
Mark
 
 
 
 

Ingo (Moderator Group) - Posted: 19 Feb 10 at 3:48PM
Hi!

Please upload a few sample files and post the URL,
so we can test them ourselves.

Cheers and welcome here,
Ingo

smithmarkduane (Beginner) - Posted: 19 Feb 10 at 4:06PM

Ingo (Moderator Group) - Posted: 19 Feb 10 at 8:01PM
Hi Mark!

With my test app I can preview the file, extract text, and so on.
The creation date and modification date are absolutely okay ... no strange characters.
If I open the PDF in Notepad, I find the date in this format: 2009-10-03
Using QuickPDF it's the same... I've reformatted it this way: 03.10.2009
So I think it must be something on your PC?
You can be sure that it has nothing to do with QuickPDF.
Perhaps you could show us the relevant code snippet so we can check it?

Cheers, Ingo



Edited by Ingo - 19 Feb 10 at 8:02PM

smithmarkduane (Beginner) - Posted: 19 Feb 10 at 10:13PM
Hi Ingo:

Thanks for looking into this.  The date I am referring to is not the Creation or Modification Date metadata of the PDF, but part of the text content of the document.  In the header of the page you will notice 'Test Date: 10/3/2009'.  On my system, if I open the document in Notepad/Notepad++ and search for 10/3/2009, no match is returned.  The test app I use to demonstrate this result is simply:

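...
PDFLibrary := TQuickPDF0717.Create;
// The license key shown here is a placeholder
UnlockResult := PDFLibrary.UnlockKey('123456789');
PDFLibrary.LoadFromFile('samp1.pdf');
// Select page 2 of the document
PDFLibrary.SelectPage(2);
// Extract the text of the selected page using extract option 0
memo1.Lines.Add(PDFLibrary.GetPageText(0));
...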

When I run the above code, a portion of what I see is:

�2008
Copyrigh
ANX 3.0

ANSAR  Medical

Patient: xxxxxxxxx
Weight: 145 lbs Height: 5 ft 6 in  Gender: Female Age: 64 DOB: 1/2/1945
ANS Medications:   
Other Medications & Symptoms: 

Test Date: 1�/3/2��9 Physician: xxxxxx

No. of Ectopic Beats: �

Could this be a system font or character set issue?  Since I am not familiar with the internal structure of PDFs, I am not sure where to look next.

Thanks,
Mark
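
One way to approach the font or character set question above is to look at the numeric codes of the extracted characters instead of how they are displayed. The following is only a minimal sketch: DescribeChars is a hypothetical helper (it is not part of Quick PDF Library), and the sample string stands in for the value returned by GetPageText.

```pascal
program DumpCharCodes;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper, not a Quick PDF Library function.
// Returns the code of every character in S as 4-digit hex values
// separated by spaces, so unexpected characters show up as numbers
// instead of placeholder glyphs.
function DescribeChars(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 4);
  end;
end;

begin
  // Assumption: in the real test app you would pass the string
  // returned by PDFLibrary.GetPageText(0) instead of this literal.
  WriteLn(DescribeChars('10/3'));  // prints: 0031 0030 002F 0033
end.
```

If the positions where a 0 is expected show a code other than 0030, that would suggest the unexpected characters are already present in the extracted string, so the display control is probably not the cause.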

Ingo (Moderator Group) - Posted: 19 Feb 10 at 10:19PM
Hi Mark!

That's very strange. You should forget it. All the other numbers are correct, only the ones in the dates are wrong...
Try extracting into a normal string instead of a TMemo... Do you get the same result?

Cheers, Ingo

smithmarkduane (Beginner) - Posted: 19 Feb 10 at 10:39PM
Yes, same result when I extract to a normal string.  Do you see this behavior too, or is it just on my system?  What do you mean when you say 'You should forget it'?  The 'Test Date' in the page header is one of the pieces of information I need to extract.

Thanks,
Mark
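
Once the extraction returns clean text, pulling the Test Date value out of the page text can be done with ordinary string handling. A minimal sketch, assuming the value follows its label on the same line and ends at the next space or line break; ValueAfterLabel is a hypothetical helper, not a Quick PDF Library function, and the sample string is the expected form of the header line from this thread:

```pascal
program ExtractLabelledValue;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: returns the text that follows FieldLabel in Source,
// up to the next space, line break or other control character.
// Returns '' if the label is not found.
function ValueAfterLabel(const Source, FieldLabel: string): string;
var
  StartPos, EndPos: Integer;
begin
  Result := '';
  StartPos := Pos(FieldLabel, Source);
  if StartPos = 0 then
    Exit;
  StartPos := StartPos + Length(FieldLabel);
  // Skip the spaces between the label and its value.
  while (StartPos <= Length(Source)) and (Source[StartPos] = ' ') do
    Inc(StartPos);
  // Advance to the first character with a code of 32 or below.
  EndPos := StartPos;
  while (EndPos <= Length(Source)) and (Source[EndPos] > ' ') do
    Inc(EndPos);
  Result := Copy(Source, StartPos, EndPos - StartPos);
end;

begin
  // Assumption: Source would normally be the text returned by GetPageText.
  WriteLn(ValueAfterLabel('Test Date: 10/3/2009 Physician: xxxxxx',
    'Test Date:'));  // prints: 10/3/2009
end.
```

This only locates the value; it does not repair characters that were already wrong in the extracted text.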

Ingo (Moderator Group) - Posted: 19 Feb 10 at 10:47PM
Hi Mark!

There are other functions in the library to get the creation date and modification date.
Your problem isn't a common one. It seems to occur only on your machine. I can't explain it and I can't imagine why. Sorry.

Cheers, Ingo

smithmarkduane (Beginner) - Posted: 19 Feb 10 at 10:54PM
Hi Ingo:

I just want to be sure I am explaining the problem correctly.  I am not interested in the Creation/Modification date of the document.  I am interested in extracting text from the document content, where the text happens to be a date string.  My question is: if you use the code
      PDFLibrary.SelectPage(2);
      memo1.Lines.Add( PDFLibrary.GetPageText(0));
on the sample PDF document I uploaded, do you see the strange characters I reported, or do you see 'normal' text?  I have now run the sample app on 3 machines (Win XP and Win 7) with the same result.

Thanks again for your help.

Mark

Ingo (Moderator Group) - Posted: 19 Feb 10 at 11:07PM
Hi Mark!

I've written it already! I've seen/extracted the "normal" text and numbers - no strange characters! You're the only one having this problem. I'm using GetPageText(3) ... perhaps this makes a difference for you...
You gave a good, easy-to-understand description of the problem. We all understand it, but I'm pretty sure that nobody here can imagine the cause. Sorry.

Cheers, Ingo

smithmarkduane (Beginner) - Posted: 19 Feb 10 at 11:13PM
Hi:

Thanks.  Would it be possible for you to upload your sample app, so I can try it and see whether the problem is my code or my system(s)?  The code is so simple that I don't see how it could be the cause, but I am not sure where else to look.

Thanks,
Mark

Ingo (Moderator Group) - Posted: 19 Feb 10 at 11:22PM
Hi!

Try my freeware PDF-Analyzer ... ;-)

Cheers, Ingo