Searching a string in an existing PDF file
balane78
Beginner
Joined: 13 Feb 12 Location: Paris Status: Offline Points: 2 |
Posted: 15 Feb 12 at 1:58PM
Hi,
Sorry for this newbie question, which will most probably look stupid, but I have been crawling through the documentation since yesterday. Which function should I use to search for a predefined string inside a PDF file and get the page number?
edvoigt
Senior Member
Joined: 26 Mar 11 Location: Berlin, Germany Status: Offline Points: 111 |
Posted: 15 Feb 12 at 5:27PM |
Hi,

The normal way takes more than one step. You can go through your document page by page. Depending on your goal, either stop at the first result or build a list of the places where the search string is found.

First get the text of a page with GetPageText. Depending on your needs and your knowledge of the PDF you want to search, use the right extract option. In the result of GetPageText you then run the kind of search that fits the chosen extract option (you have to code this search in your program yourself). Play with the extract options and look at their output. Keep an eye on option 3 or 4.

Important! Because a PDF is by definition not a word-processing data file, text extraction cannot guarantee that words are detected as words. In a PDF a word can be drawn letter by letter and in the wrong order, so the text extraction of QuickPDF has a harder job than it seems. But in the normal case (meaning: the text is written with one font, one size, and in order, without tricks) you only have problems with words that run from the end of one line to the start of the next. They may come out as two words, although they are a single word in your search string.

For more information, search the forum for other posts dealing with text extraction and searching for words.

Werner

Edited by edvoigt - 15 Feb 12 at 5:35PM
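The page-by-page approach described above can be sketched roughly as follows. This is a minimal Python sketch and not Quick PDF Library code: `find_pages` is a hypothetical helper, and `page_texts` is assumed to already hold one extracted string per page (for example, the result of one GetPageText call per page). The search itself is a plain case-insensitive substring test, which is only one of the possible search tactics.

```python
from typing import List


def find_pages(page_texts: List[str], needle: str, first_only: bool = False) -> List[int]:
    """Return the 1-based numbers of all pages whose text contains needle."""
    hits: List[int] = []
    needle_folded = needle.casefold()  # case-insensitive comparison
    for page_number, text in enumerate(page_texts, start=1):
        if needle_folded in text.casefold():
            hits.append(page_number)
            if first_only:
                break  # stop at the first result
    return hits


# Hypothetical page texts standing in for the per-page extraction results.
pages = ["Introduction and scope", "Invoice number 4711", "Terms for invoice 4711"]
print(find_pages(pages, "invoice"))                   # [2, 3]
print(find_pages(pages, "invoice", first_only=True))  # [2]
```

Whether to stop at the first hit or collect every page is the choice mentioned in the post; here it is exposed as the `first_only` flag.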
balane78
Beginner
Joined: 13 Feb 12 Location: Paris Status: Offline Points: 2 |
Posted: 15 Feb 12 at 6:22PM |
OK, thanks.
BTW, I wonder how the Acrobat Reader search tool works.
Ingo
Moderator Group
Joined: 29 Oct 05 Status: Offline Points: 3530 |
Posted: 15 Feb 12 at 7:35PM |
Acrobat Reader comes with an installation of over 100 MB...
so it's probably a bit faster ;-)
edvoigt
Senior Member
Joined: 26 Mar 11 Location: Berlin, Germany Status: Offline Points: 111 |
Posted: 15 Feb 12 at 7:56PM |
Hi,

I did the following test: a very small Word text, printed as PDF with PDF-Creator. It looks like this:

Test for search-
ing with acro-
bat

For better understanding, inside the PDF it looks like this:

[(T)-15.8907(e)-2.05734(s)3.21993(t)0.721099( )-3.16695(f)7.49943(o)-6.3339(r)-4.55617( )-3.16695(s)3.21993(e)-2.05734(a)-2.05734(r)-4.55617(c)-2.05726(h)5.7217(-)333]TJ
11.52 TL
T*[(i)0.721099(n)5.7217(g)5.7217( )-27.2782(w)10.7194(i)0.721099(t)0.721099(h)5.7217( )-3.16695(a)-2.05734(c)-2.05734(r)-4.55617(o)-6.3339(-)]TJ
11.4 TL
T*[(b)-6.3339(a)-2.05734(t)0.721161( )]TJ

I have marked the word "Test" in red. Try a search for "search-": Acrobat Reader X doesn't find it! But "searching" is found.

Conclusion: Acrobat does a lot of things to get its search results. It seems to prepare the text by omitting some characters (a newline, or a hyphen followed by a newline). It's a little bit like a compiler ignoring comments and spaces. In most cases it finds what you are searching for. You see, it depends on the quality of text extraction, preparation and search tactics.

Werner
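The preparation step guessed at above can be tried on the quoted content stream. This is a minimal Python sketch under loud assumptions: `tj_line_text` and `normalize` are hypothetical helpers, the kerning numbers in the TJ arrays are simply ignored, and the rule "drop a hyphen at a line end and join the halves" is only a guess at what Acrobat does, not its documented behaviour.

```python
import re
from typing import List

# Matches one literal string "(...)" inside a TJ array, allowing backslash escapes.
PDF_STRING = re.compile(r"\((?:\\.|[^\\()])*\)")


def tj_line_text(tj_array: str) -> str:
    """Concatenate the literal strings of one TJ array, ignoring kerning numbers."""
    parts = []
    for match in PDF_STRING.finditer(tj_array):
        body = match.group(0)[1:-1]
        # Undo simple backslash escapes such as \( \) and \\ (octal escapes are not handled).
        parts.append(re.sub(r"\\(.)", r"\1", body))
    return "".join(parts)


def normalize(lines: List[str]) -> str:
    """Join lines: drop a hyphen at a line end, otherwise join with one space."""
    text = ""
    for line in lines:
        line = line.strip()
        if text.endswith("-"):
            text = text[:-1] + line  # hyphenated word continues on this line
        elif text:
            text = text + " " + line
        else:
            text = line
    return text


# The three TJ arrays from the test PDF quoted in the post.
tj_arrays = [
    r"[(T)-15.8907(e)-2.05734(s)3.21993(t)0.721099( )-3.16695(f)7.49943(o)-6.3339(r)-4.55617( )-3.16695(s)3.21993(e)-2.05734(a)-2.05734(r)-4.55617(c)-2.05726(h)5.7217(-)333]",
    r"[(i)0.721099(n)5.7217(g)5.7217( )-27.2782(w)10.7194(i)0.721099(t)0.721099(h)5.7217( )-3.16695(a)-2.05734(c)-2.05734(r)-4.55617(o)-6.3339(-)]",
    r"[(b)-6.3339(a)-2.05734(t)0.721161( )]",
]

lines = [tj_line_text(a) for a in tj_arrays]
text = normalize(lines)
print(lines)                # ['Test for search-', 'ing with acro-', 'bat ']
print(text)                 # Test for searching with acrobat
print("search-" in text)    # False
print("searching" in text)  # True
```

On this input the sketch reproduces the observation from the test: after preparation, "search-" is no longer found, while "searching" is. The price is that a genuine hyphen at a line end (as in a compound word) is dropped too, so a real search tool would need a smarter rule.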
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.