Search text and get the bound boxes
Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3288
Printed Date: 16 Jul 25 at 10:10PM Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com
Topic: Search text and get the bound boxes
Posted By: pinozzy
Subject: Search text and get the bound boxes
Date Posted: 22 Mar 16 at 1:50PM
Hello All,
I'm using the Viewer SDK (with C#) to search for some text in a document. I need to locate that text and get the coordinates of each individual result.
My current approach is this:
SearchPDFText("my string");
What I get back is the number of occurrences. Now I need the bounding rectangles of these occurrences.
How can I do that? I tried calling GetSelectedTextBlockBound() in a loop with NextSearchResult(), without any success.
I believe there is a better way; what am I missing?
Many thanks.
|
Replies:
Posted By: Ingo
Date Posted: 22 Mar 16 at 6:30PM
Hi Pinozzy,
I don't know how similar the Viewer SDK is to the Quick PDF Library... Perhaps this KB post can help? http://www.debenu.com/kb/extract-text-from-pdfs-as-a-text-block-list/
Cheers and welcome here, Ingo
------------- Cheers, Ingo
|
Posted By: pinozzy
Date Posted: 23 Mar 16 at 8:42AM
Oh, I see. I switched to the Quick PDF Library; I wonder why that search feature is missing here too. Thanks for the advice, buddy. I'll start from there (and post the code, since someone else may have the same need).
|
Posted By: pinozzy
Date Posted: 26 Mar 16 at 3:11PM
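This is how I solved my problem. The method searches each page's text blocks with a regular expression and returns every matching block together with its bounding rectangle and page number.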
Bye!
----
using System.Collections.Generic;
using System.Drawing;                  // RectangleF
using System.Text.RegularExpressions;  // Regex

public struct FindResult
{
    public string Text { get; set; }
    public RectangleF Rectangle { get; set; }
    public int Page { get; set; }      // 1-based page number
}

// pageIndex is 1-based; pass 0 to search every page.
// Document (with its DPL Quick PDF Library instance) and Pages belong to the poster's own wrapper class.
public override List<FindResult> SearchPattern(int pageIndex, string pattern)
{
    var retVal = new List<FindResult>();
    var dpl = Document.DPL;
    dpl.SetTextExtractionWordGap(1);
    dpl.SetTextExtractionOptions(3, 0);
    var regex = new Regex(pattern);

    for (var i = 0; i < Pages; i++)
    {
        if (pageIndex > 0 && (pageIndex - 1) != i)
            continue;

        // Select the page BEFORE extracting its text blocks; SelectPage is 1-based.
        dpl.SelectPage(i + 1);
        var id = dpl.ExtractPageTextBlocks(4);

        for (var f = 1; f <= dpl.GetTextBlockCount(id); f++)
        {
            var text = dpl.GetTextBlockText(id, f);
            // A match only says the block contains the pattern; the rectangle
            // below covers the whole text block, not just the matched characters.
            if (!regex.IsMatch(text))
                continue;

            retVal.Add(new FindResult
            {
                Rectangle = new RectangleF(
                    (float)dpl.GetTextBlockBound(id, f, 7),
                    (float)dpl.GetTextBlockBound(id, f, 8),
                    (float)dpl.GetTextBlockBound(id, f, 5) - (float)dpl.GetTextBlockBound(id, f, 7),
                    (float)dpl.GetTextBlockBound(id, f, 6) - (float)dpl.GetTextBlockBound(id, f, 4)),
                Page = i + 1,
                Text = text
            });
        }

        // Free the text block list for this page.
        dpl.ReleaseTextBlocks(id);
    }
    return retVal;
}
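The RectangleF in the code above is built from specific GetTextBlockBound indices (7, 8, 5, 6, 4), which ties the result to one corner order and one y-axis direction. A more defensive alternative, sketched here, takes all eight bound values and uses their minimum and maximum. It assumes, as the code above implies, that bound indices 1 to 8 are the x/y pairs of the block's four corners; TextBlockGeometry and ToRectangle are hypothetical names, not part of the library.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not part of the Debenu API).
public static class TextBlockGeometry
{
    // bounds[0..7] hold the values of GetTextBlockBound(id, f, 1..8),
    // assumed to be the corner pairs x1, y1, x2, y2, x3, y3, x4, y4.
    public static RectangleF ToRectangle(double[] bounds)
    {
        if (bounds == null || bounds.Length != 8)
            throw new ArgumentException("Expected exactly 8 bound values.", nameof(bounds));

        double minX = double.MaxValue, minY = double.MaxValue;
        double maxX = double.MinValue, maxY = double.MinValue;

        // Walk the four (x, y) corner pairs and track the extremes.
        for (int i = 0; i < 8; i += 2)
        {
            minX = Math.Min(minX, bounds[i]);
            maxX = Math.Max(maxX, bounds[i]);
            minY = Math.Min(minY, bounds[i + 1]);
            maxY = Math.Max(maxY, bounds[i + 1]);
        }

        // Using min/max makes the result independent of corner order and
        // of whether the y axis points up or down.
        return new RectangleF((float)minX, (float)minY,
                              (float)(maxX - minX), (float)(maxY - minY));
    }
}
```

Inside the loop, the eight values could be gathered with dpl.GetTextBlockBound(id, f, b) for b from 1 to 8 and passed to ToRectangle. Note that Y is the smallest y value, which is the top edge only with a top-left origin; with the PDF default bottom-left origin it is the bottom edge.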
|
Posted By: Ingo
Date Posted: 27 Mar 16 at 1:46PM
Hi!
Thanks a lot for sharing your code. I've put it into the samples section. Perhaps it can help others, too :)
------------- Cheers, Ingo
|