
Search text and get the bound boxes

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3288
Printed Date: 16 Jul 25 at 10:10PM


Topic: Search text and get the bound boxes
Posted By: pinozzy
Date Posted: 22 Mar 16 at 1:50PM
Hello All,

I'm using the Viewer SDK (with C#) to search for text in a document.
I need to locate that text and get the coordinates of each individual result.

My current approach is this:

SearchPDFText("my string");

What I get back is the number of occurrences.
Now I need the bounding rectangles of these occurrences.

How can I do that? I tried calling GetSelectedTextBlockBound() while looping with NextSearchResult(), without any success.

I believe there is a better way. What am I missing?

Many thanks.



Replies:
Posted By: Ingo
Date Posted: 22 Mar 16 at 6:30PM
Hi Pinozzy,

I don't know how similar the Viewer SDK is to the Quick PDF Library...
Perhaps this KB post can help:
http://www.debenu.com/kb/extract-text-from-pdfs-as-a-text-block-list/

Cheers and welcome here,
Ingo



-------------
Cheers,
Ingo



Posted By: pinozzy
Date Posted: 23 Mar 16 at 8:42AM
Oh, I see. I switched to the Quick PDF Library; I wonder why that search feature is missing there too. Thanks for your advice, buddy. I'll start from there and post the code, in case someone has the same need.



Posted By: pinozzy
Date Posted: 26 Mar 16 at 3:11PM
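This is how I solved my problem.
This method returns the text blocks that match a Regex, together with their bounding rectangles. Pass pageIndex = 0 to search every page, or a 1-based page number to search only that page.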

Bye!

----

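    // Requires: using System.Collections.Generic; using System.Drawing;
    //           using System.Text.RegularExpressions;
    public struct FindResult
    {
      public string Text { get; set; }
      public RectangleF Rectangle { get; set; }
      public int Page { get; set; }  // 1-based page number
    }

    // pageIndex: 0 searches all pages; otherwise the 1-based number of the only page to search.
    public override List<FindResult> SearchPattern(int pageIndex, string pattern)
    {
      var retVal = new List<FindResult>();
      var dpl = Document.DPL;
      dpl.SetTextExtractionWordGap(1);
      dpl.SetTextExtractionOptions(3, 0);
      var regex = new Regex(pattern);
      for (var i = 0; i < Pages; i++)
      {
        // Skip every page except the requested one (when a page was requested).
        if (pageIndex > 0 && (pageIndex - 1) != i) continue;
        // Select the page before extracting: text blocks come from the currently
        // selected page, and Quick PDF page numbers start at 1.
        dpl.SelectPage(i + 1);
        var id = dpl.ExtractPageTextBlocks(4);
        // Text block indexes are 1-based.
        for (var f = 1; f <= dpl.GetTextBlockCount(id); f++)
        {
          var text = dpl.GetTextBlockText(id, f);
          var match = regex.Match(text);
          if (!match.Success) continue;
          // Origin from bounds 7 and 8; width = bound 5 - bound 7; height = bound 6 - bound 4.
          var res = new FindResult
          {
            Rectangle = new RectangleF(
              (float)dpl.GetTextBlockBound(id, f, 7),
              (float)dpl.GetTextBlockBound(id, f, 8),
              (float)dpl.GetTextBlockBound(id, f, 5) - (float)dpl.GetTextBlockBound(id, f, 7),
              (float)dpl.GetTextBlockBound(id, f, 6) - (float)dpl.GetTextBlockBound(id, f, 4)
            ),
            Page = i + 1, Text = text
          };
          retVal.Add(res);
        }
        // Free the text block list before moving on to the next page.
        dpl.ReleaseTextBlocks(id);
      }
      return retVal;
    }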
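
The rectangle arithmetic in the method above depends on the corner order and on which way the Y axis points, so the height can come out negative in some setups. A minimal sketch of one way to sidestep that, assuming the eight bound values are four (x, y) corner pairs: take the min and max of the X and Y values. BoundsHelper and FromCorners are hypothetical names for illustration only; they are not part of the Quick PDF Library.

```csharp
using System;
using System.Drawing;
using System.Linq;

public static class BoundsHelper
{
    // Hypothetical helper, not part of the Quick PDF Library API.
    // Takes four corner points as x1, y1, x2, y2, x3, y3, x4, y4 (in any order)
    // and returns an axis-aligned rectangle with non-negative width and height.
    public static RectangleF FromCorners(params double[] c)
    {
        if (c == null || c.Length != 8)
            throw new ArgumentException("Expected 8 values: four (x, y) pairs.", nameof(c));

        var xs = new[] { c[0], c[2], c[4], c[6] };
        var ys = new[] { c[1], c[3], c[5], c[7] };

        var minX = (float)xs.Min();
        var maxX = (float)xs.Max();
        var minY = (float)ys.Min();
        var maxY = (float)ys.Max();

        // FromLTRB computes Width = right - left and Height = bottom - top,
        // so passing min before max keeps both values non-negative.
        return RectangleF.FromLTRB(minX, minY, maxX, maxY);
    }
}
```

Inside the search loop this could be fed with the values of GetTextBlockBound(id, f, 1) through GetTextBlockBound(id, f, 8), for example BoundsHelper.FromCorners(b1, b2, b3, b4, b5, b6, b7, b8). The resulting rectangle is anchored at the smallest X and Y, so whether that corner is the top or the bottom of the block still depends on the coordinate origin in use.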


Posted By: Ingo
Date Posted: 27 Mar 16 at 1:46PM
Hi!

Thanks a lot for sharing your code.
I've put it into the sample section.
Perhaps it will help others, too :)






