get the lines where a word belongs to

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3717
Printed Date: 06 May 24 at 5:08AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: get the lines where a word belongs to
Posted By: johnny
Subject: get the lines where a word belongs to
Date Posted: 12 Jun 19 at 4:23PM
Hi all,

With .GetPageText(4) you get the coordinates of each word.
So far so good, but I would also like to get the line each word belongs to, i.e. the line it would be on if I had converted that PDF to a text file using .GetPageText(7).

Is there anything ready-made out there that does this, so I don't have to spend time coding my own function to locate the words in the text file?

Thanks



Replies:
Posted By: Ingo
Date Posted: 12 Jun 19 at 7:21PM
Hi Johnny,

There isn't a ready-made routine for this, I fear ;-)
But it's not hard.
With option 3 or 4 you can sort the extract by the first Y value.
Similar Y values mean the same line, provided the font height matches.
Using option 7 you can put the extract into a memo field.
Here's a rough sketch (in Delphi)...
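memo_page := TStringList.Create;
try
  for i := 1 to QP.PageCount do
  begin
    QP.SelectPage(i);
    // option 7: the plain-text extract mentioned in the question
    memo_page.Text := QP.GetPageText(7);
    for i2 := 0 to memo_page.Count - 1 do
    begin
      // memo_page[i2] is one line of page i
      // ...
    end;
  end;
finally
  memo_page.Free;
end;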
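
The Y-value idea above could be sketched roughly like this, in Python rather than Delphi so that it stays independent of the library. It does not call the library at all: the (text, x, y) tuples, the function name and the y_tolerance parameter are all hypothetical, and parsing the option 3/4 extract into such tuples is left out. It also assumes that Y grows upward, so the top line of the page has the largest Y, and it uses a fixed tolerance as a stand-in for the "font height matches" check.

```python
def group_words_into_lines(words, y_tolerance=2.0):
    """Group (text, x, y) word records into lines, top line first.

    Assumption: y grows upward, so a larger y means higher on the page.
    """
    # Sort top-to-bottom first, then left-to-right.
    ordered = sorted(words, key=lambda w: (-w[2], w[1]))
    lines = []      # each entry is the list of words on one line
    line_y = None   # y of the first word placed on the current line
    for word in ordered:
        # Start a new line when y differs too much from the current line.
        if line_y is None or abs(word[2] - line_y) > y_tolerance:
            lines.append([])
            line_y = word[2]
        lines[-1].append(word)
    # Words on one line may differ slightly in y, so re-sort each line by x.
    return [sorted(line, key=lambda w: w[1]) for line in lines]


# Hypothetical sample data, not real library output.
sample = [
    ("world", 60.0, 700.2),
    ("Hello", 10.0, 700.0),
    ("Second", 10.0, 685.0),
    ("line", 55.0, 685.1),
]
grouped = group_words_into_lines(sample)
texts = [[w[0] for w in line] for line in grouped]
```

The index of a word's group plus one is then its line number on the page. A tolerance that is too large merges neighbouring lines, and one that is too small splits a line with mixed font sizes, so it would need tuning per document.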




-------------
Cheers,
Ingo



Posted By: johnny
Date Posted: 12 Jun 19 at 7:28PM
Thanks for the reply... In the meantime I have done something similar... but it didn't hurt to ask...

For me the tricky part is that one word can be repeated across many lines, or within the same line, so I have to keep a dictionary of the occurrences I have already spotted in the text, get the line of the next one, and move on.
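
One possible shape for that bookkeeping, as a minimal Python sketch rather than the actual C# code: a dictionary remembers, for each word, where its last match ended, so the next occurrence of the same word is searched for from that point onward. The function name and data layout are hypothetical. The sketch assumes the words arrive in reading order and uses plain substring search, so a short word can match inside a longer one.

```python
def locate_words(words, lines):
    """Return a 1-based line number (or None) for each word, in order."""
    # word -> (line index, column just past the end of its last match)
    last_seen = {}
    result = []
    for word in words:
        start_line, start_col = last_seen.get(word, (0, 0))
        found = None
        for i in range(start_line, len(lines)):
            # Only the first searched line resumes mid-line.
            col = lines[i].find(word, start_col if i == start_line else 0)
            if col != -1:
                found = i
                last_seen[word] = (i, col + len(word))
                break
        result.append(None if found is None else found + 1)
    return result


# Hypothetical sample: "the" occurs twice on line 1 and once on line 2.
page_lines = ["the cat and the dog", "the end"]
page_words = ["the", "cat", "the", "dog", "the", "end"]
line_numbers = locate_words(page_words, page_lines)
```

Matching on whole words only, for example by comparing against the line split on whitespace, would avoid the substring caveat at the cost of a little more code.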

Also, a note to others: split the text you get from GetPageText(7) on the "\n" character, and before that get rid of the "\r" characters. Otherwise, in .NET C#, the two together give you twice as many empty lines as there should be, so the lines you see and the lines you get would not match.
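
The same idea in a short Python sketch (the function names are made up for illustration, and the sample string only assumes the extract uses "\r\n" line endings): removing "\r" before splitting on "\n" keeps one entry per visible line, while splitting on either character on its own adds a spurious empty entry at every line break.

```python
import re


def split_page_text(text):
    """Split extracted page text into one entry per visible line."""
    # Drop carriage returns first so "\r\n" counts as a single break.
    return text.replace("\r", "").split("\n")


def split_naively(text):
    """Split on every "\r" or "\n" separately (the problematic variant)."""
    return re.split("[\r\n]", text)


# Hypothetical sample: three text lines with one empty line in between.
sample = "first\r\nsecond\r\n\r\nfourth"
good = split_page_text(sample)
naive = split_naively(sample)
```

With the first variant, the list index plus one matches the line number you see when the text is opened in an editor.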


Anyway, all good. It would be nice, though, if option 4 provided that line info in a future version... :)

bb


Posted By: Ingo
Date Posted: 12 Jun 19 at 10:12PM
"...
Anyway, all good. It would be nice, though, if option 4 provided that line info in a future version... :)
..."

So you should suggest it to the publishers on their official page ;-)



-------------
Cheers,
Ingo



