<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Debenu Quick PDF Library - PDF SDK Community Forum : extract Text (words)</title>
  <link>http://www.quickpdf.org/forum/</link>
  <description><![CDATA[This is an XML content feed of: Debenu Quick PDF Library - PDF SDK Community Forum : I need help - I can help : extract Text (words)]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Tue, 07 Apr 2026 05:33:17 +0000</pubDate>
  <lastBuildDate>Tue, 15 Sep 2009 11:48:06 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.01</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>www.quickpdf.org/forum/RSS_post_feed.asp?TID=1215</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Debenu Quick PDF Library - PDF SDK Community Forum]]></title>
   <url>http://www.quickpdf.org/forum/forum_images/QPDF_Forum_Title.png</url>
   <link>http://www.quickpdf.org/forum/</link>
  </image>
  <item>
   <title><![CDATA[extract Text (words) : Hi! With option=3 you get the value...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text-words_topic1215_post5608.html#5608</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 1215<br /><strong>Posted:</strong> 15 Sep 09 at 11:48AM<br /><br />Hi!<br><br>With option=3 you get the font height, too.<br>With a few calculations on the other values (from the four rectangles) you can get the complete length, too.<br>If you want the number of characters, you can get the string length with "len(...)", "length(...)" or similar syntax in many languages.<br>Once you have the string length and the height, it shouldn't be a big problem to find or calculate a matching factor for each character width.<br><br>Cheers, Ingo<br><br>]]>
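    <![CDATA[<br />A minimal sketch of the estimate described in the post above: divide the width of one text block by the number of characters in its string. The function name, the corner tuples and the sample coordinates are all hypothetical; they are not part of the Quick PDF Library API. The sketch also assumes the text is not rotated, so the block width is simply the horizontal extent of the four corners.<br /><br />

```python
def average_char_width(corners, text):
    # corners: the four (x, y) corner points of one extracted text block.
    # Assumption: the text is not rotated, so width is the x extent.
    if not text:
        raise ValueError("text must not be empty")
    xs = [x for x, _ in corners]
    return (max(xs) - min(xs)) / len(text)


# Hypothetical block: 60 points wide, holding an 8-character string.
corners = [(72.0, 700.0), (132.0, 700.0), (132.0, 712.0), (72.0, 712.0)]
print(average_char_width(corners, "constant"))  # 7.5
```

<br />This only gives an average width per character. In a proportional font the individual characters differ, so the value is best treated as a rough factor.<br /><br />]]>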
   </description>
   <pubDate>Tue, 15 Sep 2009 11:48:06 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text-words_topic1215_post5608.html#5608</guid>
  </item> 
  <item>
   <title><![CDATA[extract Text (words) : Thank you for your answer. I have...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text-words_topic1215_post5607.html#5607</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1152">munteanu24d</a><br /><strong>Subject:</strong> 1215<br /><strong>Posted:</strong> 15 Sep 09 at 10:30AM<br /><br />Thank you for your answer. I have changed the PDF files, and now I can search for words in the rows with option 3.<br /><br />What I still cannot do is get the character width.<br /><br /><br /><strong>The _QP.CharWidth(myASCIcode) call always returns 0. </strong><br />I have checked the selected font ID and it is also 0. This might be the problem, but I have not managed to fix it... :( <br /><br /><br /><span style="font-size:10px"><br /><br />Edited by munteanu24d - 15 Sep 09 at 10:31AM</span>]]>
   </description>
   <pubDate>Tue, 15 Sep 2009 10:30:18 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text-words_topic1215_post5607.html#5607</guid>
  </item> 
  <item>
   <title><![CDATA[extract Text (words) : Hi D.M.! Option 3 gets strings...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text-words_topic1215_post5596.html#5596</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 1215<br /><strong>Posted:</strong> 11 Sep 09 at 7:16PM<br /><br />Hi D.M.!<br><br>Option 3 returns strings as they were inserted.<br>A line of characters that you see in the PDF was not necessarily inserted in one run.<br>Another thing: if a word was missing from a PDF row and was inserted later, that word is extracted as the last content of the page, because it was inserted late and it doesn't matter which row it belongs to.<br>If you always get one word with option 3, then I think all your PDF documents come from the same source and are automatically generated.<br>You can send me two samples and we can examine them:<br>ingo&nbsp; -dot-&nbsp; schmoekel&nbsp; -at-&nbsp;&nbsp; ewetel&nbsp; -dot-&nbsp; net<br>Alternatively, I can send you a file with "longer" strings ;-)<br><br>Cheers, Ingo<br><br>]]>
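    <![CDATA[<br />Because text blocks come back in insertion order rather than reading order, one possible approach is to re-sort them by position before searching. The sketch below is only an illustration under stated assumptions: each block has already been parsed into a hypothetical (x, y, text) tuple, the page origin is at the bottom left (larger y is higher on the page), and blocks whose y values differ by no more than a tolerance belong to the same row. None of these names come from the Quick PDF Library API.<br /><br />

```python
import math


def rows_in_reading_order(blocks, y_tolerance=2.0):
    # blocks: hypothetical (x, y, text) tuples, one per extracted text block.
    rows = []
    # Top of the page first (largest y), then left to right.
    for x, y, text in sorted(blocks, key=lambda b: (-b[1], b[0])):
        if rows and math.isclose(rows[-1]["y"], y, abs_tol=y_tolerance):
            rows[-1]["parts"].append((x, text))
        else:
            rows.append({"y": y, "parts": [(x, text)]})
    # Inside a row, order the parts by x and join them with spaces.
    return [" ".join(t for _, t in sorted(r["parts"])) for r in rows]


# "world" was inserted last but belongs to the first row.
blocks = [
    (72.0, 700.0, "Hello"),
    (72.0, 680.0, "second row"),
    (110.0, 700.0, "world"),
]
print(rows_in_reading_order(blocks))  # ['Hello world', 'second row']
```

<br />Joining with a space suits whole words that were inserted late. Fragments of a single word would need a gap check between neighbouring blocks instead of an unconditional space.<br /><br />]]>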
   </description>
   <pubDate>Fri, 11 Sep 2009 19:16:05 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text-words_topic1215_post5596.html#5596</guid>
  </item> 
  <item>
   <title><![CDATA[extract Text (words) : Hello! I am trying to get the...]]></title>
   <link>http://www.quickpdf.org/forum/extract-text-words_topic1215_post5595.html#5595</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1152">munteanu24d</a><br /><strong>Subject:</strong> 1215<br /><strong>Posted:</strong> 11 Sep 09 at 2:36PM<br /><br />Hello!<br /><br />I am trying to get the text from a PDF file using the getPageText(option) method.<br />I have tried option = 3 and option = 4.<br /><br />When I print the text obtained with option 3, I get just the first word of each row, but the coordinates of the whole row.<br /><br />When I print the text obtained with option 4, I get fragments of words. For instance, for the word <strong>constant</strong> I get <strong>const</strong> and <strong>ant</strong>.<br /><br />I need to implement a find-word feature, but I cannot do it as long as I get fragments of words instead of whole words.<br /><br />What can I do? <br /><br />P.S. I have tried different PDFs and the result is the same. <br /><br />Best wishes,<br />D.M.<span style="font-size:10px"><br /><br />Edited by munteanu24d - 11 Sep 09 at 2:38PM</span>]]>
   </description>
   <pubDate>Fri, 11 Sep 2009 14:36:49 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-text-words_topic1215_post5595.html#5595</guid>
  </item> 
 </channel>
</rss>