<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Debenu Quick PDF Library - PDF SDK Community Forum : problems with GetPageText()</title>
  <link>http://www.quickpdf.org/forum/</link>
  <description><![CDATA[This is an XML content feed of: Debenu Quick PDF Library - PDF SDK Community Forum : I need help - I can help : problems with GetPageText()]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Sat, 04 Apr 2026 23:29:27 +0000</pubDate>
  <lastBuildDate>Sun, 13 Jan 2013 13:48:26 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.01</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>www.quickpdf.org/forum/RSS_post_feed.asp?TID=2484</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Debenu Quick PDF Library - PDF SDK Community Forum]]></title>
   <url>http://www.quickpdf.org/forum/forum_images/QPDF_Forum_Title.png</url>
   <link>http://www.quickpdf.org/forum/</link>
  </image>
  <item>
   <title><![CDATA[problems with GetPageText() : We would need to see the actual...]]></title>
   <link>http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10402.html#10402</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1483">AndrewC</a><br /><strong>Subject:</strong> 2484<br /><strong>Posted:</strong> 13 Jan 13 at 1:48PM<br /><br />We would need to see the actual PDF to explain exactly why the results look the way they do. &nbsp;<div><br></div><div>I suspect the PDF uses different fonts and sizes for the text. &nbsp;Text extraction is not an exact science; it is a bit like putting together a jigsaw puzzle.</div><div><br></div><div>Andrew.</div>]]>
   </description>
   <pubDate>Sun, 13 Jan 2013 13:48:26 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10402.html#10402</guid>
  </item> 
  <item>
   <title><![CDATA[problems with GetPageText() : Ingo, I am using option 7 but...]]></title>
   <link>http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10397.html#10397</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2128">tj asher</a><br /><strong>Subject:</strong> 2484<br /><strong>Posted:</strong> 09 Jan 13 at 10:02PM<br /><br />Ingo,<div><br>I am using option 7, but for some reason the actual text returned&nbsp;*from* the page is not how it looks&nbsp;*on* the page.</div><div><br>I will consider the option of extracting the text with position data.</div><div>&nbsp;</div><div>Trying to use Acrobat or Foxit PDF Reader to select just the&nbsp;text in question is difficult, as other areas that don't appear related get selected, so I suspect flaws in the organization of the PDF document&nbsp;itself.</div><div>&nbsp;</div><div>Regards,</div><div>TJ Asher</div>]]>
   </description>
   <pubDate>Wed, 09 Jan 2013 22:02:27 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10397.html#10397</guid>
  </item> 
  <item>
   <title><![CDATA[problems with GetPageText() : Hi TJ! Option 7 is best for your...]]></title>
   <link>http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10396.html#10396</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 2484<br /><strong>Posted:</strong> 09 Jan 13 at 8:21PM<br /><br />Hi TJ!<br><br>Option 7 is best for your needs.<br>For better parsing you can try the word-by-word extraction option.<br>Another idea: do the extraction with the additional data on text formatting and positions.<br>Then you can do the layout on your own.<br><br>Cheers and welcome here,<br>Ingo<br><br>]]>
   </description>
   <pubDate>Wed, 09 Jan 2013 20:21:13 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10396.html#10396</guid>
  </item> 
  <item>
   <title><![CDATA[problems with GetPageText() : Hello, Using Delphi XE2 and...]]></title>
   <link>http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10395.html#10395</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2128">tj asher</a><br /><strong>Subject:</strong> 2484<br /><strong>Posted:</strong> 09 Jan 13 at 7:26PM<br /><br />Hello,<div>&nbsp;</div><div>Using Delphi XE2 and version 912 of the Debenu Library VCL component.</div><div>&nbsp;</div><div>When calling GetPageText I get some odd decoding issues with some PDFs.</div><div>&nbsp;</div><div>A snippet of my code to get the page text, which is pretty straightforward:</div><div>&nbsp;</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for x := 1 to PDFLibrary.PageCount do begin<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; PDFLibrary.SelectPage(x);<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Memo1.Text := Memo1.Text + PDFLibrary.GetPageText(7); // passing 7 preserves formatting<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end;<br></div><div>&nbsp;</div><div>Here is how the text looks on the PDF. 
You'll have to trust me that it looks like this since I cannot post a screen shot.</div><div>&nbsp;</div><div>Tax</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Labor Tax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @&nbsp;&nbsp;&nbsp;&nbsp;7.00%&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $4.20&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp; &nbsp; &nbsp; Parts Tax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @&nbsp; &nbsp; 7.00%&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $27.75&nbsp;&nbsp;</div><div>&nbsp; Tax Total&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$31.95 </div><div>&nbsp;</div><div>&nbsp;</div><div>When I get the page text I get stuff like this:</div><div>&nbsp;</div><div>Tax<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $4.20<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Labor Tax&nbsp;&nbsp; @&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7.00%&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $27.75<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Parts Tax<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7.00%<br>&nbsp;Tax Total&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
$31.95<br></div><div>&nbsp;</div><div>I'm guessing there is something wacky about how this PDF was created. Is there anything I can do to get my page text in a format closer to how it appears on the actual PDF? I need the text to be properly formatted to parse it.</div><div>&nbsp;</div><div>Thanks for any advice.</div><div>&nbsp;</div><div>Regards,</div><div>TJ Asher</div><div>&nbsp;</div>]]>
   </description>
   <pubDate>Wed, 09 Jan 2013 19:26:36 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/problems-with-getpagetext_topic2484_post10395.html#10395</guid>
  </item> 
 </channel>
</rss>