<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Debenu Quick PDF Library - PDF SDK Community Forum : ExtractFilePageText - Options 0 and 8</title>
  <link>http://www.quickpdf.org/forum/</link>
  <description><![CDATA[This is an XML content feed of: Debenu Quick PDF Library - PDF SDK Community Forum : I need help - I can help : ExtractFilePageText - Options 0 and 8]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Fri, 01 May 2026 06:33:54 +0000</pubDate>
  <lastBuildDate>Thu, 03 Jul 2014 07:52:58 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.01</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>www.quickpdf.org/forum/RSS_post_feed.asp?TID=2929</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Debenu Quick PDF Library - PDF SDK Community Forum]]></title>
   <url>http://www.quickpdf.org/forum/forum_images/QPDF_Forum_Title.png</url>
   <link>http://www.quickpdf.org/forum/</link>
  </image>
  <item>
   <title><![CDATA[ExtractFilePageText - Options 0 and 8 : I will but please understand me:...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11841.html#11841</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2557">mLipok</a><br /><strong>Subject:</strong> 2929<br /><strong>Posted:</strong> 03 Jul 14 at 7:52AM<br /><br />I will, but please understand: I follow security procedures for the protection of personal data. Encrypting PDF files with PGP is standard practice in this case, and I cannot ignore this point of my client's internal rules.]]>
   </description>
   <pubDate>Thu, 03 Jul 2014 07:52:58 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11841.html#11841</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText - Options 0 and 8 : Michael,You create a support case...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11840.html#11840</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1483">AndrewC</a><br /><strong>Subject:</strong> 2929<br /><strong>Posted:</strong> 03 Jul 14 at 7:41AM<br /><br /><div><br></div><div>Michael,</div><div>You can create a support case. It will only be seen by support staff and can be deleted once it is resolved.</div><div><br></div><div>Andrew.</div>]]>
   </description>
   <pubDate>Thu, 03 Jul 2014 07:41:33 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11840.html#11840</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText - Options 0 and 8 : I can send you this PDF file but...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11839.html#11839</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2557">mLipok</a><br /><strong>Subject:</strong> 2929<br /><strong>Posted:</strong> 03 Jul 14 at 7:36AM<br /><br />I can send you this PDF file, but you must first send me your public GPG key so that I can encrypt it.<div><br></div>]]>
   </description>
   <pubDate>Thu, 03 Jul 2014 07:36:33 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11839.html#11839</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText - Options 0 and 8 : We would need to see the original...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11838.html#11838</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1483">AndrewC</a><br /><strong>Subject:</strong> 2929<br /><strong>Posted:</strong> 03 Jul 14 at 7:02AM<br /><br /><div><br></div>We would need to see the original PDF file.<div><br></div><div>The problem is most likely that the two text blocks use different fonts or have overlapping bounding boxes. FineReader doesn't always output the cleanest text boxes.</div><div><br></div><div>Option 0 will only work on some files. &nbsp;Option 8 extracts all text lines and outputs them one by one. &nbsp;A line of text is considered to be a group of characters that have the same font, size and colour. &nbsp;You can have some of these attributes ignored by using SetTextExtractionOptions.</div><div><br></div><div>SetTextExtractionOptions is quite powerful and can be used to solve all sorts of complex PDF issues. &nbsp;</div><div><br></div><div>Text extraction, like OCR, is not an exact science. Debenu Quick PDF Library has to make decisions about where words and line breaks are located, which requires characters to be grouped first and then analysed into words and then lines. &nbsp;We can get it wrong when PDFs use strange logic, fonts without any font information, fonts without a ToUnicode table, overlapping bounding boxes, etc.</div><div><br></div><div><br></div><div>Andrew</div>]]>
   </description>
   <pubDate>Thu, 03 Jul 2014 07:02:22 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11838.html#11838</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText - Options 0 and 8 : I need this because of this:ht...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11837.html#11837</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2557">mLipok</a><br /><strong>Subject:</strong> 2929<br /><strong>Posted:</strong> 02 Jul 14 at 1:04PM<br /><br />I need this because of this:<div>http://www.quickpdf.org/forum/extractfilepagetext-strange-behavior_topic2906.html</div><div><br></div><div>By the way,</div><div>option 7 works OK.</div><div><br></div><div><b>So now I have a question.</b></div><div><br></div><div><div>What is the real difference between options 7 and 8?&nbsp;</div><div><br></div><div>I have observed that with option 7 the result preserves indentation: after writing the output to a text file, for example, the text appears on the right side (with extra spaces on the left) if it was positioned that way in the PDF file.&nbsp;</div><div><br></div><div><br></div><div>Or does option 8 give more text than option 7 in some specific cases?</div></div><div><br></div>]]>
   </description>
   <pubDate>Wed, 02 Jul 2014 13:04:36 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11837.html#11837</guid>
  </item> 
  <item>
   <title><![CDATA[ExtractFilePageText - Options 0 and 8 : In some cases I have issue like...]]></title>
   <link>http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11836.html#11836</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2557">mLipok</a><br /><strong>Subject:</strong> 2929<br /><strong>Posted:</strong> 02 Jul 14 at 12:49PM<br /><br /><div>In some cases I have an issue like this:</div><div><br></div><div>I have a PDF that was scanned and OCRed with FineReader Recognition Server 3.</div><div>It contains something like this:</div><div><br></div><div><font color="#0033ff">blabla</font></div><div><font color="#0033ff">TEXT1 TEXT2</font></div><div><font color="#0033ff">blablabla</font></div><div><br></div>When I use option 0, I get:<div><font color="#0033ff">....</font></div><div><font color="#0033ff">....</font></div><div><font color="#0033ff">TEXT1 TEXT2</font></div><div><div><font color="#0033ff">....</font></div><div><font color="#0033ff">....</font></div></div><div><br></div><div><br></div><div>When I use option 8, I get:<div><font color="#0033ff">....</font></div><div><font color="#0033ff">....</font></div><div><font color="#0033ff">TEXT1</font></div><div><font color="#0033ff">TEXT2</font></div><div><div><font color="#0033ff">....</font></div><div><font color="#0033ff">....</font></div></div><div><br></div><div>I need to use option 8 because it gives me all the content.</div></div><div>But I want the text to stay on the same line, as it does with option 0.</div><div><br></div>]]>
   </description>
   <pubDate>Wed, 02 Jul 2014 12:49:19 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extractfilepagetext-options-0-and-8_topic2929_post11836.html#11836</guid>
  </item> 
 </channel>
</rss>