<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Debenu Quick PDF Library - PDF SDK Community Forum : Table empty cells text extraction</title>
  <link>http://www.quickpdf.org/forum/</link>
  <description><![CDATA[This is an XML content feed of: Debenu Quick PDF Library - PDF SDK Community Forum : I need help - I can help : Table empty cells text extraction]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Tue, 26 May 2026 08:23:45 +0000</pubDate>
  <lastBuildDate>Mon, 16 Jun 2014 15:17:39 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.01</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>www.quickpdf.org/forum/RSS_post_feed.asp?TID=2918</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Debenu Quick PDF Library - PDF SDK Community Forum]]></title>
   <url>http://www.quickpdf.org/forum/forum_images/QPDF_Forum_Title.png</url>
   <link>http://www.quickpdf.org/forum/</link>
  </image>
  <item>
   <title><![CDATA[Table empty cells text extraction : Marco, try this QP.SetTextExtractionScaling(0,...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11799.html#11799</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1483">AndrewC</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 16 Jun 14 at 3:17PM<br /><br />Marco,<div><br></div><div>Try this:</div><div><br></div><div>&nbsp; QP.SetTextExtractionScaling(0, 2, 8);</div><div>&nbsp; QP.SetTextExtractionWordGap(0.2);</div><div><br></div><div>Then call GetPageText(7). You can then split each line of the text into columns and trim the leading and trailing spaces from each cell.</div><div><br></div><div>Andrew.</div>]]>
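    <![CDATA[<div><br></div><div>Here is a minimal Python sketch of the "split into columns and trim" step. It assumes two things. First, the GetPageText(7) output keeps each table row on one line, with the columns aligned by spaces. Second, the column start offsets (the hypothetical <code>starts</code> list) are measured once from a sample page. The function names and the offsets are illustrative only; neither comes from the library.</div>
<pre>
```python
def split_fixed_columns(line, starts):
    """Slice one layout-preserved text line into trimmed cells.

    starts: ascending character offsets where each column begins
    (hypothetical; measure them from a sample page). A column with
    no text, or one past the end of a short line, becomes "".
    """
    ends = starts[1:] + [None]  # None slices to the end of the line
    return [line[s:e].strip() for s, e in zip(starts, ends)]


def split_page(page_text, starts):
    """Split every non-blank line of the page text into cells."""
    return [split_fixed_columns(line, starts)
            for line in page_text.splitlines() if line.strip()]
```
</pre>
<div>An empty cell is just a run of spaces between two offsets, so it comes back as an empty string in its own column instead of disappearing. The trade-off is that a value wider than its column would be cut at the boundary, so the offsets need some slack.</div>]]>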
   </description>
   <pubDate>Mon, 16 Jun 2014 15:17:39 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11799.html#11799</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Hi Andrew, my files are not a lot...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11798.html#11798</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2582">MarcoCir</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 16 Jun 14 at 3:16PM<br /><br />Hi Andrew,<div>my files are only a few pages long, but I need to process them 20 times a day, every day of the year.</div><div>So, following Ingo's suggestions, I've now written some Delphi code that uses option 4 and the x and y coordinates to detect whether a cell is empty (and was therefore skipped by the library), and the results are not so bad.&nbsp;<span style="line-height: 1.4;">Anyway, I'll keep looking for a better solution. Thank you all.</span></div><div><span style="line-height: 1.4;">Marco</span></div>]]>
   </description>
   <pubDate>Mon, 16 Jun 2014 15:16:26 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11798.html#11798</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Marco, there is going to be no...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11797.html#11797</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1483">AndrewC</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 16 Jun 14 at 3:07PM<br /><br />Marco,<div><br></div><div>There is going to be no easy way to read this with Debenu Quick PDF Library without some complex strategies.</div><div><br></div><div>How many pages do you need to process, and how often?</div><div><br></div><div>Andrew.</div>]]>
   </description>
   <pubDate>Mon, 16 Jun 2014 15:07:17 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11797.html#11797</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Hi Andrew, and thanks. This is...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11796.html#11796</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2582">MarcoCir</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 16 Jun 14 at 9:21AM<br /><br />Hi Andrew, and thanks.<div>This is the link to that PDF file:&nbsp;</div><div><br></div><div>http://www.inbet.it/pdf/calcioq.pdf</div><div><br></div><div>As you can see, some cells are empty (they have no odds), hence my problem.</div><div>I've followed the approach Ingo suggested, with some improvement (but the algorithm is not trivial ;) ).</div><div><br></div><div>Marco</div>]]>
   </description>
   <pubDate>Mon, 16 Jun 2014 09:21:51 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11796.html#11796</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Marco, you should call QP.NormalizePage(0);...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11794.html#11794</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=1483">AndrewC</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 16 Jun 14 at 7:17AM<br /><br />Marco,<div><br></div><div>You should call QP.NormalizePage(0); before calling the text extraction functions.</div><div><br></div><div>A PDF file has no concept of cells, sentences or paragraphs. &nbsp;PDF text is just placed on the page like a jigsaw puzzle. &nbsp;Some pieces contain one character, others several, and text settings such as character spacing and word spacing make this even more complex.</div><div><br></div><div>Debenu Quick PDF Library, like any other PDF library, has to collect all the characters and their positions on the page. It then groups the characters into words and the words into lines. &nbsp;It is not an exact science.</div><div><br></div><div>Your best option is to try GetPageText(7), together with the SetTextExtractionScaling function if the PDF uses tightly spaced proportional fonts.</div><div><br></div><div>If you provide a link to the PDF, I can suggest the best extraction method. &nbsp;I have a few years of experience in extracting complex tables from PDF files.</div><div><br></div><div>Andrew</div>]]>
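    <![CDATA[<div><br></div><div>The Python sketch below illustrates the grouping Andrew describes. It is a simplified stand-in, not the library's actual algorithm. It clusters positioned text pieces into lines by baseline and inserts a space wherever the horizontal gap between pieces is large. The <code>Glyph</code> type, the tolerances (in points), and the assumption that y grows downward are all hypothetical.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Glyph:
    text: str      # one or more characters placed as a single piece
    x: float       # left edge, in points
    y: float       # baseline, in points; assumed to grow downward
    width: float   # advance width of the whole piece


def group_lines(glyphs, line_tol=2.0):
    """Cluster pieces whose baselines lie within line_tol of each other."""
    lines = []
    for g in sorted(glyphs, key=lambda p: (p.y, p.x)):
        # Compare with the first (top-most) piece of the current line.
        if lines and abs(g.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    # Within a line, reading order is left to right.
    return [sorted(line, key=lambda p: p.x) for line in lines]


def join_words(line, gap=1.5):
    """Join a non-empty line of pieces, adding a space at wide gaps."""
    out = line[0].text
    for prev, cur in zip(line, line[1:]):
        if cur.x - (prev.x + prev.width) > gap:
            out += " "
        out += cur.text
    return out
```
</pre>
<div>Choosing these thresholds is where extraction gets fuzzy. A gap that is too small splits words apart, and one that is too large merges adjacent table cells into a single string.</div>]]>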
   </description>
   <pubDate>Mon, 16 Jun 2014 07:17:56 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11794.html#11794</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Hi Marco, regarding the coordinates...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11788.html#11788</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 12 Jun 14 at 12:29PM<br /><br />Hi Marco,<div>&nbsp;</div><div>regarding the coordinates, this could be caused by page rotation. Rotation rotates the position data, too ;-)</div><div>If you want to run calculations with your own algorithms, keep in mind that PDF documents can look similar but be different inside. For example, two documents may both be DIN A4, but one was originally landscape and was then rotated. So check the page rotation first (the function for this is explained in the reference) and use SetOrigin.</div><div>&nbsp;</div><div>Cheers, Ingo</div><span style="font-size:10px"><br /><br />Edited by Ingo - 12 Jun 14 at 12:31PM</span>]]>
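    <![CDATA[<div><br></div><div>Ingo's point about rotation can be made concrete with a small Python sketch. It maps a point from unrotated page space (origin bottom-left, y up, as in PDF user space) to the page as displayed. It handles a clockwise display rotation of 0, 90, 180 or 270 degrees. The function name is hypothetical. Whether a given extraction option already returns rotated coordinates is an assumption to check against the reference.</div>
<pre>
```python
def to_display_coords(x, y, rotation, width, height):
    """Map a point from unrotated page space to the page as displayed.

    width, height: unrotated page size; origin bottom-left, y up.
    rotation: clockwise display rotation in degrees (multiple of 90).
    Returns (x, y) in the displayed page, origin bottom-left, y up.
    """
    r = rotation % 360
    if r == 0:
        return x, y
    if r == 90:   # displayed page is height wide and width tall
        return y, width - x
    if r == 180:
        return width - x, height - y
    if r == 270:
        return height - y, x
    raise ValueError(f"rotation must be a multiple of 90, got {rotation}")
```
</pre>
<div>Consider a page displayed with a 90 degree rotation. Moving along a displayed row changes the raw y value and leaves raw x constant. That would look just like the "Y,X order" described in the thread.</div>]]>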
   </description>
   <pubDate>Thu, 12 Jun 2014 12:29:56 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11788.html#11788</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Hi Ingo. I've taken a look...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11787.html#11787</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2582">MarcoCir</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 12 Jun 14 at 10:28AM<br /><br />Hi Ingo.<div>I've taken a closer look at option 4.</div><div>First of all, I think I've found a small problem in the documentation: the coordinates of the points returned by option 4 are in Y,X order, not X,Y. Moving from one cell of the table to the next one in the same row, it seems that the Ys change instead of, as you would expect, the Xs.</div><div>That said, did you mean that I should look at the coordinates returned cell by cell and, from those values, use a simple algorithm to find any cell that is missing from the extraction output?&nbsp;<span style="line-height: 1.4;">Thank you, I'll try this approach now!</span></div><div>Marco</div>]]>
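    <![CDATA[<div><br></div><div>Page rotation can change which axis runs along a row. So instead of hard-coding Y,X order, the axis can be guessed from the data. The hypothetical Python helper below is not part of the library. It takes coordinate pairs for cells known to share one row and reports which component varies more.</div>
<pre>
```python
def horizontal_axis(points):
    """Guess which component of (a, b) pairs runs along a table row.

    points: coordinate pairs for cells known to share one row.
    Returns 0 if the first component varies more, else 1.
    """
    spread_a = max(p[0] for p in points) - min(p[0] for p in points)
    spread_b = max(p[1] for p in points) - min(p[1] for p in points)
    return 0 if spread_a >= spread_b else 1
```
</pre>
<div>Running it on the first table row of each page would also catch pages whose rotation differs from the rest of the file.</div>]]>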
   </description>
   <pubDate>Thu, 12 Jun 2014 10:28:23 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11787.html#11787</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Hi Marco, you should use option...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11783.html#11783</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 10 Jun 14 at 9:25PM<br /><br />Hi Marco,<div><br></div><div>you should use option 4.</div><div>Then the returned strings include position data.</div><div>With the position data for each page, you can check where something is missing.</div><div>For example, if you get data for row 1 col 1, row 1 col 3, row 1 col 4, row 2 col 1, row 2 col 2, ..., then you know that row 1 col 2 is an empty field ;-)</div><div><br></div><div>Cheers and welcome here,</div><div>Ingo</div>]]>
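    <![CDATA[<div><br></div><div>Ingo's idea can be sketched in Python as follows. The sketch makes three assumptions. The option 4 output has already been parsed into (text, x, y) tuples. y grows downward (negate it otherwise). The left edge of each column is known. The parsing step, the column edges and the row tolerance are all hypothetical, because the exact option 4 format is not shown in this thread.</div>
<pre>
```python
from bisect import bisect_right


def build_grid(items, col_starts, row_tol=2.0):
    """Place (text, x, y) items into a row/column grid; gaps stay "".

    col_starts: ascending left edges of the table columns (hypothetical;
    measure them from the page or from a header row).
    """
    rows = []  # each entry: [row_y, {column_index: text}]
    for text, x, y in sorted(items, key=lambda it: (it[2], it[1])):
        col = bisect_right(col_starts, x) - 1  # last column starting at or before x
        if col == -1:
            continue  # left of the first column: not part of the table
        if rows and abs(y - rows[-1][0]) <= row_tol:
            cells = rows[-1][1]
        else:
            cells = {}
            rows.append([y, cells])
        # Two items landing in one cell (e.g. split words) are joined.
        cells[col] = (cells[col] + " " + text) if col in cells else text
    return [[cells.get(c, "") for c in range(len(col_starts))]
            for _, cells in rows]
```
</pre>
<div>You could write each row out as one line per cell, with an empty line for each "". That would give the kind of placeholder output asked for at the start of the thread.</div>]]>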
   </description>
   <pubDate>Tue, 10 Jun 2014 21:25:05 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11783.html#11783</guid>
  </item> 
  <item>
   <title><![CDATA[Table empty cells text extraction : Hi, I have a PDF with a big (multi...]]></title>
   <link>http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11778.html#11778</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2582">MarcoCir</a><br /><strong>Subject:</strong> 2918<br /><strong>Posted:</strong> 08 Jun 14 at 12:17PM<br /><br />Hi,<div>I have a PDF with a big (multi-page) table and I'm trying to extract all the text in that table, using the QPDF library for Delphi 2007 and the ExtractFilePageText function (with option 8 as the best algorithm choice for that type of table).</div><div><br></div><div>The problem is that not all table cells contain text; some are empty. I'm going crazy searching for a way to make the library write some sort of placeholder to the output text file to tell me that a cell is empty. Right now I get an output text file with a new line of text for every table cell that has text in it, but NO line at all when a cell contains no text. So there is no way to know that a specific row and column holds an empty cell, and I need that information to reconstruct the whole table.</div><div><br></div><div>My ideal solution would be for the library to output a blank line when it finds an empty cell, and not, I repeat, simply skip to the next cell with text. Skipping only lets me work out the total number of empty cells in each row of the table, with no way to know which columns have text and which don't.</div><div><br></div><div>Any help will be much appreciated!</div><div><br></div><div>Marco</div>]]>
   </description>
   <pubDate>Sun, 08 Jun 2014 12:17:15 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/table-empty-cells-text-extraction_topic2918_post11778.html#11778</guid>
  </item> 
 </channel>
</rss>