<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Debenu Quick PDF Library - PDF SDK Community Forum : Extract table(s) from pdf-file (Delphi)</title>
  <link>http://www.quickpdf.org/forum/</link>
  <description><![CDATA[This is an XML content feed of; Debenu Quick PDF Library - PDF SDK Community Forum : I need help - I can help : Extract table(s) from pdf-file (Delphi)]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Sun, 17 May 2026 16:33:24 +0000</pubDate>
  <lastBuildDate>Tue, 19 May 2020 02:05:40 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.01</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>www.quickpdf.org/forum/RSS_post_feed.asp?TID=3810</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Debenu Quick PDF Library - PDF SDK Community Forum]]></title>
   <url>http://www.quickpdf.org/forum/forum_images/QPDF_Forum_Title.png</url>
   <link>http://www.quickpdf.org/forum/</link>
  </image>
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : solved with zLib deflate.and now...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15255.html#15255</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2557">mLipok</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 19 May 20 at 2:05AM<br /><br />Solved with zLib deflate.<div>And now I see that in this test.pdf the text shown by Tj is stored as glyphs, not as simple text.</div>]]>
   </description>
   <pubDate>Tue, 19 May 2020 02:05:40 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15255.html#15255</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : Using this kind of code:	For $iObj_idx...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15254.html#15254</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=2557">mLipok</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 18 May 20 at 4:03AM<br /><br /><div>Using this kind of code:</div><div><br></div><div><span style="white-space:pre">	</span>For $iObj_idx = 1 To $iObjectCount</div><div><span style="white-space:pre">		</span>; Fetch the object once and reuse the string for the clipboard and the message box</div><div><span style="white-space:pre">		</span>Local $sObj = BinaryToString($oQP.GetObjectToVariant($iObj_idx))</div><div><span style="white-space:pre">		</span>ClipPut($sObj)</div><div><span style="white-space:pre">		</span>MsgBox(0, 'QP Obj: ' &amp; $iObj_idx &amp; ' = ' &amp; $oQP.GetObjectDecodeError($iObj_idx), $sObj)</div><div><span style="white-space:pre">	</span>Next</div><div><br></div><div>I get:</div><div><br></div><div>&lt;&lt;</div><div>/Filter /FlateDecode</div><div>/Length 269</div><div>&gt;&gt;</div><div>stream</div><div>(compressed binary data, displayed as unreadable characters)</div><div><br></div><div>Question:</div><div>How can I decode this stream using QuickPDF Library?</div>]]>
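    <![CDATA[<div>For illustration only, here is a minimal Python sketch of the zlib route for a stream like the one above. It is not part of the QuickPDF Library API. It assumes the raw object bytes are available unmodified (not passed through a text conversion), that /Length is a direct integer rather than an indirect reference, and that /FlateDecode is the only filter. The function name inflate_pdf_stream is hypothetical.</div>
<pre>
```python
import re
import zlib


def inflate_pdf_stream(raw_object: bytes) -> bytes:
    """Return the decompressed payload of a single FlateDecode stream object."""
    # Everything before the "stream" keyword is the stream dictionary.
    header, keyword, rest = raw_object.partition(b"stream")
    if not keyword:
        raise ValueError("no stream keyword found")

    # Assumption: /Length is a direct integer in the dictionary.
    length = re.search(rb"/Length\s+(\d+)", header)
    if length is None:
        raise ValueError("no direct /Length entry found")

    # The keyword is followed by CRLF or LF before the payload starts.
    if rest.startswith(b"\r\n"):
        rest = rest[2:]
    elif rest.startswith(b"\n"):
        rest = rest[1:]

    payload = rest[: int(length.group(1))]
    # decompressobj ignores any trailing bytes after the zlib data.
    return zlib.decompressobj().decompress(payload)
```
</pre>
<div>The result is the page content stream (operators such as Tj), which still has to be interpreted; if the font uses glyph codes, the strings will not be readable text without the font's character mapping.</div>]]>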
   </description>
   <pubDate>Mon, 18 May 2020 04:03:28 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15254.html#15254</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : In addition to my previous post:Since...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15245.html#15245</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=3193">meligo</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 08 May 20 at 4:52AM<br /><br />In addition to my previous post:<div><br></div><div>The test file table.pdf, created directly with the library, does not store the table as an independent object (unlike a .docx, where document.xml holds each table as a separate element with the &lt;w:tbl&gt; tag). This is easy to see by opening both table.pdf and document.xml in any text viewer. So the question arose: what happens when extracting tables if the pdf-file contains several of them and they partially overlap?</div><div><br></div><div>To clarify this, I ran the following experiment. Make a small change to the program: after the original line</div><div>&nbsp; &nbsp; QP.DrawTableRows (tableID, 50, 50, 400, 1, 0);</div><div><br></div><div>insert the lines:</div><div>&nbsp; &nbsp; QP.DrawTableRows (TableID, 70, 210, 400, 1, 0); // 2nd table, shifted right / down</div><div>&nbsp; &nbsp; QP.DrawTableRows (TableID, 30, 300, 400, 1, 0); // 3rd table, shifted left / up, partially overlaps the 2nd table</div><div><br></div><div>If you comment out the last line, there are 2 tables, and both are successfully recognized and extracted from the pdf-file, even though the 2nd table is shifted horizontally and vertically relative to the first.</div><div><br></div><div>However, if we uncomment that line, the third table partially overlaps the second, and after recognition and extraction we get an amazing picture: the first table is recognized and can be extracted, but the second and third tables are glued into one complex composite table!&nbsp;<img src="http://www.quickpdf.org/forum/smileys/smiley3.gif" border="0" alt="Shocked" title="Shocked" /></div><div><br></div><div>This fully confirms the assumption made when examining the contents of the pdf file: it does not store the table as an independent object, and the pdf2word algorithm actually performs non-trivial calculations on the drawn page content to recognize the table.</div><div><br></div><div>PS: Please excuse my Google Translate English&nbsp;<img src="http://www.quickpdf.org/forum/smileys/smiley9.gif" border="0" alt="Embarrassed" title="Embarrassed" /></div><span style="font-size:10px"><br /><br />Edited by meligo - 08 May 20 at 10:02PM</span>]]>
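    <![CDATA[<div>The "glued tables" effect described above can be reproduced with a toy model. The Python sketch below is only an assumption about how a layout-based detector might group drawn regions; it is not the algorithm of any particular converter, and the names Box, overlaps and merge_overlapping are hypothetical. Rectangles that intersect are fused into one region, so two overlapping tables come out as a single composite region while a separate table stays on its own.</div>
<pre>
```python
from typing import List, Tuple

# A rectangle as (left, bottom, right, top) in page units.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True when the two rectangles share some interior area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Fuse intersecting rectangles until all remaining regions are disjoint."""
    regions: List[Box] = []
    for box in boxes:
        current = box
        merged = True
        while merged:
            merged = False
            for other in regions:
                if overlaps(current, other):
                    # Replace the pair by their common bounding rectangle,
                    # then rescan, since the larger box may touch more regions.
                    regions.remove(other)
                    current = (
                        min(current[0], other[0]),
                        min(current[1], other[1]),
                        max(current[2], other[2]),
                        max(current[3], other[3]),
                    )
                    merged = True
                    break
        regions.append(current)
    return regions
```
</pre>
<div>With three input rectangles where only the last two intersect, the function returns two regions: the first rectangle unchanged and one bounding rectangle covering the other two.</div>]]>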
   </description>
   <pubDate>Fri, 08 May 2020 04:52:51 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15245.html#15245</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : Dear Ingo!You in vain saw the...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15244.html#15244</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=3193">meligo</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 08 May 20 at 2:52AM<br /><br />Dear Ingo!<div>There was no irony intended in my previous post!</div><div>I sincerely thanked you for your prompt reply!</div><div><br></div><div>On the substance of the issue under discussion:</div><div><br></div><div>The posts above might give the false impression that this method of extracting tables applies only to tables created in MS Word or OpenOffice documents and sent to a virtual PDF printer.</div><div><br></div><div>However, if you create a PDF file with a table directly, using only CreateTable() in DebenuPDFLibrary, as, for example, in&nbsp;<a href="http://www.quickpdf.org/forum/create-table-exactly-like-this-sample_topic1907.html" rel="nofollow">http://www.quickpdf.org/forum/create-table-exactly-like-this-sample_topic1907.html</a>, the table is just as easy to extract using the same technique:</div><div><br></div><div>1. Send the file created by this demo (table.pdf) to a web service, for example&nbsp;<a href="https://www.pdf2go.com/pdf-to-word" target="_blank" rel="nofollow">https://www.pdf2go.com/pdf-to-word</a>, and get an MS Word document back.</div><div><br></div><div><div>2. Open the downloaded file in MS Word and select the table by clicking the icon at its upper left corner.</div><div><br></div><div>3. The selected table can now be copied out (for example, into an open MS Excel sheet) with a simple Ctrl-C / Ctrl-V. Profit!</div></div><div><br></div><div>Obviously, DebenuPDFLibrary lacks the reverse of CreateTable(): an ExtractTable() or something like that!</div><span style="font-size:10px"><br /><br />Edited by meligo - 09 May 20 at 6:57PM</span>]]>
   </description>
   <pubDate>Fri, 08 May 2020 02:52:56 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15244.html#15244</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : Hi Meligo,seems you don't...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15238.html#15238</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 06 May 20 at 10:26PM<br /><br />Hi Meligo,<br><br>It seems you don't know that this is a user forum, and all help here is given in members' personal free time?<br>Perhaps somebody can give you more detailed advice...<br><br>]]>
   </description>
   <pubDate>Wed, 06 May 2020 22:26:08 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15238.html#15238</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : Hi Ingo!Thanks for such a quick...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15231.html#15231</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=3193">meligo</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 06 May 20 at 1:33AM<br /><br />Hi Ingo!<div>Thanks for such a quick reply!</div><div><br></div><div>Your answer addresses the more general, complex case, where the tables may be drawn in arbitrary ways. In that case the cells indeed have to be computed in a non-trivial way that is hard to formalize.</div><div><br></div><div>I am considering a simpler case: the source file with standard tables is created in, for example, MS Word or OpenOffice, and is then either saved as a PDF file or printed to a virtual PDF printer, producing a PDF file in which the structure of the original tables is stored.</div><div><br></div><div>Now, if you send such a PDF file to any of the many free PDF2Word web services and get a “.docx” document back, the whole structure of the source tables is preserved in the resulting file, and the tables can easily be extracted from it.</div><div><br></div><div>I am attaching an archive <a href="https://yadi.sk/d/nHjE9l9Xkh8HNg" target="_blank" rel="nofollow"><u><b><font size="4">Test.zip</font></b></u></a> containing: an example MS Word test file with two tables created with the standard “Insert table” method; a PDF file obtained from it via a virtual PDF printer; the result of the pdf2word conversion; and two text files with the table data, separated by &lt;TAB&gt;, extracted from the converted file by my own program for extracting tables from MS Word documents.</div><div><br></div><div>In my case, I would like to eliminate the extra PDF2Word conversion step and extract these tables directly from the PDF, since this information, as we see, is stored in the pdf file and is successfully restored during conversion.</div><div><br></div><div>Note: if you drag these two text files into, for example, an open MS Excel window, you will see the two tables extracted from the PDF file, including, by the way, the empty cells, a problem you discussed here on the forum in one of your posts.</div><span style="font-size:10px"><br /><br />Edited by meligo - 08 May 20 at 5:04AM</span>]]>
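    <![CDATA[<div>The extractor program mentioned above is not shown in the thread. As a rough idea of what the .docx side of this workflow can look like, here is a hedged Python sketch that reads tables from the WordprocessingML in word/document.xml, where each table is a w:tbl element containing w:tr rows and w:tc cells. The function name tables_from_document_xml is hypothetical, and nested tables and merged cells are not handled.</div>
<pre>
```python
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tables_from_document_xml(xml_text: str) -> List[List[List[str]]]:
    """Return every table as a list of rows, each row a list of cell texts."""
    root = ET.fromstring(xml_text)
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        # Direct children only, so rows of nested tables are not mixed in.
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # A cell's text is spread over one or more w:t runs;
                # an empty cell simply has none and yields "".
                cells.append("".join(t.text or "" for t in tc.iter(W + "t")))
            rows.append(cells)
        tables.append(rows)
    return tables
```
</pre>
<div>The XML text itself can be read from the .docx with the standard zipfile module, since a .docx is a zip archive that contains word/document.xml. Joining each row with a TAB character then gives the text format described earlier in the thread, with empty cells preserved as empty fields.</div>]]>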
   </description>
   <pubDate>Wed, 06 May 2020 01:33:01 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15231.html#15231</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : Hi Meligo,sorry but like you've...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15229.html#15229</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=111">Ingo</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 04 May 20 at 7:04PM<br /><br />Hi Meligo,<br><br>Sorry, but as you've already determined, there are no published samples for extracting table content.<br>You should take a deeper look at the text extraction functions for your needs. The CSV option should be relevant for you, since it provides position data for rows and columns.<br><br>As AndrewC (R.I.P.) said in the past:<br>"There is going to be no easy way to read this with Debenu Quick PDF Library without some complex strategies."<br><br>His advice for the first steps was:<br>"Try this<br><br>&nbsp; QP.SetTextExtractionScaling(0, 2, 8);<br>&nbsp; QP.SetTextExtractionWordGap(0.2);<br><br>And then call GetPageText(7);&nbsp; You can then split the text into columns and trim leading and trailing spaces."<br><br>Good luck!<br><br>Cheers and welcome here,<br>Ingo<br><br>]]>
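    <![CDATA[<div>The last step of that advice (split the text into columns and trim spaces) might look roughly like the Python sketch below. It works on text that has already been extracted and does not call the library. It assumes that columns in the extracted text are separated by a tab or by runs of two or more spaces, which depends on the extraction settings and the document. The names text_to_rows and rows_to_tsv are hypothetical.</div>
<pre>
```python
import re
from typing import List


def text_to_rows(page_text: str) -> List[List[str]]:
    """Split extracted page text into rows of trimmed cell strings."""
    rows = []
    for line in page_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        # Assumption: a tab or two or more whitespace characters mark a column gap.
        rows.append([cell.strip() for cell in re.split(r"\s{2,}|\t", line)])
    return rows


def rows_to_tsv(rows: List[List[str]]) -> str:
    """Join the cells of each row with TAB and the rows with newlines."""
    return "\n".join("\t".join(row) for row in rows)
```
</pre>
<div>This simple split cannot detect empty cells, because an empty cell just looks like a wider gap. Recovering those needs position data for each text fragment, which is why the positional output options are worth a look.</div>]]>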
   </description>
   <pubDate>Mon, 04 May 2020 19:04:07 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15229.html#15229</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : Clarification:Each extracted table...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15228.html#15228</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=3193">meligo</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 04 May 20 at 5:17PM<br /><br />Clarification:<div>Each extracted table should be a text block in the form of a list of rows, where each row of the list is one row of the table, and the fields in each row are separated by a delimiter, for example &lt;TAB&gt;.</div>]]>
   </description>
   <pubDate>Mon, 04 May 2020 17:17:11 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15228.html#15228</guid>
  </item> 
  <item>
   <title><![CDATA[Extract table(s) from pdf-file (Delphi) : Hi!How to extract a table (or...]]></title>
   <link>http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15227.html#15227</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://www.quickpdf.org/forum/member_profile.asp?PF=3193">meligo</a><br /><strong>Subject:</strong> 3810<br /><strong>Posted:</strong> 04 May 20 at 5:02PM<br /><br /><div>Hi!</div><div><br></div><div>How can I extract a table (or tables, if there are several) from a PDF file?&nbsp;</div><div>I looked at the functions on your site and also searched for examples of extracting tables,&nbsp;</div><div>but did not find anything like that.</div><div><br></div><div>A usage example in Delphi would be preferable, if possible,&nbsp;</div><div>but C# would be fine too.&nbsp;</div><div><br></div><div>Thanks in advance!</div>]]>
   </description>
   <pubDate>Mon, 04 May 2020 17:02:26 +0000</pubDate>
   <guid isPermaLink="true">http://www.quickpdf.org/forum/extract-tables-from-pdffile-delphi_topic3810_post15227.html#15227</guid>
  </item> 
 </channel>
</rss>