Debenu Quick PDF Library - PDF SDK Community Forum : problems with GetPageText()

Author: AndrewC
Subject: 2484
Posted: 13 Jan 13 at 1:48PM

We would need to see the actual PDF to explain exactly why the results look the way they do.

I suspect the PDF uses different fonts and sizes for the text. Text extraction is not an exact science: a PDF stores text as separately positioned pieces rather than as ready-made lines, so extracting it is a bit like putting together a jigsaw puzzle.

Andrew.

Author: tj asher
Subject: 2484
Posted: 09 Jan 13 at 10:02PM

Ingo,

I am using option 7 but for some reason the actual text returned *from* the page is not how it looks *on* the page.

I will look into the option that returns the text together with its position data.

Selecting just the text in question in Acrobat or Foxit PDF Reader is difficult, because other areas that don't appear related get selected too, so I suspect flaws in the organization of the PDF document itself.

Regards,

TJ Asher

Author: Ingo
Subject: 2484
Posted: 09 Jan 13 at 8:21PM

Hi TJ!

Option 7 is best for your needs.
For better parsing you can try the word-by-word extraction option.
Another idea: extract the text together with the additional formatting and position data.
Then you can rebuild the layout yourself.

Cheers and welcome here,
Ingo
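Ingo's last suggestion, extracting the text with its positions and rebuilding the layout yourself, could look roughly like the Delphi sketch below. It is not Quick PDF Library code and makes no library calls. It assumes you have already converted the position data returned by the library into (X, Y, text) fragments; that conversion depends on the exact output format, which is not shown in this thread. The names TTextFragment, ReconstructLines and AddFrag and the sample coordinates are all hypothetical, and it assumes PDF-style coordinates where Y grows towards the top of the page. Fragments whose Y values lie within a tolerance are grouped into one line, and each line is then read left to right.

```pascal
program RebuildLines;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical record: one extracted piece of text and where it sits.
  // Assumes PDF user space, where Y grows towards the top of the page.
  TTextFragment = record
    X, Y: Double;
    Text: string;
  end;
  TFragmentArray = array of TTextFragment;

// Stable insertion sort on A[Lo..Hi]: top-to-bottom (Y descending) when
// ByX is False, left-to-right (X ascending) when ByX is True.
procedure SortFragments(var A: TFragmentArray; Lo, Hi: Integer; ByX: Boolean);
var
  I, J: Integer;
  Key: TTextFragment;

  function Before(const P, Q: TTextFragment): Boolean;
  begin
    if ByX then
      Result := P.X < Q.X
    else
      Result := P.Y > Q.Y;
  end;

begin
  for I := Lo + 1 to Hi do
  begin
    Key := A[I];
    J := I - 1;
    while (J >= Lo) and Before(Key, A[J]) do
    begin
      A[J + 1] := A[J];
      Dec(J);
    end;
    A[J + 1] := Key;
  end;
end;

// Fragments whose Y lies within Tolerance of the topmost fragment of the
// current line form one line; each line is then joined left to right.
function ReconstructLines(const Frags: TFragmentArray; Tolerance: Double): TStringList;
var
  Work: TFragmentArray;
  StartIdx, EndIdx, I: Integer;
  Line: string;
begin
  Result := TStringList.Create;
  Work := Copy(Frags, 0, Length(Frags)); // sort a copy, not the caller's array
  SortFragments(Work, 0, High(Work), False);
  StartIdx := 0;
  while StartIdx <= High(Work) do
  begin
    EndIdx := StartIdx;
    while (EndIdx < High(Work)) and
          (Abs(Work[EndIdx + 1].Y - Work[StartIdx].Y) <= Tolerance) do
      Inc(EndIdx);
    SortFragments(Work, StartIdx, EndIdx, True);
    Line := Work[StartIdx].Text;
    for I := StartIdx + 1 to EndIdx do
      Line := Line + ' ' + Work[I].Text;
    Result.Add(Line);
    StartIdx := EndIdx + 1;
  end;
end;

// Hypothetical helper that appends one fragment to the sample data.
procedure AddFrag(var A: TFragmentArray; X, Y: Double; const S: string);
begin
  SetLength(A, Length(A) + 1);
  A[High(A)].X := X;
  A[High(A)].Y := Y;
  A[High(A)].Text := S;
end;

var
  Sample: TFragmentArray;
  Lines: TStringList;
  I: Integer;
begin
  // Invented coordinates for the "Tax" block: out of order and with small
  // baseline differences, as fragments in a PDF can be.
  AddFrag(Sample, 400, 700.0, '$4.20');
  AddFrag(Sample, 50, 700.4, 'Labor Tax');
  AddFrag(Sample, 400, 685.0, '$27.75');
  AddFrag(Sample, 150, 699.8, '@');
  AddFrag(Sample, 50, 685.2, 'Parts Tax');
  AddFrag(Sample, 200, 684.9, '7.00%');
  AddFrag(Sample, 150, 685.0, '@');
  AddFrag(Sample, 200, 700.1, '7.00%');
  Lines := ReconstructLines(Sample, 2.0);
  try
    for I := 0 to Lines.Count - 1 do
      WriteLn(Lines[I]);
  finally
    Lines.Free;
  end;
end.
```

With these invented coordinates the program prints "Labor Tax @ 7.00% $4.20" and then "Parts Tax @ 7.00% $27.75". The tolerance (2 units here) is a judgment call. If it is too small, a line whose pieces sit at slightly different heights gets split. If it is too large, neighboring lines get merged. If your position data measures Y from the top of the page, reverse the Y comparison in SortFragments.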

Author: tj asher
Subject: 2484
Posted: 09 Jan 13 at 7:26PM

Hello,

Using Delphi XE2 and version 912 of Debenu Library VCL component.

When I call GetPageText, the text from some PDFs comes back oddly arranged.

A snippet of my code to get the page text, which is pretty straightforward:

    for x := 1 to PDFLibrary.PageCount do
    begin
      PDFLibrary.SelectPage(x);
      // passing 7 preserves formatting; Lines.Add starts each page on a new line
      Memo1.Lines.Add(PDFLibrary.GetPageText(7));
    end;

Here is how the text looks on the PDF. You'll have to trust me that it looks like this, since I cannot post a screenshot.

Tax

Labor Tax @ 7.00% $4.20
Parts Tax @ 7.00% $27.75

Tax Total $31.95

When I get the page text I get stuff like this:

Tax
                                             $4.20
           Labor Tax   @       7.00%        $27.75
                      @
            Parts Tax
                               7.00%
Tax Total                                  $31.95

I'm guessing there is something wacky about how this PDF was created. Is there anything I can do to get the page text in a format closer to how it appears on the actual PDF? I need the text to be properly formatted so that I can parse it.

Thanks for any advice.

Regards,

TJ Asher
