Debenu Quick PDF Library - PDF SDK Community Forum : Extract non-formatted Tabular Text

Extract non-formatted Tabular Text : Sorry Andrew I was too quick with...

Wed, 04 Feb 2015 10:17:24 +0000

Author: chrisreed
Subject: 3057
Posted: 04 Feb 15 at 10:17AM

Sorry Andrew I was too quick with my reply.

Yes, if I use Option 7 the output matches what is on the PDF file very well. Thanks for your help.

Chris
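
For anyone landing on this thread later: when the extracted text keeps the page layout, a single line can hold several label/value pairs side by side. Below is a minimal Python sketch of one way to split such a line. It assumes the labels are known in advance; the `LABELS` list, the `split_pairs` helper and the sample line are illustrative only and are not part of Quick PDF Library.

```python
import re

# Assumed, illustrative label list; adjust to the labels on your own form.
LABELS = ["Surname", "Firstname", "DOB", "Exam Date", "Site ID"]

def split_pairs(line, labels=LABELS):
    """Split one layout-preserved line into {label: value} using known labels."""
    # Longest labels first so a label that is a prefix of another cannot shadow it.
    ordered = sorted(labels, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in ordered)
    # A label must not be preceded by a word character and must end with a colon.
    pattern = re.compile(rf"(?<!\w)({alternatives}):")
    matches = list(pattern.finditer(line))
    pairs = {}
    for i, match in enumerate(matches):
        # The value runs up to the next label, or to the end of the line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        pairs[match.group(1)] = line[match.end():end].strip()
    return pairs

sample_line = "Surname: Tester      Firstname: Kenneth      DOB: 29 Mar 1928"
print(split_pairs(sample_line))
```

Anchoring on known labels rather than on every colon keeps values such as a time of day ("07:46") intact.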

Extract non-formatted Tabular Text : Hi Andrew, Sorry for the lateness...

Wed, 04 Feb 2015 07:10:50 +0000

Author: chrisreed
Subject: 3057
Posted: 04 Feb 15 at 7:10AM

Hi Andrew,

Sorry for the late reply, but I never received an e-mail telling me that you had posted a reply.

Believe me, I tried all the Extraction Options (from 1 to 11) and none of them were any good. So instead of having the fields/values go across the page, I had them go down the page as follows:

Surname: Tester

Firstname: Kenneth

DOB: 29 Mar 1928

Exam Date: 30 Jan 2015 07:46

Site ID: RPH etc....

and used the Extraction Option (5) - Sort text blocks based on top left position.

This worked a lot better, in that this option returned most of the field names first and then the values next, but some still got mixed up, so I couldn't associate every field name with its matching value.
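
As an illustration of the down-the-page layout described above: once the extracted text does come out as one `Label: value` pair per line, turning it into a record is straightforward. This is a minimal Python sketch, not Quick PDF Library code; `parse_label_lines` is a hypothetical helper and the sample text is copied from the list above.

```python
def parse_label_lines(text):
    """Turn 'Label: value' lines into a dict, splitting on the first colon only."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        # Skip blank lines and lines that contain no colon at all.
        if sep and label.strip():
            record[label.strip()] = value.strip()
    return record

extracted = """Surname: Tester
Firstname: Kenneth
DOB: 29 Mar 1928
Exam Date: 30 Jan 2015 07:46
Site ID: RPH"""

record = parse_label_lines(extracted)
print(record["Exam Date"])
```

Splitting on the first colon only matters for values that contain a colon themselves, such as the time in the exam date.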

Extract non-formatted Tabular Text : Chris, PDF files do not have...

Tue, 27 Jan 2015 10:18:19 +0000

Author: AndrewC
Subject: 3057
Posted: 27 Jan 15 at 10:18AM

Chris,

PDF files do not have TAB characters, words, sentences or paragraphs. Text is drawn at a specific x and y location. Extraction attempts to collect all the drawn text, but the result is not always perfect.

GetPageText or DAExtractPageText using option 7 will be your best chance.

Andrew.
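
To illustrate the point about text being drawn at x and y locations: if each text fragment's position is available, lines can be rebuilt by grouping fragments with nearly equal y and then ordering each group by x. The Python sketch below shows only that idea. The `Fragment` type, the `y_tolerance` value and the coordinates are made up for the example, and this is not how Quick PDF Library itself is known to work.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    x: float   # left edge of the drawn text (assumed units: points)
    y: float   # vertical position; PDF y grows upward from the page bottom
    text: str

def group_into_lines(fragments, y_tolerance=2.0):
    """Group fragments with nearly equal y into lines, then read each line left to right."""
    lines = []
    # Highest y first, so lines come out top to bottom.
    for frag in sorted(fragments, key=lambda f: -f.y):
        # Compare against the first fragment of the current line to avoid drift.
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]

# Made-up coordinates, deliberately listed out of reading order.
sample = [
    Fragment(300, 700.0, "Firstname:"),
    Fragment(72, 700.5, "Surname:"),
    Fragment(130, 700.0, "Tester"),
    Fragment(370, 699.8, "Kenneth"),
    Fragment(72, 680.0, "DOB:"),
    Fragment(130, 680.2, "29 Mar 1928"),
]

for line in group_into_lines(sample):
    print(line)
```

The tolerance is needed because text that looks like it sits on one line is often drawn at slightly different y values.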

Extract non-formatted Tabular Text : Can't find any site to upload...

Wed, 21 Jan 2015 10:16:41 +0000

Author: chrisreed
Subject: 3057
Posted: 21 Jan 15 at 10:16AM

Can't find any site to upload the example PDF that I'm trying to process without our Firewall blocking it (tried docdroid, scribd, dropbox) so the best I can do is upload an image.

http://s5.postimg.org/5hncgugsn/Example_PDF.jpg

The text "looks" like it is separated by TABS, but there is no formatting. When I try to use the DAExtractPageText and DAExtractBlockText functions, instead of each field name and its value lining up with each other, they are all over the place.

I also tried all the different options in DASetTextExtractionOptions to no avail.

How can I extract this unformatted text so that each field name and its value align with each other?

e.g. Surname: TEST etc.

Thanks Chris.