Extract non-formatted tabular text
chrisreed
Team Player, Joined: 29 Apr 13, Location: Australia, Points: 35
Posted: 21 Jan 15 at 10:16AM
I can't find any site to upload the example PDF I'm trying to process without our firewall blocking it (I tried docdroid, Scribd and Dropbox), so the best I can do is upload an image.
The text "looks" like it is separated by tabs, but there is no formatting. When I use the DAExtractPageText and DAExtractBlockText functions, instead of each <Field Name>: <Field Value> pair lining up, the names and values are all over the place.
I also tried all the different options in DASetTextExtractionOptions, to no avail.
How can I extract this unformatted text so that each <Field Name> lines up with its <Field Value>?
e.g. Surname: TEST, etc.
Thanks,
Chris
AndrewC
Moderator Group, Joined: 08 Dec 10, Location: Geelong, Aust, Points: 841
Chris,
PDF files do not contain TAB characters, words, sentences or paragraphs. Text is simply drawn at specific x and y locations, and extraction tries to collect all of the drawn text, but the result is not always perfect. GetPageText or DAExtractPageText using option 7 will be your best chance.
Andrew
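To illustrate why extraction has to guess at layout, here is a minimal Python sketch of one common heuristic for rebuilding lines from positioned text: fragments whose baselines lie within a small vertical tolerance are treated as one line, and each line is then ordered left to right. The `Fragment` class, the `group_into_lines` function and the `y_tolerance` value are hypothetical and not part of Quick PDF Library; the sketch assumes y is measured downward from the top of the page, and it is not a description of how option 7 works internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """A run of text drawn at a position (hypothetical structure)."""
    x: float  # left edge of the run
    y: float  # baseline, assumed to grow downward from the top of the page
    text: str


def group_into_lines(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[str]:
    """Merge fragments with nearly equal baselines into lines, left to right."""
    lines: List[List[Fragment]] = []
    # Visit fragments top to bottom, then left to right.
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        # Join the current line if this baseline is close to the line's first fragment.
        if lines and abs(frag.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order fragments by their x position.
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]


# Fragments arrive in drawing order, which need not match reading order.
fragments = [
    Fragment(150, 100.5, "Tester"),
    Fragment(20, 100, "Surname:"),
    Fragment(20, 120, "Firstname:"),
    Fragment(150, 120.8, "Kenneth"),
]
print(group_into_lines(fragments))  # ['Surname: Tester', 'Firstname: Kenneth']
```

The tolerance is the weak point of any such heuristic: too small and a value drawn slightly lower than its label ends up on its own line; too large and neighbouring rows merge.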
chrisreed
Hi Andrew,
Sorry for the late reply; I never received an email telling me that you had replied.
Believe me, I tried all the extraction options (1 to 11) and none of them were any good. So instead of having the fields/values run across the page, I had them run down the page, as follows:
<Field Name> <Field Value>
Surname: Tester
Firstname: Kenneth
DOB: 29 Mar 1928
Exam Date: 30 Jan 2015 07:46
Site ID: RPH, etc.
and used extraction option 5 (sort text blocks based on top-left position).
This worked a lot better, in that it returned most of the <Field Names> first and then the <Field Values>, but some still got mixed up, so I couldn't match every <Field Name> with its <Field Value>.
chrisreed
Sorry Andrew, I was too quick with my reply.
Yes, if I use option 7 the output matches what is on the PDF very well. Thanks for your help.
Chris
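Once each field name and its value come out on the same line, pairing them is ordinary string processing. Below is a minimal Python sketch that assumes the extracted text has one `Name: Value` pair per line, as in the layout above. `parse_fields` is a hypothetical helper, not a Quick PDF Library function.

```python
from typing import Dict


def parse_fields(extracted_text: str) -> Dict[str, str]:
    """Turn lines such as 'Surname: Tester' into {'Surname': 'Tester'}.

    Assumes one name/value pair per line; lines without a colon are skipped.
    """
    fields: Dict[str, str] = {}
    for line in extracted_text.splitlines():
        # partition splits on the first colon only, so times like 07:46 stay intact
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


sample = """Surname: Tester
Firstname: Kenneth
DOB: 29 Mar 1928
Exam Date: 30 Jan 2015 07:46
Site ID: RPH"""
print(parse_fields(sample)["Exam Date"])  # 30 Jan 2015 07:46
```

Splitting on the first colon only is what keeps a value such as the 07:46 exam time in one piece.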
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.