Debenu Quick PDF Library - PDF SDK Community Forum

Text extraction and columns

Posted by StMike38 (Beginner) on 05 Feb 11 at 10:24PM
I am trying to extract text in continuous form: straight across line breaks in normal paragraphs, and down successive lines within a column when the text is laid out in columns.

GetPageText( 2 ) does a good job of distinguishing columns and keeping their contents separate when extracting text. But in ordinary paragraphs, GetPageText(2) often breaks lines mid-word, and that's a pain.

GetPageText( 3 ) provides better detail in ordinary paragraphs, but intermittently (not always!) it runs columns together.

Is there any way to get GetPageText( 3 ) to return the text in each column separately [without sacrificing its ability to handle ordinary paragraphs]?

StMike38
Posted by Ingo (Moderator Group) on 06 Feb 11 at 12:40PM
Hi Mike!

QuickPDF doesn't offer real support for extracting text columns.
Extraction is first in, first out: text comes back in the order it was put into the page. So if a few corrections were inserted into the first line of the page at the end of editing,
those corrections will be extracted last.

I would calculate the columns on my own.
The extraction functions return all the position and font data.
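
As a rough sketch of what "calculating the columns" could look like: the Python below groups text pieces into columns by looking for wide horizontal gaps (gutters) between them, then reads each column top to bottom. The `Piece` record, its field names, and the `min_gap` and `line_tolerance` values are assumptions for illustration, not part of the Quick PDF API; the position data would first have to be parsed from the extraction output into this form. It also assumes the y coordinate grows upward on the page, and that the pieces passed in come from a region that really is laid out in columns (a full-width paragraph in the same region would bridge the gutter).

```python
from typing import NamedTuple

class Piece(NamedTuple):
    # Hypothetical record holding already-parsed position data.
    text: str
    left: float
    right: float
    baseline: float  # assumed: larger value means higher on the page

def split_into_columns(pieces, min_gap=18.0):
    """Group pieces into columns separated by horizontal gaps wider than min_gap."""
    columns = []
    covered_right = None  # right edge of the horizontal span covered so far
    for piece in sorted(pieces, key=lambda p: p.left):
        if covered_right is None or piece.left - covered_right > min_gap:
            columns.append([])  # gutter found: start a new column
            covered_right = piece.right
        else:
            covered_right = max(covered_right, piece.right)
        columns[-1].append(piece)
    return columns

def column_text(column, line_tolerance=2.0):
    """Join one column's pieces top to bottom, left to right within each line."""
    lines = []
    for piece in sorted(column, key=lambda p: -p.baseline):
        if lines and abs(lines[-1][0].baseline - piece.baseline) <= line_tolerance:
            lines[-1].append(piece)  # same line as the previous piece
        else:
            lines.append([piece])    # baseline moved: new line
    return " ".join(
        " ".join(p.text for p in sorted(line, key=lambda p: p.left))
        for line in lines
    )

pieces = [
    Piece("Left", 50, 80, 700), Piece("column", 85, 130, 700),
    Piece("text", 50, 75, 686),
    Piece("Right", 300, 335, 700), Piece("side", 340, 370, 700),
    Piece("here", 300, 330, 686),
]
columns = split_into_columns(pieces)
texts = [column_text(c) for c in columns]
```

With the sample data, `texts` holds one string per column, so the two columns no longer run together. The gutter threshold would need tuning per document.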

If you want the extracted text for searching, I can only suggest option 4 (for me the best).

Cheers and welcome here,
Ingo


Edited by Ingo - 06 Feb 11 at 12:41PM
Posted by StMike38 (Beginner) on 10 Feb 11 at 12:49AM
Ingo,

I agree that option 4 is the best for getting at continuous text. On some PDFs it nicely puts out a word at a time. But in a 2 MB PDF file it suddenly decides to output one or two letters at a time. With variable-width fonts, there is no sure way of recognizing or calculating the space between words. My attempts so far have two undesirable results: some words wind up fragmented, while parts of separate words get pushed together.

Is there any way to control option 4 so that it actually returns a word at a time?
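
For reference, the kind of gap heuristic described above can be sketched as follows. This is not a Quick PDF feature, only an illustration of the approach: fragments on one text line are merged when the horizontal gap between them is small relative to the font size. The `Fragment` record, its fields, and the `gap_ratio` value of 0.15 are assumptions; with variable-width fonts no single ratio is reliable, which is exactly the problem.

```python
from typing import NamedTuple

class Fragment(NamedTuple):
    # Hypothetical record for one extracted fragment on a single text line.
    text: str
    left: float
    right: float
    font_size: float

def merge_fragments(fragments, gap_ratio=0.15):
    """Rebuild words by joining fragments whose gap is at most gap_ratio * font size."""
    words = []
    prev = None
    for frag in sorted(fragments, key=lambda f: f.left):
        gap_limit = gap_ratio * frag.font_size
        if prev is not None and frag.left - prev.right <= gap_limit:
            words[-1] += frag.text   # small gap: same word
        else:
            words.append(frag.text)  # large gap (or first fragment): new word
        prev = frag
    return words

line = [
    Fragment("Te", 10.0, 21.0, 12.0), Fragment("xt", 21.2, 31.0, 12.0),
    Fragment("ex", 35.0, 45.0, 12.0), Fragment("traction", 45.3, 80.0, 12.0),
]
words = merge_fragments(line)
```

Too small a ratio leaves words fragmented, and too large a ratio pushes separate words together, so the value has to be tuned per font or per document.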

StMike38