Text extraction and columns
StMike38 (Beginner), Joined: 05 Feb 11, Status: Offline, Points: 2
Posted: 05 Feb 11 at 10:24PM
I am trying to extract text in continuous form -- right across lines when in normal paragraphs, successive lines within a column when the text is in columns.
GetPageText( 2 ) does a good job of distinguishing columns and keeping their contents separate when extracting text. But in ordinary paragraphs, GetPageText( 2 ) often breaks lines mid-word, and that's a pain. GetPageText( 3 ) provides better detail in ordinary paragraphs, but intermittently (not always!) it runs columns together. Is there any way to get GetPageText( 3 ) to return the text in each column separately [without sacrificing its ability to handle ordinary paragraphs]?
StMike38
Ingo (Moderator Group), Joined: 29 Oct 05, Status: Offline, Points: 3529
Hi Mike!
QuickPDF doesn't offer real support for extracting text columns. It's first in, first out: text is extracted in the order it was added to the page, so if you insert a few corrections into the first line of the page at the end, the corrections will be extracted last. I would calculate the columns on my own. The extract functions offer all position and font data. If you want the extraction for searching, I can only suggest option 4 (for me the best).
Cheers and welcome here, Ingo
Edited by Ingo - 06 Feb 11 at 12:41PM
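For anyone who wants to try calculating the columns from the position data, here is a minimal sketch in Python. It does not call Quick PDF Library at all: the `Fragment` record, its field names, and the `min_gutter` and `line_tolerance` values are all assumptions made for illustration, and you would have to fill the fragments yourself from whatever position data the extraction gives you. The idea is to sort fragments by their left edge and start a new column whenever a horizontal gap wider than a gutter appears, then read each column top to bottom.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of extracted text with its position (hypothetical layout)."""
    text: str
    left: float      # x of the left edge
    right: float     # x of the right edge
    baseline: float  # y of the baseline; assumed to grow upward, as in PDF space


def split_into_columns(fragments, min_gutter=18.0):
    """Group fragments into columns separated by a gap wider than min_gutter."""
    columns = []
    column_right = None
    for frag in sorted(fragments, key=lambda f: f.left):
        if column_right is None or frag.left - column_right > min_gutter:
            # Gap is wider than the assumed gutter: begin a new column.
            columns.append([])
            column_right = frag.right
        else:
            column_right = max(column_right, frag.right)
        columns[-1].append(frag)
    return columns


def column_text(column, line_tolerance=2.0):
    """Return one column's text, top line first, each line left to right."""
    lines = []
    for frag in sorted(column, key=lambda f: -f.baseline):
        if lines and abs(lines[-1][0].baseline - frag.baseline) <= line_tolerance:
            lines[-1].append(frag)   # same line as the previous fragment
        else:
            lines.append([frag])     # baseline moved: start a new line
    return "\n".join(
        " ".join(f.text for f in sorted(line, key=lambda f: f.left))
        for line in lines
    )
```

Note that this only works on a region that really is laid out in columns. A full-width paragraph on the same page bridges the gutter and everything collapses into one column, so mixed pages would need to be cut into horizontal bands first.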
StMike38 (Beginner), Joined: 05 Feb 11, Status: Offline, Points: 2
Ingo,
I agree that option 4 is the best for getting at continuous text. On some PDFs it nicely puts out a word at a time. But in a 2 MB PDF file it suddenly decides to output one or two letters at a time. With variable-width fonts, there is no sure way of recognizing / calculating space between words. Attempts so far have two undesirable results -- some words wind up fragmented, while parts of separate words get pushed together. Is there any way to get control so that option 4 actually gets a word at a time?
StMike38
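One common heuristic for the fragmented-word problem, sketched here in Python, is to compare the gap between neighbouring pieces with the font size instead of a fixed distance. This is not a Quick PDF Library feature: the `Piece` record, its fields, and the `gap_ratio` default of 0.2 are assumptions for illustration, and the ratio would likely need tuning per document. It takes the pieces of one text line and glues together those whose gap is small relative to the font size.

```python
from typing import NamedTuple


class Piece(NamedTuple):
    """A fragment of text on one line (hypothetical layout)."""
    text: str
    left: float       # x of the left edge
    right: float      # x of the right edge
    font_size: float  # font size in the same units as the coordinates


def merge_into_words(pieces, gap_ratio=0.2):
    """Join the pieces of a single line into words.

    A gap wider than gap_ratio * font_size is treated as a word break;
    anything narrower is treated as letter spacing inside a word.
    """
    words = []
    current = ""
    prev_right = None
    for piece in sorted(pieces, key=lambda p: p.left):
        gap = 0.0 if prev_right is None else piece.left - prev_right
        if current and gap > gap_ratio * piece.font_size:
            words.append(current)  # gap is wide enough to end the word
            current = ""
        current += piece.text
        prev_right = piece.right
    if current:
        words.append(current)
    return words
```

Scaling the threshold by font size keeps the same setting usable for body text and headings, but it remains a guess: tightly set or justified text can still produce gaps that fall on the wrong side of the threshold.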