Extract Text From Rendered Pages
alinux08
Team Player Joined: 20 Jun 12 Status: Offline Points: 20
Posted: 20 Jun 12 at 3:25PM
Is it possible to extract text from a rendered page based on a user-defined bounding box?
Thanks. Mark
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524
Hi Mark!
To me, a rendered page means an image, so I don't think it's possible to extract text from it. Cheers and welcome here, Ingo
alinux08
Team Player Joined: 20 Jun 12 Status: Offline Points: 20
Ingo, thanks.
What about extracting text from the real page based on a defined bounding box?
AndrewC
Moderator Group Joined: 08 Dec 10 Location: Geelong, Aust Status: Offline Points: 841
You can use SetTextExtractionArea to limit the extraction results to a given area. If you want to perform multiple extractions from the same page, it is more efficient to call GetPageText(3) or GetPageText(4) once and filter the bounding box results yourself, which is quite easy to do. If you can highlight and select (copy/paste) text using Acrobat Reader, then it should be possible to use GetPageText to perform text extraction. Many image-based documents have been processed using OCR, so they may still carry a selectable text layer. Andrew.
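As a rough sketch of the "filter the bounding box results yourself" approach, the Python below parses a block of CSV text and keeps the pieces that fall inside a user-defined rectangle. It does not call the library at all. The column layout (font name, colour, size, four corner points as x1,y1 ... x4,y4, then the text) is an assumption about what GetPageText(3) or (4) returns, as is the bottom-left coordinate origin, so check both against the output you actually get. The function name `words_in_box` and the `SAMPLE` data are made up for illustration.

```python
import csv
import io


def words_in_box(page_csv, left, bottom, right, top):
    """Return the text pieces whose bounding box lies fully inside the rectangle.

    Assumed row layout (verify against real GetPageText output):
    font, colour, size, x1, y1, x2, y2, x3, y3, x4, y4, text
    """
    hits = []
    for row in csv.reader(io.StringIO(page_csv)):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        try:
            coords = [float(v) for v in row[3:11]]
        except ValueError:
            continue  # skip rows whose coordinates are not numeric
        xs, ys = coords[0::2], coords[1::2]
        inside = (
            min(xs) >= left and max(xs) <= right
            and min(ys) >= bottom and max(ys) <= top
        )
        if inside:
            hits.append(row[11])
    return hits


# Hypothetical sample data in the assumed layout, not real library output.
SAMPLE = (
    '"Helvetica",#000000,12,72,700,110,700,110,712,72,712,"Invoice"\n'
    '"Helvetica",#000000,12,300,500,340,500,340,512,300,512,"Total"\n'
)

if __name__ == "__main__":
    print(words_in_box(SAMPLE, left=50, bottom=690, right=200, top=720))
```

Because the page text is fetched once and filtered in memory, the same CSV string can be reused for as many rectangles as needed. To accept pieces that only partly overlap the rectangle, relax the `inside` test accordingly.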