
Debenu Quick PDF Library - PDF SDK Community Forum Homepage

Extract Text From Rendered Pages.

alinux08
Team Player
Joined: 20 Jun 12
Points: 20
Posted: 20 Jun 12 at 3:25PM
Is it possible to extract text from a rendered page based on a user-defined bounding box?

Thanks.

Mark

Ingo
Moderator Group
Joined: 29 Oct 05
Points: 3524
Posted: 20 Jun 12 at 8:03PM
Hi Mark!

To me, a rendered page means an image.
So it's not possible to extract text from it...?

Cheers and welcome here,
Ingo

alinux08
Team Player
Joined: 20 Jun 12
Points: 20
Posted: 22 Jun 12 at 10:42PM
Ingo, thanks.

What about extracting text from the real page based on a defined bounding box?

AndrewC
Moderator Group
Joined: 08 Dec 10
Location: Geelong, Aust
Points: 841
Posted: 23 Jun 12 at 8:44AM

You can use SetTextExtractionArea to limit the extraction results.


If you want to perform multiple extractions from the same page, it would be more efficient to process the bounding box results from GetPageText(3) or (4) yourself, which is quite easy to do.
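
As a rough illustration of processing the bounding box results yourself, here is a minimal Python sketch. It assumes the CSV string returned by GetPageText(3) or (4) has already been fetched from the library, and that each row has the layout font, color, size, x1, y1, x2, y2, x3, y3, x4, y4, text, with the origin at the bottom-left of the page. That column layout, the sample rows, and the helper names (Box, parse_text_blocks, text_in_area) are assumptions for illustration only, so check them against the output your version actually produces.

```python
import csv
import io
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in page units (hypothetical helper type)."""
    left: float
    bottom: float
    right: float
    top: float


def parse_text_blocks(csv_text):
    """Parse rows assumed to be: font, color, size, x1, y1, ..., x4, y4, text."""
    blocks = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        coords = [float(v) for v in row[3:11]]
        xs, ys = coords[0::2], coords[1::2]
        # Reduce the four corner points to one enclosing rectangle.
        bounds = Box(min(xs), min(ys), max(xs), max(ys))
        blocks.append((bounds, row[11]))
    return blocks


def text_in_area(blocks, area):
    """Return the text of every block that lies entirely inside `area`."""
    return [
        text
        for bounds, text in blocks
        if bounds.left >= area.left and bounds.right <= area.right
        and bounds.bottom >= area.bottom and bounds.top <= area.top
    ]


# Made-up sample rows standing in for one page's extraction result.
sample = '''"Helvetica","#000000",12,72,700,140,700,140,712,72,712,"Invoice"
"Helvetica","#000000",12,72,100,130,100,130,112,72,112,"Footer"
'''

blocks = parse_text_blocks(sample)  # parse the page once
header_area = Box(left=50, bottom=650, right=300, top=750)
footer_area = Box(left=50, bottom=80, right=300, top=130)

print(text_in_area(blocks, header_area))
print(text_in_area(blocks, footer_area))
```

The page is parsed once and the parsed blocks are reused for each area, which is where the saving over repeated extraction calls would come from. The containment test here requires a block to sit fully inside the area; an overlap test may suit some layouts better.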

If you can highlight and select (copy/paste) text using Acrobat Reader, then it should be possible to use GetPageText to perform text extraction. Many image-based documents have been processed using OCR, so they may still contain selectable text.

Andrew.