
text extraction

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1636
Printed Date: 04 Apr 26 at 8:43AM


Topic: text extraction
Posted By: rajeev
Subject: text extraction
Date Posted: 10 Nov 10 at 2:49PM
Hi,
I used PHP to successfully read the text of a newspaper PDF file. The problem is that it reads only character by character or word by word. I would like to read the file paragraph by paragraph. Can anyone help with this?
 
I can also extract images from the PDF, but I need the coordinates where each image was placed on the page as well.
Any help will be appreciated.
 
 
 
 



Replies:
Posted By: Ingo
Date Posted: 10 Nov 10 at 7:32PM
Hi!

With QuickPDF you can extract text from a PDF word by word, string by string, and/or page by page. Have a look at the online reference, accessible via www.quickpdf.org.
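
Paragraphs are not stored as such in a PDF, so one common approach is to rebuild them from the word positions. Below is a minimal sketch in Python (not QuickPDF code), assuming you already have every word together with its position. The `Word` class, the `group_into_paragraphs` function, and the two thresholds are hypothetical names and values chosen for illustration, and `y` is assumed to grow downward from the top of the page. It does not handle multi-column newspaper layouts; for those you would first have to split the words by column.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float       # left edge of the word
    y: float       # vertical position, measured downward from the page top (assumption)
    height: float  # approximate font height


def group_into_paragraphs(words, line_tol=0.5, para_gap=1.5):
    """Group words into lines, then lines into paragraphs.

    line_tol: words whose y differs by at most line_tol * height share a line.
    para_gap: a vertical jump larger than para_gap * height starts a new paragraph.
    Both thresholds are illustrative guesses and need tuning per document.
    """
    if not words:
        return []

    # Step 1: collect words into lines, top to bottom.
    lines = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(w.y - lines[-1][0].y) <= line_tol * w.height:
            lines[-1].append(w)
        else:
            lines.append([w])

    # Step 2: merge lines into paragraphs based on the vertical gap.
    paragraphs = []
    current = []
    prev = None
    for line in lines:
        line.sort(key=lambda w: w.x)  # restore left-to-right reading order
        if prev is not None and line[0].y - prev[0].y > para_gap * prev[0].height:
            paragraphs.append(" ".join(current))
            current = []
        current.append(" ".join(w.text for w in line))
        prev = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = [
    Word("Hello", 0, 10, 10), Word("world", 30, 10, 10),
    Word("again", 0, 22, 10),
    Word("New", 0, 50, 10), Word("para", 20, 50, 10),
]
paragraphs = group_into_paragraphs(sample)
```

The same idea can be ported to PHP; only the word list has to come from whatever extraction call you are using.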

You can get the image coordinates via the relevant MediaBoxes. Read the PDF with QuickPDF, decrypt it, and then read the actual page content (like looking into the PDF with Notepad).
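
To illustrate what reading the real content means: in a page content stream an image is drawn with the `Do` operator, and its position and size come from the transformation matrix set by the preceding `cm` operators, expressed in the page coordinate space that the MediaBox defines. Here is a rough Python sketch, assuming you already have the decoded content stream as plain text. The function names are hypothetical, the tokenizer is naive (it splits on whitespace and will be confused by text strings or inline images), it cannot tell image XObjects from form XObjects without the page resources, and the width and height are only meaningful for images that are not rotated or skewed.

```python
def multiply(m, n):
    """Return m x n for PDF matrices stored as [a, b, c, d, e, f]."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    ]


def find_xobject_placements(content):
    """Scan a decoded content stream and report where each `Do` draws."""
    ctm = [1, 0, 0, 1, 0, 0]  # identity: the default page coordinate space
    saved = []                # graphics state stack for q / Q
    operands = []             # operands collected since the last operator
    placements = []

    for tok in content.split():
        try:
            operands.append(float(tok))
            continue
        except ValueError:
            pass
        if tok.startswith("/"):
            operands.append(tok)
            continue

        # Anything else is treated as an operator.
        if tok == "q":
            saved.append(ctm)
        elif tok == "Q" and saved:
            ctm = saved.pop()
        elif tok == "cm" and len(operands) >= 6:
            # The new matrix is applied before the current one.
            ctm = multiply(operands[-6:], ctm)
        elif tok == "Do" and operands:
            a, b, c, d, e, f = ctm
            placements.append(
                {"name": operands[-1], "x": e, "y": f, "width": a, "height": d}
            )
        operands = []
    return placements


simple = find_xobject_placements("q 200 0 0 100 50 600 cm /Im1 Do Q")
nested = find_xobject_placements(
    "q 2 0 0 2 10 10 cm q 100 0 0 50 5 5 cm /Im2 Do Q Q"
)
```

In the first sample the image named `/Im1` has its lower-left corner at (50, 600) and is 200 by 100 units in size, all in PDF points measured from the lower-left corner of the page.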

Cheers and welcome here,
Ingo


