Creating PDF from Image + hOCR data

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2562
Printed Date: 02 May 25 at 12:53PM


Topic: Creating PDF from Image + hOCR data
Posted By: Shuaib
Date Posted: 12 Mar 13 at 9:32AM
Hi,

I am using Tesseract to OCR images. Now I would like to create a PDF from the original OCRed image plus the hOCR output I get from the OCR engine. Can anyone please point me in the right direction on how to use Quick PDF Library to achieve this? Google didn't turn up anything.

Thanks.



Replies:
Posted By: Ingo
Date Posted: 12 Mar 13 at 10:13PM
Hi!

With the drawing functions of QP you can insert the real text
without worrying about the layout.
On a new layer you can use DrawImage to place the image, which shows
the text in its original layout, over everything.
That way text extraction works and you still keep the nice
original layout.
Here's the online reference:
http://www.debenu.com/docs/pdf_library_reference/FunctionGroups.php

Cheers and welcome here,
Ingo



Posted By: AndrewC
Date Posted: 14 Mar 13 at 10:22AM
Hello,

QP.SetTextMode(3);  will allow you to draw invisible text.  The text will still be searchable.

You will also need to use AddImageFromFile and DrawImage to create the visible part of the page. This link shows how to import and draw an image correctly - http://www.quickpdf.org/forum/creating-a-multi-page-pdf-from-a-multipage-tiff_topic2125.html
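
For the invisible text to line up with the drawn image, the pixel coordinates from the OCR step have to be converted to page units. The following is only a rough Python sketch of that arithmetic, not Quick PDF Library code. It assumes the scan resolution (dpi) is known, that the page is sized to the image, and that the page origin is bottom-left (the PDF default); with a top-left origin the vertical flip would not be needed. The names Box, px_to_pt and word_box_in_points are made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangle in PDF points, measured from the bottom-left page corner."""
    left: float
    bottom: float
    width: float
    height: float


def px_to_pt(px: float, dpi: float) -> float:
    """Convert a pixel length to PDF points (1 point = 1/72 inch)."""
    return px * 72.0 / dpi


def word_box_in_points(bbox_px, image_height_px, dpi):
    """Map a top-left-origin pixel box (x0, y0, x1, y1) to a
    bottom-left-origin box in points."""
    x0, y0, x1, y1 = bbox_px
    return Box(
        left=px_to_pt(x0, dpi),
        # Image y grows downwards, page y grows upwards, so flip around
        # the image height and use the lower edge (y1) of the pixel box.
        bottom=px_to_pt(image_height_px - y1, dpi),
        width=px_to_pt(x1 - x0, dpi),
        height=px_to_pt(y1 - y0, dpi),
    )
```

As a check on the numbers: a 2550 x 3300 pixel scan at 300 dpi corresponds to a 612 x 792 point page (US Letter), so the image would be drawn at that size and each word placed at its converted box.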

You will need to search online to understand the hOCR data format properly.
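
As a starting point for reading the hOCR data, here is a minimal Python sketch that pulls each word and its pixel bounding box out of an hOCR file using only the standard library. It assumes the usual hOCR convention that a word is a span with class ocrx_word (or ocr_word in some older output) whose title attribute contains "bbox x0 y0 x1 y1". The names HocrWordParser and parse_hocr_words are hypothetical, and real hOCR output may need extra handling.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 300 600 900 750; x_wconf 93".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
WORD_CLASSES = {"ocrx_word", "ocr_word"}


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every word span that has a bbox."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._text = []
        self._depth = 0     # nested <span> depth inside the current word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._bbox is not None:
            self._depth += 1  # a span nested inside a word span
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if WORD_CLASSES.intersection(classes):
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []
                self._depth = 0

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        text = "".join(self._text).strip()
        if text:
            self.words.append((text, self._bbox))
        self._bbox = None


def parse_hocr_words(hocr: str):
    """Return a list of (text, bbox) tuples in document order."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

The boxes are in image pixels with the origin at the top-left corner, so they still need scaling to page units before each word is drawn as invisible text over the image.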

Andrew.




