Creating a Text-Searchable PDF with an hOCR File
McHaigh (Beginner; joined 18 Jun 13)
Posted: 18 Jun 13 at 4:16PM
Hi there,
Could someone please elaborate on the answer given here?
I am unsure what they mean by "without having an eye on the layout". As far as I can see, the steps would have to parse the hOCR file's XML and calculate the position of the boxes it describes relative to the document. Am I missing something? Is there simple built-in functionality for placing hOCR data onto a PDF document?
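The parsing step described in the question can be sketched with nothing but Python's standard library. This is a minimal sketch under the usual hOCR conventions: each recognised word is an element with class `ocrx_word` whose `title` attribute contains `bbox x0 y0 x1 y1` in image pixels (origin at the top-left of the scan). `HocrWordParser` and `parse_hocr_words` are hypothetical names, and `HTMLParser` is used instead of an XML parser because hOCR is HTML-based and is not always well-formed XML.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0   # nesting depth inside the current word element
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested markup inside a word (e.g. <strong>): only track depth.
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []
                self._depth = 1

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Closed the word element itself: record it if it has text.
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, self._bbox))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


def parse_hocr_words(hocr_text):
    """Return a list of (word, bbox) pairs found in an hOCR document."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words
```

The boxes are still in scan pixels, so they must be scaled and flipped into PDF coordinates before drawing.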
Ingo (Moderator Group; joined 29 Oct 05)
...without having an eye on the layout...
Because you draw the text from scratch. You can then place the original layout (the scanned image) over the drawn text, on a new layer/content group, so the text underneath is invisible but still extractable. Regarding hOCR, there was already a hint: "Looking at Google" ;-)
Cheers and welcome here,
Ingo
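Ingo's layering idea comes down to paint order: in a PDF page content stream, later operators paint over earlier ones, so text drawn first and the scan drawn afterwards leaves the text hidden but still present for search and extraction. The sketch below builds such a raw content stream as a string. It assumes the page resources already define a font named `/F1` and an image XObject named `/Im0` (both hypothetical resource names), that word positions are already in PDF points, and that the text is plain ASCII; it does not create an optional content group. Setting `invisible=True` adds text rendering mode 3 (neither fill nor stroke), which is the effect of `SetTextMode(3)` in the reply below.

```python
def pdf_escape(text):
    """Escape a string for use inside a PDF literal string (...)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_page_content(words, page_w, page_h, invisible=True):
    """Return a page content stream: OCR text first, scanned image on top.

    words: list of (text, x, y, size), in points, bottom-left origin.
    Assumes resources /F1 (font) and /Im0 (image XObject) exist on the page.
    """
    ops = ["BT"]
    if invisible:
        ops.append("3 Tr")  # text rendering mode 3: neither fill nor stroke
    for text, x, y, size in words:
        ops.append(f"/F1 {size:g} Tf")          # select font and size
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm")   # position the word
        ops.append(f"({pdf_escape(text)}) Tj")  # show the word
    ops.append("ET")
    # Paint the scan last, scaled to the full page, so it covers the text.
    ops.append(f"q {page_w:g} 0 0 {page_h:g} 0 0 cm /Im0 Do Q")
    return "\n".join(ops)
```

With a library such as Quick PDF Library these operators are produced for you by its drawing calls; the sketch only shows what ends up in the page.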
AndrewC (Moderator Group; joined 08 Dec 10; location: Geelong, Aust)
Hello,
Firstly, you will need to turn the image into a PDF file by using AddImageFromFile to import the TIFF file that you have just OCR'ed. Then, as you said, you will need to parse the XML-based hOCR files to calculate the x, y position and font point size of each word. Then set invisible text mode once and draw every word:

```pascal
QP.SetTextMode(3);  // text rendering mode 3 = invisible text
for i := 1 to ocr_wordcount do
begin
  QP.SetTextSize(ocr_size[i]);
  QP.DrawText(ocr_x[i], ocr_y[i], ocr_word[i]);
end;
```

This draws invisible text onto the PDF, and it is this text layer that makes the document searchable.
Andrew.
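Computing `ocr_x`, `ocr_y` and `ocr_size` from an hOCR `bbox` is a unit conversion plus a flip of the vertical axis: hOCR boxes are in image pixels measured from the top-left corner, whereas PDF user space uses 72 points per inch measured from the bottom-left. The sketch below assumes the scan is placed so that it fills the page at its native DPI, and that the drawing call takes a bottom-left y coordinate near the text baseline; check the library's origin setting before relying on this. Using the bottom edge of the box as the baseline and the box height as the font size are rough approximations, and `hocr_bbox_to_pdf` is a hypothetical helper name.

```python
def hocr_bbox_to_pdf(bbox, page_height_px, dpi):
    """Convert an hOCR word bbox to (x, y, size) in PDF points.

    bbox: (x0, y0, x1, y1) in image pixels, origin at the TOP-left (hOCR).
    Returns x and y with origin at the BOTTOM-left (default PDF user space).
    y is the bottom edge of the box, used as an approximate baseline;
    size is the box height, used as a rough font-size estimate.
    """
    x0, y0, x1, y1 = bbox
    scale = 72.0 / dpi                   # 72 PDF points per inch
    x = x0 * scale
    y = (page_height_px - y1) * scale    # flip the vertical axis
    size = (y1 - y0) * scale
    return x, y, size
```

For a 300 dpi letter-size scan (3300 pixels tall), the box `(100, 200, 300, 240)` maps to roughly x = 24, y = 734.4 and size = 9.6 points; these three values play the roles of `ocr_x`, `ocr_y` and `ocr_size` in the loop above.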
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.