
PDF with OCR from Xerox WorkCentre 7535

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3557
Printed Date: 29 Apr 24 at 3:25PM


Topic: PDF with OCR from Xerox WorkCentre 7535
Posted By: mLipok
Subject: PDF with OCR from Xerox WorkCentre 7535
Date Posted: 02 May 18 at 2:32PM
I have a scanned PDF document whose text was created by the OCR engine embedded in the Xerox WorkCentre 7535.


You can get the test PDF here:
http://lipok.pl/Debenu/DOC_20180417150838.pdf


When I open this PDF in Acrobat Reader, select all, copy, and paste into Notepad, the extracted text looks quite good.



But when I use GetPageText with ExtractOptions = 7, the text is scattered.

Do you have any idea how to fix it?

Btw.
It seems that option 7 is "broken": option 8 correctly places the word "Komornik" at the top of the extracted text, whereas option 7 puts it further down.



-------------
Here you can find description how to test my examples:
http://www.quickpdf.org/forum/forum_posts.asp?TID=2932&PID=12600&title=drawcapturedpagematrix-matrix-howto#12600



Replies:
Posted By: Ingo
Date Posted: 03 May 18 at 10:50AM
Hi,

Option 7 works on the principle "first in, first out".
If you make a correction to the first row of a finalized PDF document, that correction will be extracted at the end because it was inserted last.

Option 8 does the extraction in a readable format (taking the position data of each string into account) and extracts row by row.

You can do what option 8 does on your own if you use option 2 or 3 and calculate which strings belong to which row...
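
To illustrate the "calculating strings to rows" idea, here is a minimal Python sketch. It does not call the library. It assumes you have already parsed the positional output into (x, y, text) tuples yourself; that tuple shape, the function name group_into_rows, the y_tolerance parameter, and the sample coordinates are all made up for the example. It also assumes the usual PDF convention that y grows upward, so larger y means higher on the page.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text). This is NOT the library's
# output format; parse the real CSV into tuples like these first.
Fragment = Tuple[float, float, str]


def group_into_rows(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[str]:
    """Group text fragments into rows by y position and return one string per row."""
    # Top of the page first (assumes y grows upward), then left to right.
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))

    rows: List[Tuple[float, List[Fragment]]] = []  # (y of first fragment in row, fragments)
    for frag in ordered:
        y = frag[1]
        # Same row if the fragment is within y_tolerance of the row's first fragment.
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append(frag)
        else:
            rows.append((y, [frag]))

    lines = []
    for _, items in rows:
        items.sort(key=lambda f: f[0])  # left to right within the row
        lines.append(" ".join(text for _, _, text in items))
    return lines


if __name__ == "__main__":
    # Invented sample data, deliberately out of reading order.
    sample = [
        (50.0, 700.0, "second"),
        (10.0, 800.5, "Komornik"),
        (120.0, 800.0, "header"),
        (10.0, 700.4, "row"),
    ]
    for line in group_into_rows(sample):
        print(line)
```

The tolerance matters for OCR output, where fragments on the same visual line often have slightly different baselines; a value that works for one scanner may need tuning for another.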



-------------
Cheers,
Ingo



Posted By: mLipok
Date Posted: 04 May 18 at 7:56AM
So it would be good to get a new feature/option 9 that extracts in a way that combines options 7 and 8, doing exactly what you describe in your last statement.

Of course, I will post a feature request to Debenu.

Thanks.

Btw. I'm already in the process of developing what you describe; right now I'm trying to calculate the positions and keep the text together.





Posted By: mLipok
Date Posted: 04 May 18 at 8:04AM
I just posted the feature request.



