PDF with OCR from Xerox WorkCentre 7535 |
mLipok
Senior Member Joined: 23 Apr 14 Location: Poland, Zabrze Status: Offline Points: 449 |
Posted: 02 May 18 at 2:32PM |
I have a scanned PDF document with text created by the OCR engine embedded in the Xerox WorkCentre 7535.
You can get this test PDF here:
When I open this PDF in Acrobat Reader, select all, copy, and paste into Notepad, the extracted text looks quite good. But when I use GetPageText with ExtractOptions = 7, the text is scattered. Do you have any idea how to fix it? By the way, it seems that option 7 is "broken": option 8 properly places the word "Komornik" at the top of the extracted text, whereas option 7 puts it further down. Edited by mLipok - 02 May 18 at 3:22PM |
|
Here you can find description how to test my examples:
http://www.quickpdf.org/forum/forum_posts.asp?TID=2932&PID=12600&title=drawcapturedpagematrix-matrix-howto#12600 |
|
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi,
Option 7 works on the principle "first in, first out". If you make a correction to the first row of a finalized PDF document, that correction will be extracted at the end because it was inserted last. Option 8 does the extraction in a readable format: it takes the position data of each string into account and extracts row by row. You can do what option 8 does on your own if you use option 2 or 3, by assigning the strings to rows based on their positions... |
|
Cheers,
Ingo |
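The do-it-yourself approach Ingo describes, assigning positioned strings to rows, could be sketched roughly as below. This is only a minimal illustration in Python, not the library's own algorithm: it assumes the output of option 2 or 3 has already been parsed into hypothetical `(x, y, text)` tuples, that y grows upward (the PDF default origin at the bottom-left), and that a tolerance of 2 units is enough to treat slightly misaligned OCR fragments as one row. The function name `group_into_rows` and the sample coordinates are made up for the example.

```python
from typing import List, Tuple

# Hypothetical, pre-parsed fragment: (x, y, text). The real field layout
# returned by GetPageText must be checked against the library reference.
Fragment = Tuple[float, float, str]


def group_into_rows(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[str]:
    """Group positioned text fragments into rows, top to bottom, left to right."""
    # Assumption: larger y means higher on the page, so sort by descending y.
    ordered = sorted(fragments, key=lambda f: -f[1])

    rows: List[List[Fragment]] = []
    for frag in ordered:
        # A fragment joins the current row if its y is within the tolerance
        # of the first fragment of that row; otherwise it starts a new row.
        if rows and abs(rows[-1][0][1] - frag[1]) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])

    # Inside each row, order the fragments by x and join them with spaces.
    return [
        " ".join(text for _, _, text in sorted(row, key=lambda f: f[0]))
        for row in rows
    ]


# Illustrative input in "first in, first out" order, as option 7 might deliver it.
sample = [
    (50.0, 700.0, "Sygnatura"),
    (300.0, 800.5, "Sadowy"),
    (50.0, 800.0, "Komornik"),
    (120.0, 700.4, "akt"),
]
print(group_into_rows(sample))
```

With the sample above, "Komornik" ends up at the start of the first row even though it arrived third in the input. The tolerance is a judgment call: too small splits a skewed scan line into several rows, too large merges neighbouring lines.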
|
mLipok
Senior Member Joined: 23 Apr 14 Location: Poland, Zabrze Status: Offline Points: 449 |
So it would be good to get a new feature, option 9, that extracts text in a way that combines options 7 and 8 and does exactly what you describe in your last statement.
Of course, I will post a feature request to Debenu. Thanks. By the way, I'm already working on what you describe: right now I am trying to calculate the positions and keep the text together. |
|
mLipok
Senior Member Joined: 23 Apr 14 Location: Poland, Zabrze Status: Offline Points: 449 |
I just posted the feature request.
|
|