PDF with OCR from Xerox WorkCentre 7535

mLipok (Senior Member; joined 23 Apr 14; Poland, Zabrze)
Posted: 02 May 18 at 2:32PM
I have a scanned PDF document whose text was created by the OCR engine embedded in a Xerox WorkCentre 7535.

You can get this test PDF here:

When I open this PDF in Acrobat Reader, select all, copy, and paste into Notepad, the extracted text looks quite good.

But when I use GetPageText with ExtractOptions = 7, the text is scattered.

Do you have any idea how to fix it?

Btw, it seems that option 7 is "broken": option 8 correctly places the word "Komornik" at the top of the extracted text, whereas option 7 puts it further down.



Edited by mLipok - 02 May 18 at 3:22PM
Here you can find description how to test my examples:
http://www.quickpdf.org/forum/forum_posts.asp?TID=2932&PID=12600&title=drawcapturedpagematrix-matrix-howto#12600

Ingo (Moderator Group; joined 29 Oct 05)
Posted: 03 May 18 at 10:50AM
Hi,

Option 7 works on a "first in, first out" principle: text is extracted in the order in which it was written to the PDF.
If you make a correction to the first row of a finalized PDF document, that correction will be extracted at the end because it was inserted last.

Option 8 extracts the text in a readable layout: it takes the position data of each string into account and extracts row by row.

You can do what option 8 does on your own by using option 2 or 3 and calculating which strings belong to which rows...
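
A minimal sketch of that "strings to rows" calculation, in Python. It assumes each text fragment from the option 2 or 3 output has already been parsed into a hypothetical (x, y, text) tuple, with y growing upward as in PDF coordinates; the exact column layout of that output should be checked against the GetPageText documentation. The `fragments_to_rows` name and the `tolerance` parameter are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), y grows upward (PDF-style).
Fragment = Tuple[float, float, str]


def fragments_to_rows(fragments: List[Fragment], tolerance: float = 2.0) -> List[str]:
    """Group fragments with similar y into rows, top to bottom, left to right."""
    rows: List[List[Fragment]] = []
    # Visit fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: -f[1]):
        # Same row if y is within `tolerance` of the row's first fragment.
        if rows and abs(rows[-1][0][1] - frag[1]) <= tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    # Inside each row, order fragments by x and join them with spaces.
    return [" ".join(text for _, _, text in sorted(row, key=lambda f: f[0]))
            for row in rows]
```

With fragments listed in "first in, first out" order, a late-inserted heading still ends up first, for example `fragments_to_rows([(72, 700, "Dear"), (110, 700.5, "Sir"), (72, 680, "second"), (120, 680, "line"), (72, 750, "Komornik")])`. A fixed tolerance is the simplest choice; OCR output with skewed baselines may need a tolerance derived from the font size instead.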

Cheers,
Ingo

mLipok
Posted: 04 May 18 at 7:56AM
So it would be good to get a new option 9 that combines options 7 and 8 and does exactly what you describe in your last statement.

Of course, I will post a feature request to Debenu.

Thanks.

Btw, I'm already in the process of developing what you describe; right now I'm trying to calculate positions and keep the text together.
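
One possible reading of such a combined mode, sketched in Python: keep the "first in, first out" order inside each run of fragments, and only move whole runs according to their position. This is an illustration under stated assumptions, not anything the library provides: fragments are hypothetical (x, y, text) tuples in stream order, y grows upward, and a new run starts whenever a fragment jumps upward on the page by more than `tolerance`.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), given in stream order.
Fragment = Tuple[float, float, str]


def reorder_runs(fragments: List[Fragment], tolerance: float = 2.0) -> List[str]:
    """Keep stream order inside runs, but sort the runs top to bottom."""
    runs: List[List[Fragment]] = []
    for frag in fragments:
        # Continue the current run while the text stays level or moves down.
        if runs and frag[1] <= runs[-1][-1][1] + tolerance:
            runs[-1].append(frag)
        else:
            # An upward jump suggests text that was inserted out of order.
            runs.append([frag])
    # Stable sort: runs that start higher on the page come first.
    runs.sort(key=lambda run: -run[0][1])
    return [text for run in runs for _, _, text in run]
```

For example, `reorder_runs([(72, 700, "Dear Sir"), (72, 680, "second line"), (72, 750, "Komornik")])` moves the late-inserted "Komornik" ahead of the two body lines while leaving those lines in their original order. Multi-column pages would need a more careful rule for where a run begins.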

mLipok
Posted: 04 May 18 at 8:04AM
I just posted the feature request.