Debenu Quick PDF Library - PDF SDK Community Forum Homepage
ocr - where is the recognized text?

vladob (Beginner) | Posted: 13 Jan 12 at 5:36PM
Hi all
 
I have a question: when OCR software reads an image-only PDF (scanned pictures saved as PDF), the OCR engine injects the recognized text into the PDF file. Can you tell me where it is stored? In other words, how can I access that recognized text with QuickPDF?
 
Many thanks
 
 
Vladimir
Ingo (Moderator Group) | Posted: 13 Jan 12 at 6:47PM
Hi Vladimir!

I'm not sure I understand your question correctly, but ...
Take a scanned invoice, for example.
It is first scanned to PDF as an image.
You can view this PDF with QuickPDF, change its properties and so on, but text extraction isn't possible.
There are OCR tools that go through such a PDF and create readable text content from the "image PDF".
The "image PDF" itself stays the same; the OCR tool additionally inserts real text content.
Now you can extract this text with QuickPDF, and things like full-text search become possible.

With QuickPDF you can determine whether a PDF has been OCR-ed, because text extraction has an option to extract with font names... OCR fonts are very special fonts, and the font name usually contains "ocr" too.
Another way to detect an OCR-ed PDF:
Check whether the number of inserted images equals the page count and whether the images have the same dimensions as the pages.
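
A minimal sketch of the two heuristics above, in C# to match the snippets in this thread. It deliberately makes no Quick PDF Library calls: the font names and the page and image sizes are assumed to have been collected by the caller beforehand. The class `OcrHeuristics`, both method names, and the `tolerance` parameter are hypothetical, and all sizes are assumed to be in the same unit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OcrHeuristics
{
    // Heuristic 1: true if any font name contains "ocr" (case-insensitive).
    public static bool HasOcrFont(IEnumerable<string> fontNames)
    {
        return fontNames.Any(n => n != null &&
            n.IndexOf("ocr", StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Heuristic 2: true if there is exactly one image per page and each
    // image matches its page's width and height within the tolerance.
    // Entry i of both lists is assumed to describe page i.
    public static bool LooksLikeScannedPages(
        IReadOnlyList<(double W, double H)> pages,
        IReadOnlyList<(double W, double H)> images,
        double tolerance = 1.0)
    {
        if (pages.Count == 0 || pages.Count != images.Count) return false;
        for (int i = 0; i < pages.Count; i++)
        {
            if (Math.Abs(pages[i].W - images[i].W) > tolerance) return false;
            if (Math.Abs(pages[i].H - images[i].H) > tolerance) return false;
        }
        return true;
    }
}
```

Neither check is conclusive on its own: a font name can contain "ocr" by coincidence, and a full-page image per page only suggests a scan, not that OCR text was added.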

I hope I could help a little bit, and perhaps you now have further ideas ;-)

Cheers and welcome here,
Ingo
 
AndrewC (Moderator Group, Geelong, Aust) | Posted: 16 Jan 12 at 2:22PM
OCR text is often inserted as an invisible text object: it cannot be seen on the page, but it can be extracted with the GetPageText text extraction function in QPL.

  int ret = QP.LoadFromFile("ocred.pdf", "");   // second argument is the password; empty here
  string s = QP.GetPageText(3);    // 3 is the extraction option, not a page number; you can also try option 7 or 8.
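
Following the "try option 7 or 8" hint, here is one possible way to try several extraction options in turn. This is only a sketch: `OcrText.FirstNonEmpty` is a hypothetical helper, and the extractor is passed in as a delegate so the snippet does not depend on the library. In real use the delegate would wrap `QP.GetPageText`.

```csharp
using System;

public static class OcrText
{
    // Calls extract(option) for each option in order and returns the first
    // result that contains non-whitespace text, or "" if none does.
    public static string FirstNonEmpty(Func<int, string> extract, params int[] options)
    {
        foreach (int option in options)
        {
            string text = extract(option);
            if (!string.IsNullOrWhiteSpace(text)) return text;
        }
        return string.Empty;
    }
}

// Possible usage, assuming QP is a loaded Quick PDF Library instance:
//   string s = OcrText.FirstNonEmpty(opt => QP.GetPageText(opt), 3, 7, 8);
```

If every option comes back empty, the page most likely has no text layer at all, which is the image-only case described earlier in the thread.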


vladob (Beginner) | Posted: 17 Jan 12 at 7:43AM
Many thanks for your valuable help.
It works.
Have a nice day.
V.
wubuer (Beginner) | Posted: 25 Nov 15 at 7:56AM
Originally posted by AndrewC:

OCR text is often inserted into an invisible text object that cannot be seen but can be extracted with GetPageText text extraction functions within QPL.

  int ret = QP.LoadFromFile("ocred.pdf", "");
  string s = QP.GetPageText(3);    // you can also try option 7 or 8.



Thanks, it helps a lot.