ocr - where is the recognized text?

vladob

Topic: ocr - where is the recognized text?
Posted: 13 Jan 12 at 5:36PM

Hi all

I have the following question: when you ask OCR software to read an image-only PDF (scanned pictures saved as PDF), the OCR engine injects the recognized text into the PDF file. Can you tell me where? That is, how can I access the recognized text with QuickPDF?

Many thanks

Vladimir

Ingo

Posted: 13 Jan 12 at 6:47PM

Hi Vladimir!

I'm not sure I understand your question correctly, but here is how it usually works.
Take a scanned invoice as an example.
It is first scanned to PDF as an image.
You can open this PDF with QuickPDF, change its properties and so on, but text extraction isn't possible.
OCR tools can then go through this PDF and produce readable text content from the "image PDF".
The "image PDF" itself stays the same; the OCR tool additionally inserts real text content.
Now you can extract this text with QuickPDF, and things like full-text search become possible.

With QuickPDF you can check whether a PDF has been OCR-ed: text extraction has an option that also returns the font names, and OCR fonts are rather special fonts that usually have "ocr" somewhere in the font name.
Another way to recognize an OCR-ed PDF:
the number of inserted images is the same as the page count, and the images have the same dimensions as the pages.
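
The two checks described above can be sketched as plain C# helpers. This is only an illustration: `OcrHeuristics` and its methods are hypothetical names and are not part of Quick PDF Library. The sketch assumes you have already collected the font names, page sizes and drawn image sizes yourself (in the same units), and the 1.0 tolerance is an arbitrary assumption. Both checks are heuristics and can give false positives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Quick PDF Library.
public static class OcrHeuristics
{
    // Check 1: does any font name contain "ocr" (case-insensitive)?
    public static bool HasOcrFont(IEnumerable<string> fontNames) =>
        fontNames.Any(n => n != null &&
            n.IndexOf("ocr", StringComparison.OrdinalIgnoreCase) >= 0);

    // Check 2: exactly one image per page, and each image is as large
    // as its page (within a tolerance, same units for both lists).
    public static bool LooksLikeScannedPages(
        IReadOnlyList<(double Width, double Height)> pages,
        IReadOnlyList<(double Width, double Height)> images,
        double tolerance = 1.0) // assumed tolerance, adjust as needed
    {
        if (pages.Count == 0 || pages.Count != images.Count) return false;
        for (int i = 0; i < pages.Count; i++)
        {
            if (Math.Abs(pages[i].Width - images[i].Width) > tolerance ||
                Math.Abs(pages[i].Height - images[i].Height) > tolerance)
                return false;
        }
        return true;
    }
}
```

A PDF that passes the second check but has no extractable text is probably a plain scan that has not been through OCR yet.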

I hope I could help a little bit, and perhaps now you have further ideas ;-)

Cheers and welcome here,
Ingo

AndrewC

Posted: 16 Jan 12 at 2:22PM

OCR text is often inserted as an invisible text object: it cannot be seen on the page, but it can be extracted with the GetPageText text extraction function in QPL.

int ret = QP.LoadFromFile("ocred.pdf", ""); // second argument is the password, empty here

string s = QP.GetPageText(3); // 3 is the extraction option; you can also try option 7 or 8.
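
To run the same extraction over every page, something like the following sketch might help. It is written against a small hypothetical interface, `IPdfTextSource`, so that it compiles without the library. `GetPageText` mirrors the call above, while `PageCount` and `SelectPage` (with 1-based page numbers) are assumptions about how the library object behaves, so check them against the function reference for your version. `InMemoryPdf` is only a stand-in for trying the loop out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the library object (QP in the post above).
public interface IPdfTextSource
{
    int PageCount();                        // assumed: number of pages
    int SelectPage(int pageNumber);         // assumed: 1-based page selection
    string GetPageText(int extractOptions); // text of the selected page
}

public static class OcrTextLayer
{
    // Returns the extracted text for each page number.
    // A page without a text layer maps to an empty string.
    public static Dictionary<int, string> ExtractAll(
        IPdfTextSource pdf, int extractOptions)
    {
        var result = new Dictionary<int, string>();
        int pages = pdf.PageCount();
        for (int page = 1; page <= pages; page++)
        {
            pdf.SelectPage(page);
            string text = pdf.GetPageText(extractOptions) ?? string.Empty;
            result[page] = text.Trim();
        }
        return result;
    }
}

// Stand-in for experiments; a real wrapper would forward to the library.
public sealed class InMemoryPdf : IPdfTextSource
{
    private readonly string[] _pageTexts;
    private int _selected = 1;

    public InMemoryPdf(string[] pageTexts) { _pageTexts = pageTexts; }

    public int PageCount() => _pageTexts.Length;

    public int SelectPage(int pageNumber)
    {
        if (pageNumber < 1 || pageNumber > _pageTexts.Length) return 0;
        _selected = pageNumber;
        return 1;
    }

    public string GetPageText(int extractOptions) => _pageTexts[_selected - 1];
}
```

Pages that come back empty either have no OCR text layer or the OCR engine recognized nothing on them.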

vladob

Posted: 17 Jan 12 at 7:43AM

Many thanks for your valuable help.

It works.

Have a nice day

V.

wubuer

Posted: 25 Nov 15 at 7:56AM

Thanks, it helps a lot.
