Problem with text extraction from an OCR PDF
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Posted: 22 Aug 08 at 9:30AM |
Hi!
I have a scanned PDF document (made with ScandAll PRO V1.0 / Adobe PDF Scan Library 2.3). During scanning, the running text was converted by OCR. With QP I haven't found a way to get the text content via text extraction. Does anybody here have experience with this case? What should I do?
The document uses PDF specification 1.4 and is not encrypted.
Thanks in advance and best regards,
Ingo
Edited by Ingo - 22 Aug 08 at 9:32AM
DELBEKE
Debenu Quick PDF Library Expert Joined: 31 Oct 05 Location: France Status: Offline Points: 151 |
Hi, Ingo
Can you send me the file? I'll have a look at it.
jean-luc(at)delbeke(dot)fr
Edited by DELBEKE - 22 Aug 08 at 1:37PM
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Jean-Luc!
Thanks for helping.
I've sent it to you.
Cheers and best regards,
Ingo
DELBEKE
Debenu Quick PDF Library Expert Joined: 31 Oct 05 Location: France Status: Offline Points: 151 |
Hi Ingo.
No problem for me, it works perfectly.
In your document, the text is on a separate layer. Try calling the CombineLayers function before the GetPageText function.
I've made some enhancements to this function; perhaps you missed this post: http://www.quickpdf.org/forum/forum_posts.asp?TID=923
In the meantime, I've made more enhancements to the annotation functions and sent them to Michel, but he seems to be on holiday.
For now, I am working on a digital signing function, but I have some difficulties with Delphi (I only started with Delphi in July).
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Jean-Luc!
I'm in England at the moment, hence the late answer.
I'll try it with CombineLayers. Thanks a lot.
If you have any enhancements for the library, please send them to me first. I'll implement them and then send the whole package to Michel, who uploads it so the members can test it.
Cheers and best regards,
Ingo
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Jean-Luc!
How are you using the extract function?
My code still doesn't work. If you want to look:
QP := TiSEDQuickPDF.Create;
try
  QP.UnlockKey('mycode');
  dafh := QP.DAOpenFile(fneu,'');
  x := QP.DAGetPageCount(dafh);
  STR := '';
  ProgressBar1.Min := 0;
  ProgressBar1.Max := x;
  verztxt := fneu + '.txt';
  AssignFile(cf,verztxt);
  Rewrite(cf);
  for i := 1 to x do
  begin
    ProgressBar1.Position := i;
    ProgressBar1.Repaint;
    dapr := QP.DAFindPage(dafh,i);
    QP.CombineLayers;
    STR := QP.DAExtractPageText(dafh,dapr,0);
    WriteLn(cf, ' ');
    WriteLn(cf, ' page ' + IntToStr(i) + ' from ' + IntToStr(x) + ' ');
    WriteLn(cf, ' ');
    WriteLn(cf,Trim(STR));
  end;
  CloseFile(cf);
finally
  QP.DACloseFile(dafh);
  QP.Free;
  ProgressBar1.Position := 0;
  Screen.Cursor := Save_Cursor; { Always restore to normal }
end;
Thanks a lot!
Cheers,
Ingo
DELBEKE
Debenu Quick PDF Library Expert Joined: 31 Oct 05 Location: France Status: Offline Points: 151 |
Hi Ingo
I have not tried with the direct access functions. Today it is too late for me to give it a try, and tomorrow I won't be able to test. So I'll look at it later, but I will do it.
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Jean-Luc!
This is already a help - I'll try the "normal" extract function ;-)
Cheers,
Ingo
DELBEKE
Debenu Quick PDF Library Expert Joined: 31 Oct 05 Location: France Status: Offline Points: 151 |
Hi Ingo
I've tested with GetPageText(Parameter).
With parameters 0 and 1, no text is extracted. The others work correctly.
Strange.
DELBEKE
Debenu Quick PDF Library Expert Joined: 31 Oct 05 Location: France Status: Offline Points: 151 |
Hi again,
I've traced the program to understand the difference.
The text contained in your files is Unicode. The method that extracts the text with parameter 0/1 (parameters 0 and 1 are strictly identical) doesn't use the rendering engine and can't extract the ANSI string. This should be improved.
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524 |
Hi Jean-Luc!
I've had a similar experience. I can only extract the text from my special document with GetPageText(4). Parameters 0 and 3 don't work. I wonder why, but I'm happy that it runs now with your help! Thanks a lot!
Cheers,
Ingo
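For anyone who finds this thread later, here is a minimal sketch of the variant that ended up working above: load the document normally instead of using the direct access functions, then call GetPageText(4) per page. It is only a sketch under assumptions. The unit name iSEDQuickPDF, the file names, the single-argument form of LoadFromFile, and the reading of a 0 return value as failure are all assumptions that may differ in your library version; only TiSEDQuickPDF, UnlockKey and GetPageText(4) come from the posts above.

```pascal
program ExtractOcrText;
{$APPTYPE CONSOLE}

uses
  SysUtils,
  iSEDQuickPDF; // assumed unit name, adjust to your installation

const
  SourceFile = 'scan.pdf';     // hypothetical input file
  TargetFile = 'scan.pdf.txt'; // hypothetical output file

var
  QP: TiSEDQuickPDF;
  OutFile: TextFile;
  PageNo, PageTotal: Integer;
  PageText: string;
begin
  QP := TiSEDQuickPDF.Create;
  try
    QP.UnlockKey('mycode'); // placeholder key, as in the code posted above

    // Assumption: LoadFromFile returns 0 when the file cannot be loaded.
    // Newer library versions may also expect a password argument.
    if QP.LoadFromFile(SourceFile) = 0 then
      raise Exception.Create('Could not load ' + SourceFile);

    PageTotal := QP.PageCount;
    AssignFile(OutFile, TargetFile);
    Rewrite(OutFile);
    try
      for PageNo := 1 to PageTotal do
      begin
        QP.SelectPage(PageNo);
        // Option 4 is the one reported to work for this OCR document;
        // options 0 and 1 returned no text in the tests above.
        PageText := QP.GetPageText(4);
        WriteLn(OutFile, ' page ' + IntToStr(PageNo) + ' from ' + IntToStr(PageTotal));
        WriteLn(OutFile, Trim(PageText));
      end;
    finally
      CloseFile(OutFile);
    end;
  finally
    QP.Free;
  end;
end.
```

The output of option 4 may carry more than the plain text, so some post-processing of each page may be needed before the text file is usable.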