Debenu Quick PDF Library - PDF SDK Community Forum

Problem with text extraction from OCR PDF

Ingo
Moderator Group
Joined: 29 Oct 05
Posted: 22 Aug 08 at 9:30AM
Hi!
 
I have a scanned PDF document (created with ScandAll PRO V1.0 / Adobe PDF Scan Library 2.3). During scanning, the body text was converted by OCR. With QP I haven't found a way to get the text content via text extraction. Does anybody here have experience with this? What should I do?
 
The document conforms to PDF 1.4 and is not encrypted.
 
Thanks in advance and best regards,
Ingo


Edited by Ingo - 22 Aug 08 at 9:32AM

DELBEKE
Debenu Quick PDF Library Expert
Joined: 31 Oct 05
Location: France
Posted: 22 Aug 08 at 1:36PM
Hi Ingo,
Can you send me the file? I'll have a look at it.
 
jean-luc(at)delbeke(dot)fr


Edited by DELBEKE - 22 Aug 08 at 1:37PM

Ingo
Moderator Group
Joined: 29 Oct 05
Posted: 23 Aug 08 at 8:27AM
Hi Jean-Luc!
 
Thanks for helping.
I've sent it to you.
 
Cheers and best regards,
Ingo 

DELBEKE
Debenu Quick PDF Library Expert
Joined: 31 Oct 05
Location: France
Posted: 23 Aug 08 at 1:16PM
Hi Ingo,
No problem on my side; it works perfectly.
In your document, the text is on a separate layer. Try calling the CombineLayers function before the GetPageText function.
I've made some enhancements to this function; perhaps you missed this post: http://www.quickpdf.org/forum/forum_posts.asp?TID=923
In the meantime, I've made more enhancements to the annotation functions and sent them to Michel, but he seems to be on holiday.
For now, I am working on a digital signing function, but I'm having some difficulties with Delphi (I only started with Delphi in July).
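
For readers following along, here is a minimal sketch of the suggestion above using the regular (not direct-access) functions: load the document, select each page, call CombineLayers, then GetPageText. It is not code from this thread. ExtractAllText is a hypothetical helper name, the unit name iSEDQuickPDF is an assumption, and so is the convention that UnlockKey and LoadFromFile return 1 on success. The GetPageText option is left to the caller.

```pascal
uses
  SysUtils, iSEDQuickPDF; { iSEDQuickPDF: assumed name of the library's Delphi unit }

{ Hypothetical helper: returns the text of every page of FileName,
  extracted with the given GetPageText option. }
function ExtractAllText(const FileName, LicenseKey: string;
  ExtractOption: Integer): string;
var
  QP: TiSEDQuickPDF;
  PageNo, PageTotal: Integer;
begin
  Result := '';
  QP := TiSEDQuickPDF.Create;
  try
    { Assumption: both calls return 1 on success }
    if QP.UnlockKey(LicenseKey) <> 1 then
      raise Exception.Create('License key was not accepted');
    { Later Debenu releases take a second argument: LoadFromFile(FileName, Password) }
    if QP.LoadFromFile(FileName) <> 1 then
      raise Exception.Create('Could not load ' + FileName);
    PageTotal := QP.PageCount;
    for PageNo := 1 to PageTotal do
    begin
      QP.SelectPage(PageNo);
      { Merge the page's layers so the separate OCR text layer is
        included, as suggested above }
      QP.CombineLayers;
      Result := Result + Format('page %d from %d', [PageNo, PageTotal]) + sLineBreak
        + Trim(QP.GetPageText(ExtractOption)) + sLineBreak;
    end;
  finally
    QP.Free;
  end;
end;
```

A call such as ExtractAllText('scan.pdf', 'mykey', 4) would use option 4, the value that eventually worked for this file later in the thread. Keeping the option as a parameter makes it easy to compare options on a problem document.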

Ingo
Moderator Group
Joined: 29 Oct 05
Posted: 28 Aug 08 at 5:18AM
Hi Jean-Luc!
 
I'm in England at the moment, hence the late answer.
I'll try it with CombineLayers. Thanks a lot.
If you have any enhancements for the library, please send them to me first. I always integrate them, then send the whole package to Michel, who uploads it so the members can test it.
 
Cheers and best regards,
Ingo

Ingo
Moderator Group
Joined: 29 Oct 05
Posted: 31 Aug 08 at 4:47PM
Hi Jean-Luc!
 
How are you calling the extraction function?
My code still doesn't work. Here it is if you want to take a look:
   QP := TiSEDQuickPDF.Create;
   try
     QP.UnlockKey('mycode');
     dafh := QP.DAOpenFile(fneu, '');
     x    := QP.DAGetPageCount(dafh);
     STR  := '';
     ProgressBar1.Min := 0;
     ProgressBar1.Max := x;
     verztxt := fneu + '.txt';
     AssignFile(cf, verztxt);
     Rewrite(cf);
     for i := 1 to x do
     begin
       ProgressBar1.Position := i;
       ProgressBar1.Repaint;
       dapr := QP.DAFindPage(dafh, i);
       QP.CombineLayers;
       STR  := QP.DAExtractPageText(dafh, dapr, 0);
       WriteLn(cf, '   ');
       WriteLn(cf, ' page ' + IntToStr(i) + ' from ' + IntToStr(x) + ' ');
       WriteLn(cf, '   ');
       WriteLn(cf, Trim(STR));
     end;
     CloseFile(cf);
   finally
     QP.DACloseFile(dafh);
     QP.Free;
     ProgressBar1.Position := 0;
     Screen.Cursor := Save_Cursor;  { Always restore to normal }
   end;
 
Thanks a lot!
 
Cheers,
Ingo

DELBEKE
Debenu Quick PDF Library Expert
Joined: 31 Oct 05
Location: France
Posted: 31 Aug 08 at 5:33PM
Hi Ingo,
I haven't tried it with the direct access functions. It's too late today for me to try, and I won't be able to test tomorrow either, so I'll look at it later. But I will do it.

Ingo
Moderator Group
Joined: 29 Oct 05
Posted: 01 Sep 08 at 1:48AM
Hi Jean-Luc!

That already helps. I'll try the "normal" extraction function ;-)

Cheers,
Ingo

DELBEKE
Debenu Quick PDF Library Expert
Joined: 31 Oct 05
Location: France
Posted: 02 Sep 08 at 12:59PM
Hi Ingo
 
I've tested with GetPageText(Parameter).
With parameters 0 and 1, no text is extracted. The other values work correctly.
 
Strange
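
A small diagnostic sketch along the lines of the test described above: it selects one page, combines its layers and reports how many characters each GetPageText option from 0 to 4 returns. CompareExtractOptions is a hypothetical helper, and the unit name iSEDQuickPDF is assumed. It expects an already unlocked TiSEDQuickPDF instance with the document loaded. The option range simply covers the values mentioned in this thread, not necessarily every option the library supports.

```pascal
uses
  Classes, SysUtils, iSEDQuickPDF; { iSEDQuickPDF: assumed unit name }

{ Hypothetical diagnostic: QP must already be unlocked and have the
  document loaded with LoadFromFile. Appends one line per option to Log. }
procedure CompareExtractOptions(QP: TiSEDQuickPDF; PageNo: Integer;
  Log: TStrings);
var
  Option: Integer;
  PageText: string;
begin
  QP.SelectPage(PageNo);
  QP.CombineLayers;
  { Options 0 to 4 are the values discussed in this thread }
  for Option := 0 to 4 do
  begin
    PageText := Trim(QP.GetPageText(Option));
    Log.Add(Format('GetPageText(%d): %d characters', [Option, Length(PageText)]));
  end;
end;
```

For example, calling CompareExtractOptions(QP, 1, Memo1.Lines) from a form (Memo1 being a hypothetical TMemo) lists one line per option, so options that return nothing for a given file stand out immediately.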

DELBEKE
Debenu Quick PDF Library Expert
Joined: 31 Oct 05
Location: France
Posted: 02 Sep 08 at 2:10PM
Hi again,
I've traced the program to understand the difference.

The text in your file is Unicode-encoded. The extraction method used for parameters 0 and 1 (the two are strictly identical) doesn't use the rendering engine and can't convert this text to an ANSI string. This would be worth improving.

Ingo
Moderator Group
Joined: 29 Oct 05
Posted: 03 Sep 08 at 1:38AM
Hi Jean-Luc!

I've had a similar experience. I can only extract the text from this particular document with GetPageText(4); parameters 0 and 3 don't work. I wonder why, but I'm happy that it works now with your help! Thanks a lot!

Cheers,
Ingo

Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.