
DPL.ExtractFilePageText's error

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3679
Printed Date: 30 Apr 24 at 7:52PM


Topic: DPL.ExtractFilePageText's error
Posted By: Jimmy Wu.
Date Posted: 16 Feb 19 at 9:04AM
Dear Sir,
 
I'm coding in Delphi and need to convert a PDF to a text file using DPL.ExtractFilePageText,
but I only get an empty text file. Please help me; my source is as follows:
 
// Initialise DPL
   DPL := TDebenuPDFLibrary1411.Create;
   // Check the licence key. "not" binds tighter than "=", so
   // "not X = 1" does not test for failure; compare with <> instead.
   if DPL.UnlockKey(edtLicenseKey.Text) <> 1 then
   begin
     ShowMessage('The key is invalid, exiting now!');
     Exit;
   end;
   DPL.LoadFromFile(PDF_File,'');
   iNumPages := DPL.PageCount;
   GetDir(0,Cur_Path);
   TextF:=Cur_Path+'\'+'PDF2TextF.txt';
   AssignFile(f,TextF);
   if FileExists(TextF) then Erase(f);
   Rewrite(f);

   for nPage := 1 to iNumPages do
   begin
      // Extract one page at a time and write it once; accumulating
      // into strText and writing it on every pass repeats earlier pages.
      strText := DPL.ExtractFilePageText(PDF_File, '', nPage, 3);
      Writeln(f,strText);
   end;
   CloseFile(f);
 
 



Replies:
Posted By: Ingo
Date Posted: 16 Feb 19 at 2:54PM
Hi Jimmy,

you should read the description about ExtractFilePageText:
https://www.debenu.com/docs/pdf_library_reference/ExtractFilePageText.php
"...internally use of DAfunctionality..."
LoadFromFile needs a lot of memory; ExtractFilePageText does not.
Calling LoadFromFile as well throws away the advantage of ExtractFilePageText.
Please read more about the DA (direct access) functionality in Quick PDF:
https://www.debenu.com/docs/pdf_library_reference/DirectAccessFunctionality.php
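
A minimal sketch of the direct-access route, for illustration only. It assumes the library's DA functions DAOpenFileReadOnly, DAGetPageCount, DAFindPage, DAExtractPageText and DACloseFile, a unit named DebenuPDFLibrary1411, and Delphi 2009 or later for TEncoding. The procedure name ExtractAllTextDA and the OutFile parameter are hypothetical; check the signatures against the reference for your library version.

```delphi
// uses SysUtils, Classes, DebenuPDFLibrary1411;  (library unit name is an assumption)

// Hypothetical helper: extracts every page through the DA functions,
// without calling LoadFromFile. DPL must already be created and unlocked.
procedure ExtractAllTextDA(DPL: TDebenuPDFLibrary1411;
  const PDF_File, OutFile: string);
var
  FileHandle, PageRef, nPage, iNumPages: Integer;
  Lines: TStringList;
begin
  // Empty password; assumed to return 0 if the file cannot be opened
  FileHandle := DPL.DAOpenFileReadOnly(PDF_File, '');
  if FileHandle = 0 then
    raise Exception.Create('Could not open ' + PDF_File);
  Lines := TStringList.Create;
  try
    iNumPages := DPL.DAGetPageCount(FileHandle);
    for nPage := 1 to iNumPages do
    begin
      // DA functions address a page by reference, not by page number
      PageRef := DPL.DAFindPage(FileHandle, nPage);
      Lines.Add(DPL.DAExtractPageText(FileHandle, PageRef, 0));
    end;
    // Save as UTF-8 so non-Latin text is not lost
    Lines.SaveToFile(OutFile, TEncoding.UTF8);
  finally
    Lines.Free;
    DPL.DACloseFile(FileHandle);
  end;
end;
```

Each page is added to the list exactly once, and the file handle is closed even if extraction raises an exception.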

The reason for your issue could be an encrypted PDF file.
An encrypted document should be decrypted before you process it.
Here's some sample code; the Decrypt call goes directly after LoadFromFile:
// . . .
   QP := TDebenuPDFLibrary1611.Create;
   try
      QP.UnlockKey('my_reg_key_i_have_to_insert_here');
      QP.LoadFromFile(fnew, '');
      If ( QP.EncryptionStatus > 0 ) Then
         QP.Decrypt;
      QP.SaveToFile(fnew + '.save.pdf');
   finally
      QP.Free;
   end;
// . . .

To extract text from a PDF you can also use code like this:
// . . .
       for i := 1 to QP.PageCount Do
       begin
          QP.SelectPage(i);
          QP.SetOrigin(1);
          QP.CombineContentStreams;
          STR := STR + Trim(QP.GetPageText(8));
       end;
// . . .

Cheers and welcome here,
Ingo




-------------
Cheers,
Ingo



Posted By: Jimmy Wu.
Date Posted: 18 Feb 19 at 7:26AM
Yes, it was just the encryption error; that is no longer a problem.
The problem now is one string: "聖品脂肪抹醬(16KG 彩鐵 金黃蓋"
     which is split into 4 string fields by DPL.ExtractFilePageText
 
     as :  聖品脂肪抹醬(統清)G 彩鐵 金黃蓋
            1
              
            K
 
     Is there any way to solve this?
 
                       Thanks!
 
   PS: After ExtractFilePageText, the string should still be a single field,
              as: "聖品脂肪抹醬(16KG 彩鐵 金黃蓋"


Posted By: Ingo
Date Posted: 18 Feb 19 at 8:02AM
I can't read the text in your post (it seems to use an Asian character set),
but I know what you mean.
Extraction works in the reverse order of insertion:
what was inserted last is extracted first.
So if a short piece of text was inserted after the rest of the text had been created, that last insertion is extracted first.
There are several text extraction options; try option 0, 7 or 8 to get a human-readable result.
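
One way to compare the suggested options is to extract the same page with each of them and look at the results side by side. This is only a sketch: it assumes DPL is already created and unlocked, that the PDF is not encrypted, that the library unit is named DebenuPDFLibrary1411, and Delphi 2009 or later for TEncoding. The procedure name DumpPageWithOptions and the output file names are hypothetical.

```delphi
// uses SysUtils, Classes, DebenuPDFLibrary1411;  (library unit name is an assumption)

// Hypothetical helper: writes one text file per extraction option
// so the outputs for options 0, 7 and 8 can be compared.
procedure DumpPageWithOptions(DPL: TDebenuPDFLibrary1411;
  const PDF_File: string; nPage: Integer);
const
  Options: array[0..2] of Integer = (0, 7, 8);
var
  i: Integer;
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    for i := Low(Options) to High(Options) do
    begin
      Lines.Text := DPL.ExtractFilePageText(PDF_File, '', nPage, Options[i]);
      // File name encodes the page and option, e.g. page1_opt7.txt;
      // UTF-8 keeps CJK characters intact
      Lines.SaveToFile(Format('page%d_opt%d.txt', [nPage, Options[i]]),
        TEncoding.UTF8);
    end;
  finally
    Lines.Free;
  end;
end;
```

Calling DumpPageWithOptions(DPL, PDF_File, 1) would then leave three files to inspect for the page that contains the split string.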



-------------
Cheers,
Ingo



