Problems with DAExtractPageText

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=205
Printed Date: 16 May 24 at 11:22AM


Topic: Problems with DAExtractPageText
Posted By: Ingo
Subject: Problems with DAExtractPageText
Date Posted: 12 Dec 05 at 4:48PM
Hello!

I have problems with DAExtractPageText on a few PDF files.
The first problem shows up in this file:
http://www.is-soft.de/brochure.pdf
Here I can't extract German umlauts such as ü or ä.
Also, from a link like http://www... I get only httpwww... Why is that?
The PDF file was made with a very old PDF tool and is PDF version 1.1 ;-)
The second problem is with files like this one:
http://www.is-soft.de/invoice.pdf
There is text but no embedded fonts... I can't extract anything.
If anybody here has some advice for me in this case I would be very happy!
Thanks a lot in advance!

My code is below...


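    QP := TiSEDQuickPDF.Create;
    try
       QP.UnlockKey('*****************************');
       // Open the file in direct access mode (empty password).
       dafh := QP.DAOpenFile(FName,'');
       // Number of pages in the document.
       x    := QP.DAGetPageCount(dafh);
       STR := '';

       lstr := 0;     // total length of the trimmed text over all pages
       strges := '';  // extracted text of all pages
       i1 := 1;

       for i := 1 to x Do
       begin
          // Get the page reference for page i, then extract its text.
          dapr := QP.DAFindPage(dafh,i);
          STR := QP.DAExtractPageText(dafh,dapr,0);
          // Separate pages with a blank line; CR+LF is #13#10, not #10#13.
          strges := strges + STR + #13#10 + #13#10;
          lstr := lstr + Length(Trim(STR));
       end;
       QP.DACloseFile(dafh);
    finally
       QP.Free;
    end;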



Replies:
Posted By: Michel_K17
Date Posted: 13 Dec 05 at 11:28PM
I've been using the "GetPageText(3)" function and it returns the forward slashes (/) properly. However, I have found that I can only read text set in a standard font (Arial, Courier or Times-Roman). Embedded fonts return nothing, though I only tested subsetted fonts.

I have not tested umlauts, so I can't comment on those.
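
A minimal sketch of the GetPageText route described in this reply, for comparison with the direct access code in the first post. It assumes the same TiSEDQuickPDF class and unlock key placeholder as that post; ExtractAllText is a hypothetical helper name. LoadFromFile, PageCount, SelectPage and GetPageText are the library's non-direct-access calls as far as I know, but check the parameter lists against your version: the password argument of LoadFromFile and its return value of 1 on success are assumptions here, and older releases may take only the file name.

```pascal
// Sketch only. Assumes the unit that declares TiSEDQuickPDF is in the uses clause.
// ExtractAllText is a hypothetical name, not part of the library.
function ExtractAllText(const FName: string): string;
var
  QP: TiSEDQuickPDF;
  i: Integer;
begin
  Result := '';
  QP := TiSEDQuickPDF.Create;
  try
    QP.UnlockKey('*****************************');
    // Assumption: LoadFromFile(FileName, Password) returns 1 on success.
    if QP.LoadFromFile(FName, '') = 1 then
      for i := 1 to QP.PageCount do
      begin
        // GetPageText works on the currently selected page.
        QP.SelectPage(i);
        // Option 3 is the extraction mode mentioned in this reply.
        Result := Result + QP.GetPageText(3) + #13#10;
      end;
  finally
    QP.Free;
  end;
end;
```

If the result is empty for a file, reading QP.LastErrorCode right after the load call should show whether the file failed to open or the page simply yielded no text.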


Posted By: Ingo
Date Posted: 14 Dec 05 at 6:03PM
Hi!
Any hints on what to do in this case?


-------------
Cheers,
Ingo



Posted By: bluelizard
Date Posted: 05 Jan 06 at 10:55AM

Hi Ingo,

I have the same problem... also with an invoice.

iSED's LastErrorCode reports "401 Could not open input file".

I have tried to include a font (Courier or Arial), but that did not work.
If I open the invoice with Acrobat and save it, I can extract the text with iSED or PdfText.

Have you found anything new, or a fix for PdfText?

Regards



-------------
GG


Posted By: Ingo
Date Posted: 06 Jan 06 at 1:19AM

Hi!

I only know that it has something to do with the PDF tools FreePDF and/or PDFCreator (both work with GhostScript). Documents created with Acrobat or other tools in that price range don't have this problem.

 



-------------
Cheers,
Ingo



