Debenu Quick PDF Library - PDF SDK Community Forum

Problems with DAExtractPageText

Ingo (Moderator Group)
Joined: 29 Oct 05
Posted: 12 Dec 05 at 4:48PM

Hello!

I have problems with DAExtractPageText on a few PDF files.
The first problem can be seen in this file:
http://www.is-soft.de/brochure.pdf
Here I can't extract German umlauts such as ü or ä.
From a link like http://www... I only get httpwww... What's going on?
The PDF file was made with a very old PDF tool and is PDF version 1.1 ;-)
The second problem is with files like this one:
http://www.is-soft.de/invoice.pdf
There is text but no embedded fonts... I can't extract anything.
If anybody here has some advice for me, I would be very happy!
Thanks a lot in advance!

Below is my code:


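    QP := TiSEDQuickPDF.Create;
    try
       QP.UnlockKey('*****************************');
       // Open the file in direct-access (DA) mode; second argument is the password
       dafh := QP.DAOpenFile(FName,'');
       x    := QP.DAGetPageCount(dafh);
       STR := '';

       lstr := 0;      // running count of non-blank characters extracted
       strges := '';   // accumulated text of all pages
       i1 := 1;

       for i := 1 to x Do
       begin
          // Look up the page reference for page i, then extract its text
          dapr := QP.DAFindPage(dafh,i);
          STR := QP.DAExtractPageText(dafh,dapr,0);
          // A Windows line break is CR LF (#13#10), not #10#13
          strges := strges + STR + #13#10 + #13#10;
          lstr := lstr + Length(Trim(STR));
       end;
       QP.DACloseFile(dafh);
    finally
       QP.Free;
    end;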


Michel_K17 (Newbie)
www.exp-systems.com
Joined: 25 Jan 03
Posted: 13 Dec 05 at 11:28PM
I've been using the "GetPageText(3)" function and it returns the slashes (/) properly. However, I have found that I can only read text if it uses a standard font (Arial, Courier or Times-Roman). Embedded fonts return nothing, but I have only tested subsetted fonts.

I have not tested umlauts, so I can't comment on those.

Ingo (Moderator Group)
Joined: 29 Oct 05
Posted: 14 Dec 05 at 6:03PM
Hi!
Any hints on what to do in this case?
Cheers,
Ingo

bluelizard (Beginner)
Location: Canada
Joined: 05 Jan 06
Posted: 05 Jan 06 at 10:55AM

Hi Ingo,

I have the same problem... also with an INVOICE.

iSED's LastErrorCode reports "401 Could not open input file".

I have tried to include a font (Courier or Arial), but it did not work.
If I open the INVOICE in Acrobat and save it, I can extract the text with iSED or PdfText.

Have you found anything new, or a fix for PdfText?

Regards
GG

Ingo (Moderator Group)
Joined: 29 Oct 05
Posted: 06 Jan 06 at 1:19AM

Hi!

All I know is that it has something to do with the PDF tools FreePDF and/or PDFCreator (both are based on Ghostscript). Documents created with Acrobat or other tools in that price range don't have this problem.

Cheers,
Ingo
