Extract text and images from any PDF

Santiago (Beginner, Puerto Rico, joined 28 Mar 06)
Posted: 29 Mar 06 at 11:25AM

Hello,

Is it possible to extract text and images from a given PDF file?

I'm a total newbie, so could you give an example of how this can be done? I use Delphi, but any language will do for me.

Thank you,
Santiago

swb1 (Debenu Quick PDF Library Expert, United States, joined 05 Dec 05)
Posted: 29 Mar 06 at 12:49PM

Santiago,

Yes, it is possible and fairly easy to extract page text. Use the GetPageText function and it will return a StringList filled with all of the text found on a given page. Depending on the extract options, you can usually work out the location of the text on the page. The only problem I've encountered is that not all PDFs are created the same way. I have seen PDFs where every single character is a separate text element. When that happens, it can be very difficult to turn those characters back into words.
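
For the one-character-per-element case, here is a rough sketch of how the characters might be regrouped into words once you have their positions. It does not call the library at all: the TCharBox record, the GroupIntoWords procedure and the two tolerance values are made up for illustration, and you would have to fill the records yourself from whatever position data your extract options give you. It assumes the characters already arrive in reading order.

```pascal
program GroupCharsDemo;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

type
  // Hypothetical record: one extracted character and its position on the page.
  TCharBox = record
    Ch    : Char;
    Left  : Double;  // x of the left edge
    Right : Double;  // x of the right edge
    Base  : Double;  // y of the baseline
  end;

// Small helper to fill a TCharBox (illustration only).
function MakeBox(Ch: Char; L, R, B: Double): TCharBox;
begin
  Result.Ch := Ch;
  Result.Left := L;
  Result.Right := R;
  Result.Base := B;
end;

// Joins characters into words. A new word starts when the baseline moves
// by more than LineTol or when the horizontal gap to the previous
// character is larger than GapTol.
procedure GroupIntoWords(const Chars: array of TCharBox;
  GapTol, LineTol: Double; Words: TStrings);
var
  i : Integer;
  Current : string;
begin
  Current := '';
  for i := Low(Chars) to High(Chars) do
  begin
    if (i > Low(Chars)) and
       ((Abs(Chars[i].Base - Chars[i - 1].Base) > LineTol) or
        (Chars[i].Left - Chars[i - 1].Right > GapTol)) then
    begin
      Words.Add(Current);
      Current := '';
    end;
    Current := Current + Chars[i].Ch;
  end;
  if Current <> '' then
    Words.Add(Current);
end;

var
  Boxes : array[0..4] of TCharBox;
  Words : TStringList;
  i : Integer;

begin
  // Five characters on one line, with a wide gap between 'i' and 'P'.
  Boxes[0] := MakeBox('H', 10.0, 16.0, 100.0);
  Boxes[1] := MakeBox('i', 16.5, 19.0, 100.0);
  Boxes[2] := MakeBox('P', 25.0, 31.0, 100.0);
  Boxes[3] := MakeBox('D', 31.5, 38.0, 100.0);
  Boxes[4] := MakeBox('F', 38.5, 44.0, 100.0);

  Words := TStringList.Create;
  try
    GroupIntoWords(Boxes, 2.0, 1.0, Words);
    for i := 0 to Words.Count - 1 do
      WriteLn(Words[i]);  // prints "Hi" and then "PDF"
  finally
    Words.Free;
  end;
end.
```

The tolerances are guesses and depend on the font size. A gap threshold of roughly a quarter of the character height is a plausible place to start, but expect to tune it per document.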

 

Image extraction is also simple:

procedure TForm1.btnExtractImagesClick(Sender: TObject);
var
  qPDF : TiSEDQuickPDF;
  ImageID, i, imageCount : Integer;
  SaveName, FirstName : string;
begin
  if not OpenDialog1.Execute then
    Exit;

  qPDF := TiSEDQuickPDF.Create;
  try
    qPDF.UnlockKey(MY_QDPF_KEY);
    qPDF.LoadFromFile(OpenDialog1.FileName);

    // Keep everything up to and including the '.' of '.pdf',
    // so the images are saved as <name>.<n>.<ext> next to the PDF.
    FirstName := Copy(OpenDialog1.FileName, 1, Pos('.pdf', LowerCase(OpenDialog1.FileName)));

    imageCount := qPDF.FindImages; // This is required before ImageID will work
    ShowMessage('ImageCount ' + IntToStr(imageCount));

    for i := 1 to imageCount do
    begin
      ImageID := qPDF.ImageID(i);
      qPDF.SelectImage(ImageID);
      case qPDF.ImageType of
        1 : SaveName := Format('%s%d.jpg', [FirstName, i]);  // The selected image is a JPEG image.
        2 : SaveName := Format('%s%d.bmp', [FirstName, i]);  // The selected image is a BMP image.
        3 : SaveName := Format('%s%d.tif', [FirstName, i]);  // The selected image is a TIFF image.
      else
        Continue;  // Unrecognized type: skip it rather than reuse the previous SaveName
      end;
      qPDF.SaveImageToFile(SaveName);
    end;
  finally
    qPDF.Free;  // Release the library object even if something above raises
  end;
end;
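
One caveat about the FirstName line above: Pos returns 0 when the file name contains no '.pdf', which leaves FirstName empty, and it matches the first '.pdf' anywhere in the path. A possible alternative, sketched below, uses ChangeFileExt from SysUtils to strip the extension. The ImageFileName helper is a made-up name for illustration, not part of the library.

```pascal
program ImageNameDemo;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: builds '<base>.<index><ext>' from the PDF file name.
// ChangeFileExt with an empty extension removes the existing one,
// whatever its case, and leaves names without an extension untouched.
function ImageFileName(const PdfName: string; Index: Integer;
  const Ext: string): string;
begin
  Result := Format('%s.%d%s', [ChangeFileExt(PdfName, ''), Index, Ext]);
end;

begin
  WriteLn(ImageFileName('Report.PDF', 1, '.jpg'));  // prints Report.1.jpg
  WriteLn(ImageFileName('scan', 2, '.tif'));        // prints scan.2.tif
end.
```

In the loop you would then write something like SaveName := ImageFileName(OpenDialog1.FileName, i, '.jpg') in each case branch.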

 

 

Regards,
sb

Santiago
Posted: 29 Mar 06 at 7:44PM

That looks really easy. I'm gonna test it.

How did you learn to program with this component? I've seen a manual, but it doesn't seem to have a starter guide, or does it?

Regards,
Santiago

swb1
Posted: 29 Mar 06 at 8:02PM

A lot of trial and error.

It helps to be a Delphi programmer and to have the source code and, of course, this wonderful forum.

sb

 

Ingo (Moderator Group, joined 29 Oct 05)
Posted: 30 Mar 06 at 12:59AM

Hi Santiago!

For a first start, you should look here:
http://isedquickpdf.com/?pg=kb

It's from the original iSED team. There you can find very short, easy-to-understand sample sources.

Best regards,
Ingo

Santiago
Posted: 30 Mar 06 at 7:52AM

The knowledge base helped me a lot and I was able to test sb's code successfully.

This component rocks!

My last question: which version do you suggest using? I saw there were some bug fixes in the last beta.

Thanks,
Santiago


Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.