Extract text and images from any PDF

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=376
Printed Date: 19 May 24 at 6:08AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Extract text and images from any PDF
Posted By: Santiago
Subject: Extract text and images from any PDF
Date Posted: 29 Mar 06 at 11:25AM

Hello,

Is it possible to extract text and images from a given PDF file?

I'm a total newbie, so could you give an example of how this can be done? I use Delphi, but any language will do for me.

Thank you,
Santiago




Replies:
Posted By: swb1
Date Posted: 29 Mar 06 at 12:49PM

Santiago,

Yes, it is possible and fairly easy to extract page text. Use the GetPageText function and it will return a StringList filled with all of the text found on a given page. Depending on the extract options, you can usually figure out the location of the text on the page. The only problem I've encountered is that not all PDFs are created the same way. I have seen PDFs where every single character is a separate text element. When that happens it can be very difficult to turn those characters back into words.
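
As a rough illustration of the GetPageText approach described above, here is a minimal sketch. It assumes the same TiSEDQuickPDF class and MY_QDPF_KEY constant used elsewhere in this thread, and it assumes GetPageText takes an integer extract option and returns the page text as a string, as in later Quick PDF Library releases. The description above suggests the version in use may hand back a string list instead, in which case the assignment needs adjusting. The procedure name ExtractTextToFiles and the output file naming are made up for the example.

```pascal
// Sketch only: needs SysUtils and Classes in the uses clause, plus the
// unit that declares TiSEDQuickPDF. ExtractTextToFiles is a hypothetical name.
procedure ExtractTextToFiles(const PdfFileName: string);
var
  qPDF: TiSEDQuickPDF;
  Lines: TStringList;
  Page: Integer;
begin
  qPDF := TiSEDQuickPDF.Create;
  try
    Lines := TStringList.Create;
    try
      qPDF.UnlockKey(MY_QDPF_KEY);
      qPDF.LoadFromFile(PdfFileName);

      for Page := 1 to qPDF.PageCount do
      begin
        qPDF.SelectPage(Page);
        // Assumption: option 0 asks for plain text and the result is a string.
        // Other option values change the output format; check the reference.
        Lines.Text := qPDF.GetPageText(0);
        // Writes e.g. 'report_page1.txt' next to 'report.pdf'.
        Lines.SaveToFile(Format('%s_page%d.txt',
          [ChangeFileExt(PdfFileName, ''), Page]));
      end;
    finally
      Lines.Free;
    end;
  finally
    qPDF.Free;
  end;
end;
```

Calling ExtractTextToFiles(OpenDialog1.FileName) from a button handler would then write one text file per page.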

 

Image extraction is also simple:

procedure TForm1.btnExtractImagesClick(Sender: TObject);
var
  qPDF: TiSEDQuickPDF;
  ImageID, ImageCount, i: Integer;
  BaseName, SaveName: string;
begin
  if not OpenDialog1.Execute then
    Exit;

  qPDF := TiSEDQuickPDF.Create;
  try
    qPDF.UnlockKey(MY_QDPF_KEY);
    qPDF.LoadFromFile(OpenDialog1.FileName);

    // Strip the extension so 'report.pdf' yields 'report1.jpg', 'report2.bmp', ...
    BaseName := ChangeFileExt(OpenDialog1.FileName, '');

    ImageCount := qPDF.FindImages; // This is required before ImageID will work
    ShowMessage('ImageCount ' + IntToStr(ImageCount));

    for i := 1 to ImageCount do
    begin
      ImageID := qPDF.ImageID(i);
      qPDF.SelectImage(ImageID);

      case qPDF.ImageType of
        1: SaveName := Format('%s%d.jpg', [BaseName, i]); // The selected image is a JPEG image.
        2: SaveName := Format('%s%d.bmp', [BaseName, i]); // The selected image is a BMP image.
        3: SaveName := Format('%s%d.tif', [BaseName, i]); // The selected image is a TIFF image.
      else
        Continue; // Unknown type: skip it rather than reuse the previous SaveName
      end;

      qPDF.SaveImageToFile(SaveName);
    end;
  finally
    qPDF.Free; // Release the library object even if an exception is raised
  end;
end;

regards,

sb


Posted By: Santiago
Date Posted: 29 Mar 06 at 7:44PM

That looks really easy. I'm gonna test it.

How did you learn to program with this component? I've seen a manual but it doesn't have like a starter guide, or does it?

Regards,
Santiago



Posted By: swb1
Date Posted: 29 Mar 06 at 8:02PM

A lot of trial and error.

It helps to be a Delphi programmer and to have the source code and, of course, this wonderful forum.

sb

 



Posted By: Ingo
Date Posted: 30 Mar 06 at 12:59AM

Hi Santiago!

For a first start, you should look here:
http://isedquickpdf.com/?pg=kb

It's from the original iSED team. There you can find very short, easy-to-understand sample sources.

Best regards,
Ingo


Posted By: Santiago
Date Posted: 30 Mar 06 at 7:52AM

The knowledge base helped me a lot and I was able to test sb's code successfully.

This component rocks!

My last question would be which version do you suggest using? I saw there were some bug fixes in the last beta.

Thanks,
Santiago



