Extract text and images from any PDF
Santiago (Beginner) | Joined: 28 Mar 06 | Location: Puerto Rico | Status: Offline | Points: 4
Posted: 29 Mar 06 at 11:25AM
Hello, is it possible to extract text and images from a given PDF file? I'm a total newbie, so can you give an example of how this can be done? I use Delphi, but any language will do for me. Thank you,
swb1 (Debenu Quick PDF Library Expert) | Joined: 05 Dec 05 | Location: United States | Status: Offline | Points: 102
Santiago,

Yes, it is possible and fairly easy to extract page text. Use the GetPageText function and it will return a StringList filled with all of the text found on a given page. Depending on the extract options, you can usually figure out the location of the text on the page.

The only problem I've encountered is that not all PDFs are created the same way. I have seen PDFs where every single character is a separate text element. When that happens, it can be very difficult to turn those characters back into words.

Image extraction is also simple:

    procedure TForm1.btnExtractImagesClick(Sender: TObject);
    var
      qPDF: TiSEDQuickPDF; // declared locally so the snippet stands on its own
      ImageID, i: Integer;
      imageCount: Integer;
      SaveName, FirstName: string;
    begin
      qPDF := TiSEDQuickPDF.Create;
      try
        qPDF.UnlockKey(MY_QDPF_KEY);
        if OpenDialog1.Execute then
        begin
          qPDF.LoadFromFile(OpenDialog1.FileName);
          // Keep the file name up to and including the dot of '.pdf'
          FirstName := Copy(OpenDialog1.FileName, 1, Pos('.pdf', LowerCase(OpenDialog1.FileName)));
          imageCount := qPDF.FindImages; // This is required before ImageID will work
          ShowMessage('ImageCount ' + IntToStr(imageCount));
          for i := 1 to imageCount do
          begin
            ImageID := qPDF.ImageID(i);
            qPDF.SelectImage(ImageID);
            case qPDF.ImageType of
              1: SaveName := Format('%s%d.jpg', [FirstName, i]); // The selected image is a JPEG image.
              2: SaveName := Format('%s%d.bmp', [FirstName, i]); // The selected image is a BMP image.
              3: SaveName := Format('%s%d.tif', [FirstName, i]); // The selected image is a TIFF image.
            else
              Continue; // Any other type: skip it instead of reusing a stale SaveName
            end;
            qPDF.SaveImageToFile(SaveName);
          end;
        end;
      finally
        qPDF.Free; // Release the library object even if an exception is raised
      end;
    end;

regards,
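The per-character problem described above can be approached with a simple gap heuristic. What follows is only a minimal sketch, not part of the library: the `TGlyph` record, the `MakeGlyph` and `GlyphsToText` helpers, the coordinates, and the `MaxGap` threshold are all hypothetical, and it assumes you have already obtained each character's left and right x-position for a single line of text, sorted left to right. How you get those positions depends on the extract options of your library version.

```pascal
program MergeGlyphs;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical record: one extracted character and its horizontal extent.
  TGlyph = record
    Ch: Char;
    Left: Double;  // x-coordinate of the left edge
    Right: Double; // x-coordinate of the right edge
  end;

// Small helper so the sample data below stays readable.
function MakeGlyph(Ch: Char; Left, Right: Double): TGlyph;
begin
  Result.Ch := Ch;
  Result.Left := Left;
  Result.Right := Right;
end;

// Joins glyphs of one text line (already sorted left to right) into a string.
// A space is inserted whenever the horizontal gap between a glyph and its
// predecessor is larger than MaxGap.
function GlyphsToText(const Glyphs: array of TGlyph; MaxGap: Double): string;
var
  i: Integer;
begin
  Result := '';
  for i := Low(Glyphs) to High(Glyphs) do
  begin
    if (i > Low(Glyphs)) and (Glyphs[i].Left - Glyphs[i - 1].Right > MaxGap) then
      Result := Result + ' ';
    Result := Result + Glyphs[i].Ch;
  end;
end;

var
  Line: array[0..3] of TGlyph;
begin
  // Made-up coordinates: 'H' and 'i' sit close together, then a wider gap, then 'P' and 'D'.
  Line[0] := MakeGlyph('H', 10.0, 16.0);
  Line[1] := MakeGlyph('i', 16.5, 19.0);
  Line[2] := MakeGlyph('P', 24.0, 30.0);
  Line[3] := MakeGlyph('D', 30.5, 36.0);
  WriteLn(GlyphsToText(Line, 2.0)); // prints: Hi PD
end.
```

A fixed threshold is the simplest choice. In real documents the gap usually needs to scale with the font size, and characters must first be grouped into lines by their y-position, so treat this as a starting point only.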
Santiago (Beginner)
That looks really easy. I'm going to test it. How did you learn to program with this component? I've seen a manual, but it doesn't seem to have a starter guide, or does it? Regards,
swb1 (Debenu Quick PDF Library Expert)
A lot of trial and error. It helps to be a Delphi programmer and to have the source code and, of course, this wonderful forum. sb
Ingo (Moderator Group) | Joined: 29 Oct 05 | Status: Offline | Points: 3529
Hi Santiago!
To get started, you should look here: http://isedquickpdf.com/?pg=kb It's from the original iSED team. There you can find very short and easy-to-understand sample sources. Best regards, Ingo
Santiago (Beginner)
The knowledge base helped me a lot and I was able to test sb's code successfully. This component rocks! My last question would be: which version do you suggest using? I saw there were some bug fixes in the last beta. Thanks,