General Discussion - DAGetImageListCount does not work

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: General Discussion
Forum Description: Discussion board for Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3432
Printed Date: 12 Jun 25 at 8:39AM

Topic: DAGetImageListCount does not work

Posted By: skoschke
Subject: DAGetImageListCount does not work
Date Posted: 11 Jan 17 at 12:36PM

If I open various PDF files and use QP.DAGetImageListCount to get the images from them, it reports no images.

If I open the same file in any PDF viewer, I can see a background image.

What can I do to extract the images?

Second problem:

If I use QP.ExtractPageTextBlocks(3), I get text from the PDF file, but in some cases the text is wrong, for example "PhoNe nuMber" instead of the "Phone number" that I can see in any PDF viewer.

Stefan

Replies:

Posted By: tfrost
Date Posted: 11 Jan 17 at 12:59PM

I suggest you show your actual code, so that people can first eliminate obvious problems.

Posted By: skoschke
Date Posted: 11 Jan 17 at 1:32PM

Code with DirectAccess:

procedure TForm1.ReadImages(fn: string; var l: TStringList);
var
  i: integer;
  imagecount: integer;
  imageid: integer;
  x: single;
  y: single;
  b: single; // image width
  h: single; // image height
  IL: integer;
  PR: integer;
  p: integer;
  ImageData: AnsiString; // AnsiString so the raw bytes can be written to a stream
  seitenhoehe: single; // page height
  FH: integer;
begin
  // Open the file with a file handle for DirectAccess
  FH := PDFLibrary.DAOpenFile(fn, '');
  if FH = 0 then
    Exit; // the file could not be opened
  // Free the old streams (index 0 is never used)
  for i := 1 to High(streamarray) do
    streamarray[i].Free;
  SetLength(streamarray, 1);
  // In DirectAccess mode the page count comes from the file handle
  for p := 1 to PDFLibrary.DAGetPageCount(FH) do
  begin
    PR := PDFLibrary.DAFindPage(FH, p);
    IL := PDFLibrary.DAGetPageImageList(FH, PR);
    seitenhoehe := PDFLibrary.DAGetPageHeight(FH, PR);
    for i := 1 to PDFLibrary.DAGetImageListCount(FH, IL) do
    begin
      // Read the image data
      ImageData := PDFLibrary.DAGetImageDataToString(FH, IL, i);
      // Determine the location and size of the image on the page;
      // y is flipped so that it is measured from the top of the page
      x := PDFLibrary.DAGetImageDblProperty(FH, IL, i, 501);
      y := seitenhoehe - PDFLibrary.DAGetImageDblProperty(FH, IL, i, 502);
      b := PDFLibrary.DAGetImageDblProperty(FH, IL, i, 503) -
        PDFLibrary.DAGetImageDblProperty(FH, IL, i, 501);
      h := PDFLibrary.DAGetImageDblProperty(FH, IL, i, 502) -
        PDFLibrary.DAGetImageDblProperty(FH, IL, i, 508);
      // Grow the stream array by one for each image found and store the data
      SetLength(streamarray, Length(streamarray) + 1);
      streamarray[High(streamarray)] := TMemoryStream.Create;
      streamarray[High(streamarray)].Position := 0;
      streamarray[High(streamarray)].WriteBuffer(Pointer(ImageData)^,
        Length(ImageData));
      // Add the entry to the list
      l.Add('P:' + p.ToString);
      l.Add('{' + 'Image' + High(streamarray).ToString + '}');
      l.Add('X:' + x.ToString);
      l.Add('Y:' + y.ToString);
      l.Add('B:' + b.ToString);
      l.Add('H:' + h.ToString);
    end; // End image loop
  end;
  PDFLibrary.DACloseFile(FH);
end;

And with normal access:

procedure TForm1.ReadImages2();
var
  i, k: integer;
  IL: integer;
  ic: integer;
  it: integer;
  gid: integer;
  filename: string;
begin
  for i := 1 to PDFLibrary.PageCount do
  begin
    // Select current page
    PDFLibrary.SelectPage(i);
    // Get list of images on the page
    IL := PDFLibrary.GetPageImageList(0);
    // Count number of images in the list
    ic := PDFLibrary.GetImageListCount(IL);
    for k := 1 to ic do
    begin
      // Iterate through each image and get the
      // image type and image ID
      it := PDFLibrary.GetImageListItemIntProperty(IL, k, 400);
      gid := PDFLibrary.GetImageListItemIntProperty(IL, k, 405);
      // Choose the appropriate file extension based on
      // the returned image type
      case it of
        1:
          filename := 'c:\temp\image-' + gid.ToString + '-' + k.ToString + '.jpg';
        2:
          filename := 'c:\temp\image-' + gid.ToString + '-' + k.ToString + '.bmp';
        3:
          filename := 'c:\temp\image-' + gid.ToString + '-' + k.ToString + '.tif';
        4:
          filename := 'c:\temp\image-' + gid.ToString + '-' + k.ToString + '.png';
      else
        Continue; // unknown image type: skip instead of reusing an old filename
      end;
      // Save the selected image to disk
      PDFLibrary.SaveImageListItemDataToFile(IL, k, 0, filename);
    end;
  end;
end;

The result of DAGetImageListCount in example 1 and of GetImageListCount in example 2 is 0.

Neither routine returns any images for some PDF files, while for other PDF files the images are returned correctly!

Is it possible that the images I see are watermarks or stamps?

How can I read those out?

Stefan

Posted By: Ingo
Date Posted: 11 Jan 17 at 8:39PM

Hi Stefan,

Only embedded images (inserted as files) can be extracted from a PDF.

It's similar to a Word document:

You can paste a screenshot from the clipboard or you can insert an image file, and those are two different things.

Images that are not really embedded/inserted can't be extracted with QuickPDF functionality.

Your second issue probably has to do with an unusual font or something similar, so QuickPDF has some trouble recognizing the text.

Please keep in mind that QuickPDF is still a lightweight PDF library, unlike, for example, an Adobe installation that comes in at around 100 MB ;-)

It would help if you uploaded the PDF file somewhere, so that we could test it with our own routines...

Cheers and welcome here,

Ingo

-------------
Cheers,
Ingo

Posted By: skoschke
Date Posted: 12 Jan 17 at 3:02PM

Hi Ingo,

Thank you for the welcome. I understand now and will go another way in this case:

I'll try RenderPageToStream, so that I get the full page as the background for my new PDF file...

Stefan

Posted By: Ingo
Date Posted: 13 Jan 17 at 10:46AM

Hi Stefan,

yes... RenderPage is the only way out of your issue ;-)
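
For readers following along, a minimal sketch of that route might look like the function below. It is an assumption-laden illustration, not tested code from this thread: RenderPageAsPng is a hypothetical helper name, PDFLibrary is assumed to be an already unlocked TDebenuPDFLibrary instance as in the code above, and the value 5 for the Options parameter (PNG output, as far as I recall from the function reference) as well as the 150 DPI should be checked against the reference for your library version.

```pascal
// Sketch only. Needs Classes (for TStream) and the Debenu Quick PDF Library
// unit of your version in the uses clause.
// RenderPageAsPng is a hypothetical helper, not part of the library.
function RenderPageAsPng(PDFLibrary: TDebenuPDFLibrary; const fn: string;
  PageNo: Integer; Target: TStream): Boolean;
begin
  Result := False;
  // Rendering works on the loaded document, not on a DirectAccess file handle
  if PDFLibrary.LoadFromFile(fn, '') <> 1 then
    Exit;
  if (PageNo < 1) or (PageNo > PDFLibrary.PageCount) then
    Exit;
  // 150 DPI is an arbitrary choice; Options = 5 is assumed to select PNG output
  Result := PDFLibrary.RenderPageToStream(150, PageNo, 5, Target) = 1;
  if Result then
    Target.Position := 0; // rewind so the caller can read the image from the start
end;
```

Keep in mind that the rendered bitmap contains everything drawn on the page, text included, so using it as a background also carries the page text along as pixels.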

-------------
Cheers,
Ingo