Determine if a PDF file contains only images
Eric24 (Team Player; joined 28 Jun 09; location: Dallas, TX)
Posted: 24 Dec 10 at 8:48PM
Is there an easy way to determine whether a given PDF file contains only images and no other "visible" objects? For example, scanned documents are often stored as PDF files whose visible content consists of CCITT images. If that is the case, and nothing else has been added (such as annotations, text, or other objects), then I can simply extract the images directly rather than "print" the PDF to a capture printer. But if anything has been added to the PDF (other than invisible metadata), then I must "print" the PDF to a capture printer to get an accurate representation of its contents.
Shotgun Tom (Senior Member; joined 14 Aug 09; location: Phoenix, AZ)
Hi Eric and Merry Christmas! One way would be to look for fonts. Use the "HasFontResources" function. If it returns false, the document is probably a rasterized PDF.
For the reverse case (a text-only PDF), you can use the "FindImages" method: if it finds no images, there are none in the document.
Tom
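Tom's two checks can be combined into a small decision function. The sketch below is hypothetical and library-independent (GuessByResources and TDocGuess are illustrative names, not Quick PDF Library API): it assumes HasFonts has already been derived from HasFontResources and ImageCount from the result of FindImages, and it returns "unknown" whenever the two signals do not point the same way.

```pascal
program ResourceHeuristic;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TDocGuess = (dgProbablyScanned, dgProbablyTextOnly, dgMixedOrUnknown);

// Hypothetical helper: HasFonts is assumed to come from HasFontResources,
// ImageCount from FindImages. No library calls are made here.
function GuessByResources(HasFonts: Boolean; ImageCount: Integer): TDocGuess;
begin
  if (not HasFonts) and (ImageCount > 0) then
    Result := dgProbablyScanned       // images but no fonts: likely rasterized
  else if HasFonts and (ImageCount = 0) then
    Result := dgProbablyTextOnly      // fonts but no images
  else
    Result := dgMixedOrUnknown;       // both, or neither (e.g. vector-only pages)
end;

begin
  WriteLn(GuessByResources(False, 3) = dgProbablyScanned);
  WriteLn(GuessByResources(True, 0) = dgProbablyTextOnly);
  WriteLn(GuessByResources(True, 2) = dgMixedOrUnknown);
end.
```

Note that "no fonts" says nothing about vector markup such as lines or shapes, which is exactly the gap raised in the next reply.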
Eric24 (Team Player; joined 28 Jun 09; location: Dallas, TX)
Merry Christmas to you, too!
I had thought of that, but it doesn't cover the possibility that someone added "markup" in the form of lines, shapes, or other non-text objects. Is there a similar call that would "reveal" such objects? If so, is there anything else that's "visible" other than images, fonts, and "other non-text objects"?
Ingo (Moderator Group; joined 29 Oct 05)
Hi Eric!
You can run text extraction on each page. If it returns nothing, the page has no text layer, which usually means it contains only images.
Another way to detect scanned pages that have not been run through OCR: check whether there are as many image objects as there are pages.
Cheers, Ingo
Eric24 (Team Player; joined 28 Jun 09; location: Dallas, TX)
Thanks! That sounds like a very solid approach.
Dimitry (Team Player; joined 18 Feb 10)
Besides font resources and extractable text, a PDF file can contain vector graphics, tables, or AcroForm fields that are visibly rendered on the page.
To make sure a page has no visible elements other than images, we can remove all images from it.
After that, the page should render as visually empty (blank).
Here is a PageContainsImages() function that answers whether a PDF page contains images and nothing else that is visible.
Your opinions and test results are welcome.
```pascal
// Requires SysUtils (IntToStr) and the Quick PDF Library unit declaring TQuickPDF.

// Copies page size, rotation and the five page boxes from SourcePage
// to TargetPage (both pages in the currently selected document).
procedure ClonePageDimensions(QPL: TQuickPDF; SourcePage, TargetPage: Integer);
type
  TPageBox = record
    Left, Top, Width, Height: Double;
  end;
var
  i: Integer;
  width, height: Double;
  rotation: Integer;
  boxes: array [1..5] of TPageBox;
begin
  with QPL do
  begin
    // Read dimensions from the source page
    SelectPage(SourcePage);
    width := PageWidth;
    height := PageHeight;
    rotation := PageRotation;
    for i := 1 to 5 do
    begin
      boxes[i].Left := GetPageBox(i, 0);
      boxes[i].Top := GetPageBox(i, 1);
      boxes[i].Width := GetPageBox(i, 2);
      boxes[i].Height := GetPageBox(i, 3);
    end;
    // Apply them to the target page
    SelectPage(TargetPage);
    SetPageDimensions(width, height);
    RotatePage(rotation);
    for i := 1 to 5 do
      SetPageBox(i, boxes[i].Left, boxes[i].Top, boxes[i].Width, boxes[i].Height);
  end;
end;

// Returns True if Page of the selected document contains images and nothing
// else that renders visibly: the page is copied to a temporary document, its
// images are cleared, and the rendering is compared with a blank page.
function PageContainsImages(QPL: TQuickPDF; Page: Integer; DPI: Integer): Boolean;
var
  i: Integer;
  doc, doc_tmp: Integer;
  s, s_tmp: AnsiString;
begin
  Result := False;
  with QPL do
  begin
    // The customer's document is selected; without images there is nothing to test.
    // Checked before the try block so that finally never sees an unset doc_tmp.
    if FindImages = 0 then
      Exit;
    doc := SelectedDocument;
    doc_tmp := NewDocument; // the temporary document is now selected
    try
      CopyPageRanges(doc, IntToStr(Page));
      // Page 1 is the new document's blank page; page 2 holds the customer's page copy
      SelectPage(2);
      // Clear all image content on page 2
      for i := 1 to FindImages do
        ClearImage(GetImageID(i));
      // Give the blank page 1 the same dimensions as page 2
      ClonePageDimensions(QPL, 2, 1);
      s := RenderPageToString(DPI, 2, 0);
      s_tmp := RenderPageToString(DPI, 1, 0);
      // Identical renderings (same length and bytes) mean only images were visible
      Result := s = s_tmp;
    finally
      RemoveDocument(doc_tmp);
      SelectDocument(doc); // restore the customer's document as the selection
    end;
  end;
end;
```
Edited by Dimitry - 13 Jan 11 at 10:44AM
Regards,
Dmitry
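Eric's original goal was a per-document decision: extract the images directly or print to a capture printer. Assuming a per-page Boolean such as the one PageContainsImages() returns, a minimal sketch of that decision looks like this (ChooseCaptureMethod and TCaptureMethod are hypothetical names, not library API).

```pascal
program CaptureDecision;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TCaptureMethod = (cmExtractImages, cmPrintToCapture);

// Hypothetical helper: PageIsImageOnly[i] is assumed to be the per-page
// result of a check like PageContainsImages() for page i+1.
function ChooseCaptureMethod(const PageIsImageOnly: array of Boolean): TCaptureMethod;
var
  i: Integer;
begin
  Result := cmPrintToCapture;
  if Length(PageIsImageOnly) = 0 then
    Exit;                               // no pages: fall back to printing
  for i := 0 to High(PageIsImageOnly) do
    if not PageIsImageOnly[i] then
      Exit;                             // one page with extra content is enough
  Result := cmExtractImages;
end;

begin
  WriteLn(ChooseCaptureMethod([True, True]) = cmExtractImages);
  WriteLn(ChooseCaptureMethod([True, False]) = cmPrintToCapture);
end.
```

Requiring every page to pass is deliberate: a single page with added markup is enough to make direct image extraction an inaccurate representation of the document.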
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.