Debenu Quick PDF Library - PDF SDK Community Forum Homepage

Determine if a PDF file contains only images

Eric24
Posted: 24 Dec 10 at 8:48PM
Is there an easy way to determine if a given PDF file contains only images and no other "visible" objects? For example, scanned documents are often stored as PDF files, where the visible portion of the file consists of CCITT images. If this is the case, and nothing else has been added (such as annotations, text, or other objects), then I can simply extract the images directly rather than "print" the PDF to a capture printer. But if anything has been added to the PDF (other than metadata that isn't visible), then I must "print" the PDF to a capture printer in order to get an accurate representation of the PDF contents.

Shotgun Tom
Posted: 25 Dec 10 at 2:11PM

Hi Eric and Merry Christmas!

One way would be to look for fonts. Use the "HasFontResources" function. If it returns false, then it is probably a rasterized PDF.
 
For the reverse (a text-only PDF), you can use the "FindImages" method.
 
Tom

Eric24
Posted: 25 Dec 10 at 6:00PM
Merry Christmas to you, too!
 
I had thought of that, but it doesn't cover the possibility that someone added "markup" in the form of lines, shapes, or other non-text objects. Is there a similar call that would "reveal" such objects? If so, is there anything else that's "visible" other than images, fonts, and "other non-text objects"?

Ingo
Posted: 25 Dec 10 at 6:35PM
Hi Eric!
 
You can run text extraction on each page.
If there's no result, then there are only images.
 
Another way to detect scanned pages that have
not been through OCR: check whether there are as many
image objects as there are pages.
 
Cheers, Ingo
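
The checks suggested so far (font resources, extracted text, image count versus page count) could be combined roughly as below. This is only a sketch: TDocFacts, MakeFacts and LooksLikeScan are hypothetical names, not part of Quick PDF Library, and the counts are assumed to have been gathered beforehand with calls such as HasFontResources, FindImages and per-page text extraction. It does not detect vector markup such as lines or shapes, so a positive result only means "probably a plain scan".

```pascal
program ScanHeuristic;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Hypothetical summary of one document; not a Quick PDF Library type.
  TDocFacts = record
    PageCount: Integer;       // pages in the document
    ImageCount: Integer;      // images found in the whole document
    PagesWithFonts: Integer;  // pages whose font-resource check was positive
    PagesWithText: Integer;   // pages where text extraction returned something
  end;

// Convenience constructor for the record above.
function MakeFacts(APages, AImages, AFonts, AText: Integer): TDocFacts;
begin
  Result.PageCount := APages;
  Result.ImageCount := AImages;
  Result.PagesWithFonts := AFonts;
  Result.PagesWithText := AText;
end;

// True when nothing text-related was found and the image count
// matches the page count (one scanned image per page).
function LooksLikeScan(const F: TDocFacts): Boolean;
begin
  Result := (F.PageCount > 0) and
            (F.PagesWithFonts = 0) and
            (F.PagesWithText = 0) and
            (F.ImageCount = F.PageCount);
end;

begin
  // 3 pages, 3 images, no fonts, no text: treated as a plain scan
  WriteLn(LooksLikeScan(MakeFacts(3, 3, 0, 0)));
  // 3 pages, 3 images, but one page has fonts and extractable text
  WriteLn(LooksLikeScan(MakeFacts(3, 3, 1, 1)));
end.
```

Keeping the decision separate from the library calls makes it easy to tighten or loosen the rule, for example to allow more than one image per page.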
 
Eric24
Posted: 03 Jan 11 at 2:59AM
Thanks! That sounds like a very solid approach.

Dimitry
Posted: 13 Jan 11 at 10:40AM
Besides font resources and extractable text, a PDF file can contain vector graphics, tables, or AcroForms that are rendered visually on the page.
To make sure that there are no visual elements on the page besides images, we just need to remove all images from the page.
After this, the page should be visually empty (blank).
Here is a PageContainsImages() function that basically answers whether a PDF page contains images and nothing else visible.
Your opinions and test results are welcome.
 
procedure ClonePageDimensions(QPL: TQuickPDF;
    SourcePage, TargetPage: Integer);
  type
    TPageBox = record
      Left: Double;
      Top: Double;
      Width: Double;
      Height: Double;
    end;
  var
    i: Integer;
    width, height: Double;
    rotation: Integer;
    boxes: array [1..5] of TPageBox;
  begin
    with QPL do
    begin
      // Reading dimensions from Source Page
      SelectPage(SourcePage);
      width := PageWidth;
      height := PageHeight;
      rotation := PageRotation;
      for i := 1 to 5 do
      begin
        // index the array: each of the five page boxes gets its own entry
        boxes[i].Left := GetPageBox(i, 0);
        boxes[i].Top := GetPageBox(i, 1);
        boxes[i].Width := GetPageBox(i, 2);
        boxes[i].Height := GetPageBox(i, 3);
      end;
      // Saving dimensions to Target Page
      SelectPage(TargetPage);
      SetPageDimensions(width, height);
      RotatePage(rotation);
      for i := 1 to 5 do
        SetPageBox(i, boxes[i].Left, boxes[i].Top, boxes[i].Width, boxes[i].Height);
    end;
  end;
 
  function PageContainsImages(QPL: TQuickPDF;
    Page: Integer; DPI: Integer): Boolean;
  var
    i: Integer;
    doc, doc_tmp: Integer;
    s, s_tmp: AnsiString;
  begin
    Result := False;
    with QPL do
    begin
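      // the customer's document is selected
      if FindImages = 0 then
        Exit;
      doc := SelectedDocument;
      // create the temporary document before entering try, so that
      // the finally block never sees an uninitialized doc_tmp
      doc_tmp := NewDocument;
      try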
        // temporary Document is selected
        CopyPageRanges(doc, IntToStr(Page));
        // Page 2 contains customer's page copy atm
        SelectPage(2);
        // clear all image content on Page 2
        for i := 1 to FindImages do
          ClearImage(GetImageID(i));
        // Page 1 is empty and its dimensions should be equal to Page 2
        ClonePageDimensions(QPL, 2, 1);
        s := RenderPageToString(DPI, 2, 0);
        s_tmp := RenderPageToString(DPI, 1, 0);
        // Compare Page 1 and Page 2 by size and content
        if Length(s) <> Length(s_tmp) then
          Exit;
        Result := True;
        for i := 1 to Length(s) do
          if s[i] <> s_tmp[i] then
          begin
            Result := False;
            Exit;
          end;
      finally
        RemoveDocument(doc_tmp);
      end;
    end;
  end;
 


Edited by Dimitry - 13 Jan 11 at 10:44AM
Regards,
Dmitry
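
As a side note on the final comparison step: the length check plus byte-by-byte loop could probably be collapsed into a single string comparison, since equality on AnsiString values covers both length and content. A minimal sketch, where SameBytes is a hypothetical helper name and the literals stand in for the two rendered page strings:

```pascal
program CompareRenders;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// True when both strings have the same length and the same content.
function SameBytes(const A, B: AnsiString): Boolean;
begin
  Result := A = B;
end;

begin
  WriteLn(SameBytes('abc', 'abc'));   // identical content
  WriteLn(SameBytes('abc', 'abd'));   // same length, different content
  WriteLn(SameBytes('abc', 'abcd'));  // different length
end.
```

Either way, the approach assumes that rendering the same page content twice yields identical output, which is worth confirming on a few known-blank pages first.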