
get images and all sizes?

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3721
Printed Date: 03 May 24 at 12:02PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: get images and all sizes?
Posted By: johnny
Subject: get images and all sizes?
Date Posted: 18 Jun 19 at 5:31PM
Hi all,

I want to determine whether a PDF uses images as a background, or more generally get all of its images and check their sizes.
With FindImages I get the count.
After that, which function lets me loop over the found images and check their sizes?


C#

thank you






Replies:
Posted By: Ingo
Date Posted: 18 Jun 19 at 9:58PM
Hi Johnny.

This sample could be a kickstart for your first code...
https://www.debenu.com/kb/extract-images-from-pdf-files-as-the-appropriate-image-type/

...and GetImageListItemIntProperty is the most important function for this:
https://www.debenu.com/docs/pdf_library_reference/GetImageListItemIntProperty.php

Good luck and much success :)



-------------
Cheers,
Ingo



Posted By: johnny
Date Posted: 19 Jun 19 at 10:30AM
Thanks so much, Ingo.

The tricky part was finding this:
401 = Width in pixels
402 = Height in pixels

There are no properly named properties; you need to pass integer code numbers, and that makes them hard to find when searching...
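One way to make the numeric codes easier to read and search for in your own code is to wrap them in an enum. This is only a minimal sketch: the enum and its member names are made up for illustration and are not part of the library; only the values 401 and 402 come from the post above.

```csharp
// Hypothetical names, not part of the library.
// Only the numeric values (401, 402) come from the thread.
public enum ImageIntProperty
{
    WidthPixels = 401,   // width in pixels
    HeightPixels = 402   // height in pixels
}
```

A call would then read something like `DPL.GetImageListItemIntProperty(il, i, (int)ImageIntProperty.WidthPixels)`.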



Posted By: johnny
Date Posted: 19 Jun 19 at 11:27AM
By the way, why does GetPageImageList() usually start with 0 and not 1?
I mean, in other functions the pages of a PDF document are numbered from 1, not from index 0, but this function seems to start from 0.


Posted By: johnny
Date Posted: 19 Jun 19 at 3:23PM
Since sharing is caring, here is my function, in case it helps someone with a similar question in the future.

private Boolean IsValidPDF(String FilePath)
{
    // Force consistent handling of digit grouping and decimal separators, e.g. 1.2345,67
    CultureInfo.DefaultThreadCurrentCulture = new CultureInfo("el-GR", false);

    ValidationErrorMsg = String.Empty;

    // Check whether the file is corrupted, in use, or otherwise cannot be loaded
    if (DPL.LoadFromFile(FilePath, "") != 1)
    {
        ValidationErrorMsg = string.Format("Error code {0}, while loading file {1}", DPL.LastErrorCode(), FilePath);
        return false;
    }

    // Check whether the file contains no text at all, i.e. it is 100% image/photo/scan
    if (DPL.HasFontResources() == 0)
    {
        ValidationErrorMsg = "This file doesn't contain any text!";
        return false;
    }

    // Check whether the file contains images big enough to be a page background,
    // which means it is not a proper text PDF and should be handled by an OCR scan
    Int32 il = DPL.GetPageImageList(0);
    Int32 lc = DPL.GetImageListCount(il);

    // Image indexes in the list start at 1
    for (Int32 i = 1; i <= lc; ++i)
    {
        Int32 imgWidth = DPL.GetImageListItemIntProperty(il, i, 401);   // 401 = width in pixels
        Int32 imgHeight = DPL.GetImageListItemIntProperty(il, i, 402);  // 402 = height in pixels

        if (imgWidth > 1500 || imgHeight > 1500)
        {
            ValidationErrorMsg = "This file seems like an image!";
            return false;
        }
    }

    return true;
}
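The function above asks for one image list only. If GetPageImageList works on the currently selected page and its argument is an options value rather than a page number (an assumption worth confirming in the function reference, and a possible answer to the 0-versus-1 question above), then a multi-page invoice would need a loop over the pages. The sketch below shows that idea. `IPdfImageApi` is a hypothetical stand-in so the code compiles on its own; its method names mirror the library calls used in the thread, plus `PageCount`, `SelectPage` and `ReleaseImageList`, which are assumed here and should be checked against the library reference.

```csharp
using System;

// Hypothetical stand-in for the library object (DPL in the post above).
// Method names mirror the library calls; signatures are assumptions.
public interface IPdfImageApi
{
    int PageCount();
    int SelectPage(int pageNumber);      // pages assumed to be numbered from 1
    int GetPageImageList(int options);   // assumed to work on the selected page
    int GetImageListCount(int imageListId);
    int GetImageListItemIntProperty(int imageListId, int imageIndex, int propertyId);
    int ReleaseImageList(int imageListId);
}

public static class ImageSizeCheck
{
    // Returns true if any image on any page is wider or taller than maxPixels.
    public static bool HasLargeImage(IPdfImageApi pdf, int maxPixels)
    {
        int pageCount = pdf.PageCount();
        for (int page = 1; page <= pageCount; page++)
        {
            pdf.SelectPage(page);
            int list = pdf.GetPageImageList(0);
            try
            {
                int count = pdf.GetImageListCount(list);
                // Image indexes start at 1, as in the post above.
                for (int i = 1; i <= count; i++)
                {
                    int width = pdf.GetImageListItemIntProperty(list, i, 401);
                    int height = pdf.GetImageListItemIntProperty(list, i, 402);
                    if (width > maxPixels || height > maxPixels)
                        return true;
                }
            }
            finally
            {
                // Free the list even when returning early.
                pdf.ReleaseImageList(list);
            }
        }
        return false;
    }
}
```

A call might look like `ImageSizeCheck.HasLargeImage(adapter, 1500)`, where `adapter` is a thin wrapper that forwards each method to the library object.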


Posted By: Ingo
Date Posted: 19 Jun 19 at 7:21PM
A small hint:
I have test PDFs from architects with a lot of text and numbers on the pages and embedded CAD images with extreme dimensions.
A text PDF with standard page dimensions can contain a small image that has huge pixel dimensions after extraction.



-------------
Cheers,
Ingo
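Ingo's hint suggests that pixel dimensions alone do not say how much of the page an image occupies. If the size at which an image is drawn on the page is available (how to obtain it from the library is not shown here and would need to be looked up), a background check could compare that drawn size with the page size instead. A minimal sketch of the arithmetic only; the class and method names are made up for illustration.

```csharp
using System;

// Hypothetical helper, not part of the library.
public static class PageCoverage
{
    // Fraction (0..1) of the page area covered by an image drawn at the given size.
    // All four values must use the same unit, e.g. PDF points.
    public static double CoveredFraction(double drawnWidth, double drawnHeight,
                                         double pageWidth, double pageHeight)
    {
        if (pageWidth <= 0 || pageHeight <= 0)
            return 0.0;

        // Clamp to the page so an oversized or negative value cannot skew the result.
        double w = Math.Min(Math.Max(drawnWidth, 0.0), pageWidth);
        double h = Math.Min(Math.Max(drawnHeight, 0.0), pageHeight);
        return (w * h) / (pageWidth * pageHeight);
    }
}
```

With this, an image drawn at the full size of the page gives a fraction of 1, while a small embedded CAD thumbnail gives a small fraction no matter how many pixels it has after extraction.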



Posted By: johnny
Date Posted: 19 Jun 19 at 7:49PM
That's OK, because I am only interested in PDFs that are A4 or similar-size invoices for my app. There is no chance a CAD drawing will be used for this, just invoices that are scanned and imported automatically into many popular accounting applications :)


