I need help - I can help - Determine page type?

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1191
Printed Date: 07 Jan 26 at 6:40AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com

Topic: Determine page type?

Posted By: Sam
Subject: Determine page type?
Date Posted: 28 Aug 09 at 8:43PM

Greetings,

I'm working on a PDF redaction product that will be used by our clients. This is a secure redaction process, and the only way I know to do this is to render each PDF page to an image, draw on the image, then save it back to PDF. This works fine, but it's incredibly slow. I've got a few questions I'm hoping someone here can answer.
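Why rasterizing makes the redaction secure: once the page is only pixels, a filled box drawn over them destroys the covered content, and saving that image back to PDF leaves no text layer underneath to recover. A minimal, library-agnostic sketch of the "draw on the image" step, assuming the page has already been rendered to a grayscale pixel grid (a list of rows, 0 = black, 255 = white); the `redact` function and the grid representation are illustrative only and are not part of QuickPDF or Imageman:

```python
def redact(pixels, left, top, width, height, fill=0):
    """Overwrite a rectangle of a row-major pixel grid in place.

    pixels: list of rows, each a list of ints (0 = black, 255 = white).
    The rectangle is clipped to the image bounds.
    """
    rows = len(pixels)
    cols = len(pixels[0]) if rows else 0
    # Clip the rectangle so boxes that hang off the page do not raise IndexError.
    y0, y1 = max(0, top), min(rows, top + height)
    x0, x1 = max(0, left), min(cols, left + width)
    for y in range(y0, y1):
        row = pixels[y]
        for x in range(x0, x1):
            row[x] = fill  # the original pixel value is gone for good
    return pixels


page = [[255] * 6 for _ in range(4)]  # a blank 6 x 4 "page"
redact(page, left=1, top=1, width=3, height=2)
for row in page:
    print(row)
```

The important part is what gets saved: only the modified image goes back into the PDF, not the original page content with a box drawn on top of it.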

I've tried handling documents differently depending on their type. I try to determine the makeup of the document by finding images: if the number of images equals the number of pages, I assume these are scanned pages and extract the TIFFs. Otherwise, I render the document to files and then convert the rendered pages to TIFF. My redaction software creates the thumbnails and the preview image to draw on from TIFF files (I'm using Imageman).
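The decision rule described here can be written out as a small, library-agnostic function, which makes it easier to test separately from the PDF calls. This is a sketch, assuming the per-image facts (type code, width, height) have already been read from the PDF; the `ImageInfo` record and `choose_strategy` are hypothetical names, and the type code 3 and the 400 x 600 thresholds are simply the values Sam's VB code checks:

```python
from dataclasses import dataclass

TIFF_TYPE = 3  # the ImageType value Sam's code treats as TIFF


@dataclass
class ImageInfo:
    image_type: int  # image type code reported by the PDF library
    width: int       # pixels
    height: int      # pixels


def choose_strategy(page_count, images):
    """Return "extract" when every page looks like one full-page TIFF scan,
    "render" when the pages must be rasterized, or "none" for an empty file."""
    if page_count == 0:
        return "none"
    if len(images) != page_count:
        return "render"
    for img in images:
        if not (img.image_type == TIFF_TYPE and img.width > 400 and img.height > 600):
            return "render"  # one non-scan image is enough to fall back
    return "extract"
```

Returning at the first non-matching image avoids extracting TIFFs that would be thrown away anyway once the whole document is rendered.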

I'm using an old version of QuickPDF (4.49) and am wondering whether some of the bugs and slowness have been addressed in the latest version.

FindImages seems to extract JPEGs into the PDF's directory. This is quite annoying; does anyone know anything about that?

FindImages is really slow. Is there a better/faster way to determine the page type (text/image or scanned image)?

Lastly, has PDF rendering been improved? I've found the only acceptable text-to-image rendering is PNG at 150 DPI. Anything lower than that and too much detail is lost in the text.
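Why the DPI setting matters so much for speed and file size: PDF page sizes are measured in points (1/72 inch), so the rendered width and height in pixels are the page size in points times DPI / 72, and the pixel count grows with the square of the DPI. A quick sketch of that arithmetic; the US Letter page (612 x 792 points) is just an example, and the 24-bit RGB figure is an uncompressed upper bound (real PNG/TIFF files compress to much less):

```python
def render_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rendered at `dpi` (1 pt = 1/72 inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


for dpi in (100, 150, 200):
    w, h = render_size(612, 792, dpi)  # US Letter
    raw_mb = w * h * 3 / 1_000_000     # 24-bit RGB, uncompressed
    print(f"{dpi} dpi: {w} x {h} px, ~{raw_mb:.1f} MB raw")
```

At 150 DPI a Letter page is 1275 x 1650 pixels; dropping to 100 DPI (850 x 1100) cuts the pixel count to (100/150)² ≈ 44% of that, which is why lowering the DPI shrinks the image files so noticeably.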

Here is the code I'm using (yes, it's vb5):
    'QP is the QuickPDF ActiveX component; FileExist is a helper
    'function defined elsewhere in the project

    If QP.LoadFromFile(sPDFFileName) = 1 Then    'loads document
        lTotal = QP.PageCount

        notTiff = True
        rendered = False
        'strip the ".pdf" extension to build the output file names
        sPDFShortName = Left$(sPDFFileName, Len(sPDFFileName) - 4)
        If lTotal > 0 Then
            images = QP.FindImages
            'one image per page: assume a scanned document
            If images = lTotal Then
                notTiff = False
                For page = 1 To images
                    imgID = QP.ImageID(page)
                    QP.SelectImage imgID
                    'keep only large TIFF images (ImageType 3) as page scans
                    If QP.ImageType = 3 And QP.ImageWidth > 400 And QP.ImageHeight > 600 Then
                        QP.SaveImageToFile sPDFShortName & page & ".tif"
                    Else
                        'not a scan: stop extracting, every page gets rendered below
                        notTiff = True
                        Exit For
                    End If
                Next page
            End If
            If notTiff Then
                'render all pages to PNG files at 150 DPI (Options 5 = PNG)
                QP.RenderDocumentToFile 150, 1, lTotal, 5, sPDFShortName & ".png"
                For iC = 1 To lTotal
                    'convert each rendered PNG to TIFF through the Imageman control
                    frmUploadDocs.imPreview.Picture = sPDFShortName & iC & ".png"
                    sTempFile = sPDFShortName & iC & ".tif"
                    If FileExist(sTempFile) Then Kill sTempFile
                    frmUploadDocs.imPreview.SaveAs sTempFile
                    Kill sPDFShortName & iC & ".png"
                Next iC
                rendered = True
            End If
        End If
        ConvertPDFtoTIF = lTotal ' success
    End If

Thanks for any help

Sam

Replies:

Posted By: Michel_K17
Date Posted: 29 Aug 09 at 2:38PM

Hello Sam,

   I can answer a few of your questions (but not the one about finding images, sorry).

   The library has absolutely improved dramatically since v4.49, in two ways. Today we are at v7.15, which reflects the last of the improvements from iSed, the improvements from the user community, and the work done by Debenu and their programmers. By "improvements", I mean that a large number of bugs have been fixed and compatibility with PDF content has improved. You mentioned rendering in particular, and yes, that portion of the code is far better: rendering now matches Adobe Reader's in terms of quality.

   Finally, Debenu is steadily adding new features, for which I am very thankful, as they bring the library back in line with the new technology being brought to the PDF format. For example, this includes the ability to digitally sign documents, and much more.

   To be sure, it's a never-ending task, but Debenu has been really proactive about regular updates and about addressing specific requests from users when they can.

   Hopefully, someone else can address your image question.

   But there is no doubt that you should upgrade, at a minimum, to the last version that iSed published with the modifications from the users. This would be a free upgrade for you. It's available here: http://www.quickpdflibrary.com/products/quickpdf/trial.php

   There is a list of all the improvements Debenu has made since v5.11 that you should take a look at: http://www.quickpdflibrary.com/products/quickpdf/updates.php. I believe the offer to upgrade to the v7.xx series is still available to users of the old version (you will need to provide proof of ownership). As I recall, they offer a $100 discount. The purchase page is here: https://secure.shareit.com/shareit/checkout.html?PRODUCT%5B300303640%5D=1&DELIVERY%5B300303640%5D=EML&stylefrom=168179&backlink=http%3A%2F%2Fwww.quickpdflibrary.com%2Fstore%2Fquickpdf%2Findex.php

   I hope that helps.

   Cheers!

Michel

Posted By: Shotgun Tom
Date Posted: 29 Aug 09 at 5:32PM

A couple of thoughts for you, Sam.

1. HasFontResources is a fairly quick way to determine whether the entire document consists of images. From the QuickPDF manual: "Determines if the selected document has font resources. If the document does not, it can be assumed to be an image only PDF."

2. I'm not all that familiar with Imageman... however, there is an ActiveX component called GdPicture Imaging SDK at www.gdpicture.com (http://www.gdpicture.com). At one point it directly supported the iSed library. This component has a method that quickly converts a PDF (and PDF/A) to multipage TIFF, and also multipage TIFF to PDF or PDF/A. The package includes a viewer that renders PDF and multipage TIFF very quickly. In combination with the latest QuickPDF library, you would have a very powerful PDF/TIFF toolbox.
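HasFontResources fits naturally as a cheap first pass in front of a slower per-page check. A sketch of that ordering, assuming `has_font_resources` holds the result of the library's HasFontResources call and `slow_check` is whatever per-page test you fall back to; both are passed in so the logic can be tried without the library, and `is_image_only` and `scan_pages` are hypothetical names:

```python
from typing import Callable


def is_image_only(has_font_resources: bool, slow_check: Callable[[], bool]) -> bool:
    """Cheap document-level test first, expensive per-page test only if needed."""
    if not has_font_resources:
        # Per the QuickPDF manual: no font resources => assume image-only.
        return True
    # Fonts being present is inconclusive, so run the slower check.
    return slow_check()


calls = []


def scan_pages() -> bool:
    """Hypothetical stand-in for a slow per-page check."""
    calls.append("scanned")
    return False


print(is_image_only(False, scan_pages), calls)
print(is_image_only(True, scan_pages), calls)
```

The slow check only runs for documents that actually contain fonts, so genuinely font-free scans skip it entirely.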

Posted By: Sam
Date Posted: 03 Sep 09 at 5:40PM

Thanks for the help. I upgraded to 7.15 and it is indeed quite a bit faster. I was also able to reduce the DPI, which made the resulting image files smaller.

Tom, if I have time I may play with HasFontResources. I'm not sure how that would work, though, if a page is made up of multiple images, or whether that's even common enough to worry about.

Posted By: Ingo
Date Posted: 03 Sep 09 at 6:56PM

Hi!

It's not guaranteed that an image-only PDF has no font resources. I have a sample of a genuinely image-only PDF that contains Helvetica.

What you can do is extract the text content. If there isn't any text content and there are embedded images (function FindImages), then you can be pretty sure that it's a scanned or image-only PDF.

Cheers, Ingo
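Ingo's rule can also be applied page by page, which sidesteps Sam's concern about pages made of several images: a page counts as scanned when it has no extractable text and at least one image, however many images it holds. A sketch, assuming each page's extracted text and image count have already been obtained from the library (its text-extraction and FindImages functions); `classify_page`, `classify_document` and the `(text, image_count)` input format are hypothetical:

```python
def classify_page(text: str, image_count: int) -> str:
    """Ingo's rule: no extractable text plus at least one image => scanned."""
    if not text.strip() and image_count > 0:
        return "scanned"
    return "text"  # has a text layer (or is blank): render this page


def classify_document(pages):
    """pages: list of (text, image_count) tuples -> one label per page."""
    return [classify_page(text, count) for text, count in pages]


labels = classify_document([("", 1), ("Invoice 42", 0), ("  \n", 3)])
print(labels)
```

Labelling pages individually also lets a mixed document extract the scanned pages directly and render only the pages that carry text.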