
Debenu Quick PDF Library - PDF SDK Community Forum Homepage

Removing unreferenced images

petem (Beginner; Joined: 17 Jun 12; Location: UK)
Posted: 17 Jun 12 at 5:07AM
Hi All,

I've recently started using Quick PDF and have had no problems in general, but a customer wants something I can't see a way to do. They have large numbers of PDFs that end with a couple of pages of ads. For archiving purposes they only need to keep the part before the ads, and because the ads are image-only full pages they make up about 95% of the file size, so they want them removed. The simple solution seemed to be just to delete the ad pages, but when I do that the file size hardly drops at all.

I'm assuming this is because the images are stored as separate resources in the PDF rather than inside the pages themselves, so removing the pages doesn't automatically remove them; it just leaves them unreferenced but still present. I can see why that might be better from a performance point of view, but in this case I do need all unreferenced images to be removed. I also tried extracting the pages they want to keep into a new PDF, but again the file size hardly reduces at all, so I assume that all the image resources are copied into the new PDF just in case they might be needed (which they aren't in this case).

So is there any way to delete pages or extract pages into a new PDF such that it would cause those images which are no longer referenced to be removed? Or failing that a function which optimizes a PDF file including removal of unreferenced resources?

I thought I might be able to engineer this myself by stepping through the pages I'm about to remove and calling GetPageImageList on each one then using ClearImage to shrink the image size (which would be good enough), but unfortunately I then only have access to an image list rather than individual image IDs and I can't see any way to find an image ID from the list. That means I can only use the functions which have a list-specific version (e.g. GetImageListItemIntProperty). ClearImage needs an image ID which I can't get from the list so without a ClearImageListImage function or similar, I can't see any way to clear the images.

Of course I can get to them from the document level and I can then clear them, but I can't see any way to know for sure which images are used on the pages I'm about to delete. It's unlikely any image would be used on more than one page so multiple references aren't really a problem in this case, but without knowing which page each image is used on I could be clearing one that's on a page I'm keeping. I could try looking at the image sizes as they're generally larger than I'd expect to see on the non-ad pages but that's the sort of approach I suspect would come back and bite me when they suddenly decide to add large images on the keeper pages.

If anyone has any thoughts on how I can make this work reliably I'd really appreciate it.

Alternatively, perhaps I should just send out our Sales Prevention Officer and see if he can get rid of this particular customer! LOL

Pete
Ingo (Moderator Group; Joined: 29 Oct 05)
Posted: 17 Jun 12 at 12:11PM
Hi Pete!

Deleting images that are referenced and were already inserted into a document earlier isn't possible with QuickPDF.
You can only create a new PDF document with QuickPDF.
For this you can work with the page manipulation functions:
http://www.quickpdflibrary.com/help/quickpdf/PageManipulation.php
With these you can build your new document.
I think this will work without referenced images (they will be directly on each page).
Additional idea: with the extraction functions (http://www.quickpdflibrary.com/help/quickpdf/Extraction.php)
you can check whether there's text on a page... if not, it could be an image-only page.
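
The "no text means it could be an image page" idea can be sketched roughly as below. This is only an illustration in Python, not Quick PDF code: it assumes the text of each page has already been extracted into a list of strings by whatever extraction call you use, and the function name trailing_textless_pages is made up for this example. It only looks at the run of text-free pages at the end of the document, since that is where the ads are said to be.

```python
from typing import List


def trailing_textless_pages(page_texts: List[str]) -> List[int]:
    """Return 1-based numbers of the text-free pages at the end of a document.

    page_texts[i] is assumed to hold the text already extracted from page i+1
    (how it is extracted is outside this sketch).
    """
    pages = []
    # Walk backwards from the last page and stop at the first page with text.
    for number in range(len(page_texts), 0, -1):
        if page_texts[number - 1].strip():
            break
        pages.append(number)
    return sorted(pages)


if __name__ == "__main__":
    texts = ["Report title", "Body text", "", "   "]
    print(trailing_textless_pages(texts))
```

A page with no text is not guaranteed to be an ad, so the result is best treated as a list of candidates to check rather than a final answer.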

Cheers and welcome here,
Ingo

 
petem
Posted: 17 Jun 12 at 11:52PM
Hi Ingo,

Thanks for the reply! I'm still not sure I can see a solution though.

I realise you can't actually delete an image, but using ClearImage would be ok as it reduces the size the image takes up to virtually zero, which is all they really want.

I understand about creating a new document, but that's ok because I can do it by extracting just the pages they want to keep and putting them into the new document, or I can open the existing PDF and delete the pages they don't want. Both of those work fine and produce a new PDF which only has the pages they want. The problem is that doing either of those still leaves the image data in the file even though none of the pages left in the file are using any images, so the file is 10 times bigger than it needs to be.

My problem with using ClearImage is that I can't identify which images to use it on, because I can only get an image ID through FindImages and that lists all images in the file without telling me which page each image is used on (I understand why it doesn't, but it's still a problem for me). I could clear every image in the file but there might be images on those pages that aren't going to be deleted, and obviously I don't want to clear those images as there would be gaps in the content. So I need to be able to tell which images are used on the pages I'm going to delete.

I can use GetPageImageList to do that and it does find the images that are on a particular page, but it doesn't return an image ID - it returns an image list and I can't see any way to find an image ID from the image list, so I can't call ClearImage.

So I can either get a list of images which I could clear but which are not linked to a page, or I can get a list of images which are linked to a page but then I can't clear them.
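
To make the missing piece concrete: if a mapping from page number to image IDs were available (which, as described above, is exactly what I can't get from the library at the moment), the selection itself would be simple set arithmetic. The Python sketch below is only an illustration of that logic. The page_images mapping and the function name images_safe_to_clear are hypothetical and not part of Quick PDF.

```python
from typing import Dict, Iterable, Set


def images_safe_to_clear(page_images: Dict[int, Set[int]],
                         pages_to_delete: Iterable[int]) -> Set[int]:
    """Return image IDs used on deleted pages and on no kept page.

    page_images is a hypothetical mapping of page number -> image IDs used
    on that page; building it is the part the thread has no answer for.
    """
    doomed = set(pages_to_delete)
    on_deleted: Set[int] = set()
    on_kept: Set[int] = set()
    for page, images in page_images.items():
        if page in doomed:
            on_deleted |= images
        else:
            on_kept |= images
    # An image shared with a kept page must survive, so subtract those.
    return on_deleted - on_kept


if __name__ == "__main__":
    # Image 10 appears on page 1 (kept) and page 3 (deleted), so it is kept.
    pages = {1: {10}, 2: set(), 3: {10, 11}, 4: {12}}
    print(sorted(images_safe_to_clear(pages, [3, 4])))
```

Subtracting the images found on kept pages is what protects a logo that happens to appear on both an ad page and a keeper page.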

I'd already spent a lot of time looking through all of the functions in the reference guide, but nothing there seems to provide a solution as far as I can see, which is why I thought I'd ask in case I've missed something or one of the functions does something that's not obvious.

Oh well, looks like I may have to tell them we can only do it with limited reliability. I don't believe they're expecting to have any images on the pages we keep so maybe just clearing *all* the images in the file will be acceptable to them. I just know that in 6 months they will have forgotten about this and will put a logo on the first page and wonder why it ends up blank!

Thanks for your help.

Pete



AndrewC (Moderator Group; Joined: 08 Dec 10; Location: Geelong, Aust)
Posted: 18 Jun 12 at 8:26AM
You can call QP.ClearImage(imageid) to replace the large image with a single 1x1 transparent image. Removing images is fraught with danger, but replacing the image should cause far fewer problems and will reduce the overall file size.

You may still be able to filter the images by width and height, as many advertising banners are of a consistent size. For example: if (QP.ImageWidth() = 200) and (QP.ImageHeight() = 100) then QP.ClearImage(imageid);
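
As a rough illustration of the size filter, here is a small Python sketch of just the selection step. It assumes the ID, width and height of each image have already been read from the document into simple records; the ImageInfo type and the select_by_size function are made-up names for this example and are not Quick PDF calls. The 200 x 100 size is the example figure from above, not a real ad size.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class ImageInfo(NamedTuple):
    """Hypothetical record of one image already read from the document."""
    image_id: int
    width: int
    height: int


def select_by_size(images: Iterable[ImageInfo],
                   ad_sizes: Set[Tuple[int, int]]) -> List[int]:
    """Return the IDs of images whose (width, height) is a known ad size."""
    return [img.image_id for img in images
            if (img.width, img.height) in ad_sizes]


if __name__ == "__main__":
    found = [ImageInfo(1, 200, 100), ImageInfo(2, 64, 64), ImageInfo(3, 200, 100)]
    # The returned IDs would then be the ones passed to ClearImage.
    print(select_by_size(found, {(200, 100)}))
```

Keeping the ad sizes in a set makes it easy to add further known sizes later, but any image on a keeper page that happens to match one of them would be cleared as well.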

Andrew.


Edited by AndrewC - 18 Jun 12 at 8:29AM