Debenu Quick PDF Library - PDF SDK Community Forum : Image Legibility

http://www.quickpdf.org/forum/

http://www.quickpdf.org/forum/image-legibility_topic878_post4199.html#4199
Author: bb46970
Subject: 878
Posted: 18 Mar 08 at 11:37AM

Thanks for the reply. I came up with an option that may work. I believe that they do scan the documents as 1-bit. However, I went with a greyscale option. I use scanline to examine each pixel. I add the red, green, and blue values for each pixel. If the sum falls below 50% of the maximum, I assume that the pixel is "dark." If necessary, I can adjust the 50%. I keep a tally of all of the dark pixels. Then I set a threshold for the page. For example, if 45% of the page is dark pixels, I flag it as "suspect." I do that for each page in the document. My only goal is to find suspect pages for a human to examine and decide whether the document needs to be rescanned.

If anyone has a better option - particularly faster - I am open to it. Some of these documents are hundreds of pages long, and I may have to process hundreds of documents at a time.
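
The per-page check described in this post could be sketched roughly as follows. This is only an illustration in Python, not the poster's actual code: the function names `dark_fraction` and `suspect_pages`, the representation of a page as a list of (red, green, blue) tuples, and the reading of "50%" as half of the maximum sum 765 are all assumptions. Extracting the pixels from the PDF is left out.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255 (assumed layout)


def dark_fraction(pixels: Iterable[Pixel], dark_cutoff: float = 0.5) -> float:
    """Fraction of pixels whose R+G+B sum is below dark_cutoff of the maximum (765)."""
    limit = dark_cutoff * 3 * 255
    total = 0
    dark = 0
    for r, g, b in pixels:
        total += 1
        if r + g + b < limit:
            dark += 1
    return dark / total if total else 0.0


def suspect_pages(pages: List[List[Pixel]],
                  page_threshold: float = 0.45,
                  dark_cutoff: float = 0.5) -> List[int]:
    """1-based numbers of pages whose dark fraction reaches page_threshold."""
    return [
        number
        for number, pixels in enumerate(pages, start=1)
        if dark_fraction(pixels, dark_cutoff) >= page_threshold
    ]


# Hypothetical data: a mostly white page and a mostly dark page.
clean = [(255, 255, 255)] * 90 + [(0, 0, 0)] * 10
murky = [(255, 255, 255)] * 40 + [(20, 20, 20)] * 60
print(suspect_pages([clean, murky]))  # [2]
```

Both cutoffs are parameters, so the 50% and 45% values from the post can be tuned without touching the loop.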

http://www.quickpdf.org/forum/image-legibility_topic878_post4198.html#4198
Author: peteratoce
Subject: 878
Posted: 18 Mar 08 at 8:49AM

First of all, your problem is more in the field of image processing than in PDF handling. Further, it is almost impossible to generate a good image from a really bad scan.

That said, you can have a look at e.g. PixEdit, which allows you to load PDFs and offers a COM interface to do all kinds of operations on your (hopefully monochrome?) images.

If you want to identify images that are too dark, you would probably have to look at a region in the margin that should be white, that is, free of black pixels.
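
As a rough illustration of this margin idea, the sketch below measures how much of a left-hand strip is black. It assumes the page has already been decoded into rows of 0/1 values with 1 meaning black; the function name `margin_black_fraction` and the choice of the left margin are hypothetical.

```python
from typing import Sequence


def margin_black_fraction(rows: Sequence[Sequence[int]], margin_px: int) -> float:
    """Fraction of black pixels (value 1) in the left strip that is margin_px wide."""
    total = 0
    black = 0
    for row in rows:
        strip = row[:margin_px]  # only the margin columns of this row
        total += len(strip)
        black += sum(strip)
    return black / total if total else 0.0


# Hypothetical 2x4 pages: a clean margin and a margin that is mostly black.
good_page = [[0, 0, 1, 1], [0, 0, 1, 0]]
dark_page = [[1, 1, 1, 1], [1, 0, 1, 1]]
print(margin_black_fraction(good_page, 2))  # 0.0
print(margin_black_fraction(dark_page, 2))  # 0.75
```

A page whose margin fraction is well above zero would then be a candidate for manual review; the exact cutoff would have to be found by experiment.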

There is unfortunately no API function in PixEdit that returns the number of black pixels in a given area, but you can save an excerpt of the area to a file and then perhaps use ImageMagick to count the black pixels.

On second thought, simply save to format "Uncompressed, No header", read in the bytes and count the number of "1"-bits in each byte yourself.
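
Counting the "1"-bits in such a raw dump might look like the sketch below. It assumes the file really contains nothing but packed 1-bit pixel data (no header) and that a set bit means a black pixel; the polarity and any row padding depend on the exporting tool and should be checked. The names `count_set_bits` and `black_fraction` are made up for this example.

```python
# Lookup table: number of set bits for every possible byte value 0..255.
_POPCOUNT = bytes(bin(value).count("1") for value in range(256))


def count_set_bits(data: bytes) -> int:
    """Total number of 1-bits in the raw data."""
    return sum(_POPCOUNT[byte] for byte in data)


def black_fraction(data: bytes) -> float:
    """Share of set bits, assuming 1 = black and no padding bits."""
    bits = len(data) * 8
    return count_set_bits(data) / bits if bits else 0.0


sample = b"\xff\x00\x0f"  # hypothetical raw bytes: 8 + 0 + 4 set bits
print(count_set_bits(sample))       # 12
print(black_fraction(b"\xff\x00"))  # 0.5
```

In practice the bytes would come from something like `open(path, "rb").read()`. The table lookup avoids recomputing the bit count for each byte, which matters when pages run to several megabytes.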

Peter

Edited by peteratoce - 18 Mar 08 at 9:10AM

http://www.quickpdf.org/forum/image-legibility_topic878_post4192.html#4192
Author: bb46970
Subject: 878
Posted: 14 Mar 08 at 5:32PM

I have a client who scans text documents. They do not perform OCR; they just save the pages as images in the PDFs. Sometimes the people scanning do a poor job, resulting in some pages that are really dark or black. I am looking for a way to programmatically check the pages and see if any of them are "suspect."