Image Legibility
bb46970
Team Player Joined: 06 Mar 06 Status: Offline Points: 33 |
Posted: 14 Mar 08 at 5:32PM
I have a client who scans text documents. They do not perform OCR; the pages are simply stored as images inside the PDFs. Sometimes the scanning is done poorly, resulting in pages that are very dark or entirely black. I am looking for a way to programmatically check each page and flag any that are "suspect."
peteratoce
Beginner Joined: 23 Feb 07 Location: Germany Status: Offline Points: 8 |
First of all, your problem is more in the field of image processing than in PDF handling. Further, it is almost impossible to generate a good image from a really bad scan.
That said, you can have a look at e.g. PixEdit, which allows you to load PDFs and offers a COM interface to do all kinds of operations on your (hopefully monochrome?) images.
If you want to identify images that are too dark, you would probably have to look at a region in the margin that should be white, i.e. free of black pixels.
Unfortunately, there is no API function in PixEdit that returns the number of black pixels in a given area, but you can export an area to a file and then perhaps turn to ImageMagick to count the black pixels.
On second thought, simply save in the format "Uncompressed, No header", read in the bytes, and count the number of "1" bits in each byte yourself.
Peter
Edited by peteratoce - 18 Mar 08 at 9:10AM
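The bit-counting idea from the post above could be sketched roughly as follows. This is only an illustration under several assumptions that the thread does not confirm: the exported data is a headerless, packed 1-bit buffer, a set bit means a black pixel, and any padding bits at the end of a row are zero. The function names `count_set_bits` and `black_fraction` are made up for this example and are not part of PixEdit or any other library.

```python
# Lookup table: number of set bits for every possible byte value.
_POPCOUNT = [bin(value).count("1") for value in range(256)]


def count_set_bits(data: bytes) -> int:
    """Return the total number of 1-bits in a headerless 1-bit buffer."""
    return sum(_POPCOUNT[byte] for byte in data)


def black_fraction(data: bytes, width: int, height: int) -> float:
    """Fraction of pixels that are set.

    Assumes 1 = black and that row padding bits, if any, are zero.
    """
    total_pixels = width * height
    if total_pixels <= 0:
        raise ValueError("width and height must be positive")
    return count_set_bits(data) / total_pixels


# Example: a 16 x 2 strip; the first row is all black, the second all white.
sample = bytes([0xFF, 0xFF, 0x00, 0x00])
print(black_fraction(sample, 16, 2))  # 0.5
```

Applied to a margin strip that should be white, a fraction well above zero would suggest a page that is too dark. The cutoff to use is a judgment call and would need tuning on real scans.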
bb46970
Team Player Joined: 06 Mar 06 Status: Offline Points: 33 |
Thanks for the reply. I came up with an option that may work. I believe they do scan the documents as 1-bit, but I went with a greyscale approach. I use scanline to examine each pixel and add its red, green, and blue values. If the sum falls below 50% of the maximum, I treat the pixel as "dark"; I can adjust the 50% if necessary. I keep a tally of the dark pixels and set a threshold for the page: for example, if 45% of the page is dark pixels, I flag it as "suspect." I do that for each page in the document. My only concern is finding suspect pages for a human to examine and decide whether the document needs to be rescanned.
If anyone has a better option - particularly faster - I am open to it. Some of these documents are hundreds of pages long, and I may have to process hundreds of documents at a time.
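The tally described in the post above might look something like the sketch below. It is not the poster's actual code: pages are assumed to be already decoded into rows of `(red, green, blue)` tuples with values 0 to 255, the 50% cutoff is taken to mean half of the maximum channel sum (765), and the `row_step` parameter is an addition for this example that skips rows to trade some accuracy for speed. The names `dark_fraction` and `suspect_pages` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255
Page = Sequence[Sequence[Pixel]]  # rows of pixels


def dark_fraction(rows: Page, pixel_cutoff: float = 0.5,
                  row_step: int = 1) -> float:
    """Fraction of sampled pixels whose R+G+B sum is below the cutoff."""
    if row_step < 1:
        raise ValueError("row_step must be at least 1")
    limit = pixel_cutoff * 3 * 255  # cutoff as a share of the maximum sum
    dark = 0
    total = 0
    for row in rows[::row_step]:  # optionally examine only every Nth row
        for red, green, blue in row:
            if red + green + blue < limit:
                dark += 1
        total += len(row)
    if total == 0:
        raise ValueError("page has no pixels")
    return dark / total


def suspect_pages(pages: Sequence[Page], page_threshold: float = 0.45,
                  pixel_cutoff: float = 0.5, row_step: int = 1) -> List[int]:
    """Return 1-based numbers of pages at or above the dark-pixel threshold."""
    return [
        number
        for number, rows in enumerate(pages, start=1)
        if dark_fraction(rows, pixel_cutoff, row_step) >= page_threshold
    ]


# Example: one clean page and one page that is three-quarters black.
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
clean_page = [[WHITE] * 4 for _ in range(4)]
dark_page = [[BLACK] * 4 for _ in range(3)] + [[WHITE] * 4]
print(suspect_pages([clean_page, dark_page]))  # [2]
```

With hundreds of pages per document, sampling every few rows (for example `row_step=4`) may be enough to catch mostly black pages, though whether the estimate is accurate enough depends on the scans and would need checking.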
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.