
Debenu Quick PDF Library - PDF SDK Community Forum Homepage

Image Legibility

bb46970, posted 14 Mar 08 at 5:32PM
I have a client who scans text documents.  They do not perform OCR; they just save the pages as images in the PDFs.  Sometimes the people scanning do a poor job, resulting in pages that are really dark or black.  I am looking for a way to programmatically check the pages and flag any that are "suspect."

peteratoce (Germany), posted 18 Mar 08 at 8:49AM
First of all, your problem is more in the field of image processing than in PDF handling. Further, it is almost impossible to generate a good image from a really bad scan.
 
That said, you could have a look at a tool such as PixEdit, which can load PDFs and offers a COM interface for all kinds of operations on your (hopefully monochrome?) images.
To identify images that are too dark, you would probably have to look at a region in the margin that should be white, i.e. contain no black pixels.
 
There is unfortunately no API function in PixEdit that returns the number of black pixels in a given area, but you can export an area to a file and then perhaps turn to ImageMagick to count the black pixels.
On second thought, simply save in the "Uncompressed, No header" format, read in the bytes, and count the number of "1" bits in each byte yourself.
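
For illustration, the bit counting could look roughly like the Python sketch below. It is only a sketch under stated assumptions: the layout of the "Uncompressed, No header" output is not documented here, so the code assumes 1 bit per pixel, most significant bit first, each row padded to a whole byte, and a "1" bit meaning black. The function names and the `width`/`height` parameters are made up for the example; you would have to supply the dimensions yourself because the file has no header.

```python
def count_set_bits(raw: bytes, width: int, height: int) -> int:
    """Count the 1-bits in a headerless 1-bit-per-pixel image buffer.

    Assumed layout (not confirmed for PixEdit output): rows are stored
    top to bottom, MSB first, each row padded up to a whole byte.
    """
    row_bytes = (width + 7) // 8
    if len(raw) < row_bytes * height:
        raise ValueError("buffer is shorter than width x height implies")

    pad_bits = row_bytes * 8 - width        # unused bits at the end of each row
    last_mask = (0xFF << pad_bits) & 0xFF   # keeps only the real pixels

    total = 0
    for y in range(height):
        row = raw[y * row_bytes:(y + 1) * row_bytes]
        if pad_bits:
            # Clear the padding bits so they are never counted as pixels.
            row = row[:-1] + bytes([row[-1] & last_mask])
        total += bin(int.from_bytes(row, "big")).count("1")
    return total


def black_ratio(raw: bytes, width: int, height: int) -> float:
    """Fraction of pixels that are set, assuming a 1-bit means black."""
    pixels = width * height
    return count_set_bits(raw, width, height) / pixels if pixels else 0.0
```

If the exported area is a margin strip that should be white, a ratio well above zero would hint at a dark scan. If it turns out that "1" means white in your files, use `1 - black_ratio(...)` instead.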
 
Peter


Edited by peteratoce - 18 Mar 08 at 9:10AM

bb46970, posted 18 Mar 08 at 11:37AM
Thanks for the reply.  I came up with an option that may work.  I believe that they do scan the documents as 1-bit.  However, I went with a grey scale option.  I use scanline to examine each pixel.  I add the red, green, and blue values for each pixel.  If the sum falls below 50% of the maximum, I assume that the pixel is "dark."  If necessary, I can adjust the 50%.  I keep a tally of all of the dark pixels.  Then I set a threshold for the page.  For example, if 45% of the page is dark pixels, I flag it as "suspect."  I do that for each page in the document.  My only concern is finding suspect pages for a human to examine, who then determines whether the document needs to be rescanned.
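
In case it helps anyone reading along, here is a rough Python sketch of that idea. It is not my actual code: it assumes each page has already been rendered to a flat buffer of 8-bit R, G, B bytes (how you get that buffer out of the PDF is left open), and the function and parameter names are made up for the example.

```python
from typing import Iterable, List


def dark_fraction(rgb: bytes, pixel_threshold: float = 0.5) -> float:
    """Fraction of pixels whose R+G+B sum is below the given share of the maximum.

    `rgb` is assumed to be a flat buffer of 8-bit R, G, B triples.
    """
    cutoff = pixel_threshold * 3 * 255   # 50% of the brightest possible sum
    pixel_count = len(rgb) // 3
    if pixel_count == 0:
        return 0.0
    dark = sum(
        1
        for i in range(0, pixel_count * 3, 3)
        if rgb[i] + rgb[i + 1] + rgb[i + 2] < cutoff
    )
    return dark / pixel_count


def suspect_pages(pages: Iterable[bytes], page_threshold: float = 0.45) -> List[int]:
    """Return 1-based numbers of pages whose dark share reaches the threshold."""
    return [
        number
        for number, rgb in enumerate(pages, start=1)
        if dark_fraction(rgb) >= page_threshold
    ]
```

Both thresholds are parameters, so the 50% and 45% values can be tuned against a few known good and bad scans.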
 
If anyone has a better option - particularly faster - I am open to it.  Some of these documents are hundreds of pages long, and I may have to process hundreds of documents at a time.
