Thresholding methods are applied to document images and their experimental results are compared. In one set of tests, different thresholding methods are used to binarize document images; optical character recognition (OCR) is then performed on the binarized text and the recognition results are compared. In the other set of tests, multi-thresholding is performed on document images (to obtain three or more levels for images with more than binary levels), and the results are compared. Four thresholding methods are compared in the experiments: a discriminant analysis method, a maximum entropy method, a moment-preserving method, and a connectivity-preserving method. A method using a minimum-error criterion is also commented upon. The moment-preserving and connectivity-preserving methods are found to yield the best OCR results from the binarized images, and the connectivity-preserving method yields the fewest binarization and multi-thresholding failures.
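As an illustration of what global binarization involves, the following is a minimal sketch of one of the compared families, the discriminant analysis approach. It assumes that the method selects the gray level maximizing the between-class variance of the histogram (the criterion commonly attributed to Otsu); the abstract does not spell out the exact formulation used in the experiments. The function names `otsu_threshold` and `binarize` and the toy histogram are hypothetical and chosen only for this example.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes between-class variance.

    Assumption for this sketch: pixels with level <= t form one class and
    pixels with level > t form the other.
    """
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram is empty")
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_level, best_variance = 0, -1.0
    count_low = 0  # number of pixels at or below the candidate level
    sum_low = 0    # sum of gray levels at or below the candidate level
    for level, count in enumerate(histogram):
        count_low += count
        sum_low += level * count
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so the criterion is undefined
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(pixels, threshold):
    """Map each pixel to 1 if it is brighter than the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in pixels]


# Toy bimodal histogram: dark "ink" at level 50, bright "paper" at level 200.
toy_histogram = [0] * 256
toy_histogram[50] = 100
toy_histogram[200] = 100
toy_threshold = otsu_threshold(toy_histogram)
toy_binary = binarize([[50, 200], [200, 50]], toy_threshold)
```

On such a cleanly bimodal histogram any level between the two modes separates the classes equally well. Real document images with uneven illumination or faint strokes are where the compared methods would be expected to differ.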