Document image binarization based on texture analysis.



Abstract

Document image binarization has been a long-standing problem for unconstrained document images. Although various thresholding algorithms have been developed over the years, problems associated with strong noise, complex patterns, poor contrast, and variable modalities in gray-scale histograms still limit the performance of document image analysis systems. Given the unpredictable nature of these image attributes, few thresholding algorithms work consistently well for document image binarization. This dissertation presents texture-feature-based thresholding algorithms to address these difficulties.

The philosophy of our thresholding approach is that texture domain knowledge of document images is important for judging binarization quality and can therefore guide the binarization process; that is, suitably defined texture features of document images can be used to assist optimal threshold selection.

Our thresholding scheme consists of three steps. First, candidate thresholds are produced through the iterative use of Otsu's algorithm. Second, texture features associated with each candidate threshold are extracted from the run-length histogram of the correspondingly binarized image. Third, the optimal threshold is selected so that the most desirable document texture features are preserved. This thresholding scheme was implemented in both global and adaptive modes. With our program design, the algorithms require only one image scan pass, which facilitates their hardware implementation for a commercial system.

Experimental results with 9000 machine-printed address blocks from an unconstrained US mail stream demonstrated that over 99.6% of the images were well binarized by our thresholding method, which is appreciably better than the results obtained by existing thresholding techniques. In addition, a system run with 500 difficult mail address blocks showed that our algorithm achieved an 8.1% higher character recognition rate than Otsu's algorithm.
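The three steps of the scheme can be illustrated with a minimal Python sketch. The abstract does not say how Otsu's algorithm is iterated or which run-length features are used, so everything specific here is an assumption: candidates come from applying Otsu's method recursively to the two halves of the histogram, and the texture feature is a hypothetical `stroke_score` (the fraction of horizontal black runs whose length falls in an assumed stroke-width range). It is not the dissertation's implementation, and it does not reproduce the single-scan-pass design or the adaptive mode.

```python
from collections import Counter


def otsu_threshold(hist, lo, hi):
    """Otsu's threshold restricted to gray levels lo..hi (inclusive).

    Returns t such that levels <= t form the dark class, or None when the
    range holds fewer than two occupied gray levels.
    """
    total = sum(hist[lo:hi + 1])
    total_sum = sum(g * hist[g] for g in range(lo, hi + 1))
    best_t, best_var = None, 0.0
    w0 = sum0 = 0
    for t in range(lo, hi):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (total_sum - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalized)
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def candidate_thresholds(hist, depth=2):
    """Step 1 (assumed form): recursive Otsu splits of the histogram."""
    found = set()

    def split(lo, hi, level):
        if level == 0 or lo >= hi:
            return
        t = otsu_threshold(hist, lo, hi)
        if t is None:
            return
        found.add(t)
        split(lo, t, level - 1)
        split(t + 1, hi, level - 1)

    split(0, len(hist) - 1, depth)
    return sorted(found)


def black_run_lengths(binary):
    """Step 2: histogram of horizontal black-run lengths (1 = ink)."""
    runs = Counter()
    for row in binary:
        length = 0
        for px in row:
            if px:
                length += 1
            elif length:
                runs[length] += 1
                length = 0
        if length:
            runs[length] += 1
    return runs


def stroke_score(runs, min_len=2, max_len=6):
    """Hypothetical texture feature: share of runs with stroke-like length."""
    total = sum(runs.values())
    if total == 0:
        return 0.0
    good = sum(n for length, n in runs.items() if min_len <= length <= max_len)
    return good / total


def select_threshold(image, levels=256):
    """Step 3: keep the candidate whose binarization scores best."""
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1
    best = None
    for t in candidate_thresholds(hist):
        binary = [[1 if px <= t else 0 for px in row] for row in image]
        score = stroke_score(black_run_lengths(binary))
        if best is None or score > best[1]:
            best = (t, score)
    return best


# Synthetic example: ink (20), a wide stain (120), and paper (220).
row = [220] * 2 + [20] * 3 + [220] * 2 + [120] * 10 + [220] * 2 + [20] * 3 + [220] * 2
image = [row] * 8
```

On this synthetic image a single pass of Otsu's method settles on 120 and marks the stain as ink, while `select_threshold(image)` prefers the candidate 20 because its run-length histogram contains only stroke-width runs. The run-length bounds `min_len` and `max_len` are illustrative and would need tuning to the scan resolution.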

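Bibliographic record

  • Author

    Liu, Ying.

  • Author affiliation

    State University of New York at Buffalo.

  • Degree-granting institution: State University of New York at Buffalo.
  • Subject: Computer Science; Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 1995
  • Pages: 136 p.
  • Total pages: 136
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology; radio electronics, telecommunications technology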
