Conference: 2010 20th International Conference on Pattern Recognition
A Self-Training Learning Document Binarization Framework

Abstract

Document image binarization techniques have been studied for many years, and many practical binarization techniques have been developed and applied successfully in commercial document analysis systems. However, current state-of-the-art methods fail to produce good binarization results for many badly degraded document images. In this paper, we propose a self-training learning framework for document image binarization. Based on reported binarization methods, the proposed framework first divides document image pixels into three categories, namely foreground pixels, background pixels, and uncertain pixels. A classifier is then trained on the document image pixels in the foreground and background categories. Finally, the uncertain pixels are classified using the learned pixel classifier. Extensive experiments have been conducted on the dataset used in the recent Document Image Binarization Contest (DIBCO) 2009. Experimental results show that the proposed framework significantly improves the performance of reported document image binarization methods.
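
The three steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which binarization method, features, or classifier are used, so everything concrete here is an assumption made for illustration. The initial split uses a mid-range global threshold with a hypothetical `margin` band, the features are the pixel intensity and its 3x3 neighbourhood mean, and the classifier is a nearest-centroid rule. All function names are hypothetical.

```python
def initial_labels(img, margin=40):
    """Step 1 (assumed): label pixels 'fg', 'bg' or 'uncertain' around a global threshold."""
    pixels = [p for row in img for p in row]
    t = (min(pixels) + max(pixels)) / 2  # stand-in for a reported binarization method
    return [
        ["fg" if p < t - margin else "bg" if p > t + margin else "uncertain" for p in row]
        for row in img
    ]


def features(img, r, c):
    """Assumed feature vector: intensity and mean of the 3x3 neighbourhood (clipped at borders)."""
    h, w = len(img), len(img[0])
    neigh = [
        img[i][j]
        for i in range(max(0, r - 1), min(h, r + 2))
        for j in range(max(0, c - 1), min(w, c + 2))
    ]
    return (img[r][c], sum(neigh) / len(neigh))


def train_centroids(img, labels):
    """Step 2: learn one feature centroid per class from the confident pixels only."""
    sums = {"fg": [0.0, 0.0, 0], "bg": [0.0, 0.0, 0]}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab in sums:
                f = features(img, r, c)
                sums[lab][0] += f[0]
                sums[lab][1] += f[1]
                sums[lab][2] += 1
    if any(s[2] == 0 for s in sums.values()):
        raise ValueError("need at least one confident pixel in each class")
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}


def binarize(img, margin=40):
    """Step 3: keep confident labels, classify uncertain pixels by nearest centroid."""
    labels = initial_labels(img, margin)
    cents = train_centroids(img, labels)
    out = []
    for r, row in enumerate(labels):
        out_row = []
        for c, lab in enumerate(row):
            if lab == "uncertain":
                f = features(img, r, c)
                dist = {k: (f[0] - v[0]) ** 2 + (f[1] - v[1]) ** 2 for k, v in cents.items()}
                lab = min(dist, key=dist.get)
            out_row.append(1 if lab == "fg" else 0)  # 1 = foreground (text), 0 = background
        out.append(out_row)
    return out


# Tiny grayscale example: dark strokes (25, 30), bright paper, two ambiguous pixels (110, 150).
img = [
    [230, 225, 220, 228],
    [226,  30, 110, 224],
    [229,  25, 150, 227],
    [231, 228, 226, 230],
]
result = binarize(img)
```

In this toy image the pixels with values 110 and 150 fall inside the uncertain band and are assigned by the classifier trained on the confident pixels of the same image, which is the self-training aspect: no externally labelled data is required. A practical version would likely replace the global threshold with an established binarization method and use a stronger classifier.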