Conference: International Conference for Convergence in Technology
Degraded Document Image Binarization using Novel Background Estimation Technique

Abstract

Over the past few decades, the use of scanned historical document images has increased dramatically, driven in particular by the emergence of online libraries and standard benchmark datasets such as DIBCO. Historical documents are usually in very poor condition and contain degradations such as large ink stains, bleed-through, liquid spills, uneven background, spots, faded ink, and weak or thin text, all of which make binarization very difficult. In this paper, we propose an effective binarization algorithm for degraded document images that performs accurate text segmentation. Our method first estimates the background using information from neighboring pixels and filter smoothing. The next step is background subtraction, which compensates for background distortions. The document is then segmented using Otsu thresholding, and the result is post-processed with labelled connected components to remove the remaining noise and maximize text content. Evaluated on the H-DIBCO 2016 and H-DIBCO 2018 datasets, our method outperforms several existing and widely used binarization algorithms on F-measure, pseudo F-measure, PSNR, and DRD, and it detects faint characters in a document image very effectively.
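The four stages named in the abstract (background estimation, background subtraction, Otsu thresholding, connected-component clean-up) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the background is estimated or how components are filtered. The sketch assumes a local-maximum filter followed by mean smoothing as the estimator, and a simple minimum-size rule for noise removal. The function names and the parameters `k` and `min_size` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, k):
    """All k x k neighborhoods of img (edge-padded), shape (H, W, k, k)."""
    padded = np.pad(img, k // 2, mode="edge")
    return sliding_window_view(padded, (k, k))


def estimate_background(gray, k=15):
    """Assumed estimator: local maximum (paper is brighter than ink),
    then mean smoothing. k must be odd."""
    local_max = _windows(gray, k).max(axis=(2, 3))
    return _windows(local_max, k).mean(axis=(2, 3))


def otsu_threshold(values, bins=256):
    """Otsu's method: choose the cut that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                  # pixels at or below each cut
    w1 = w0[-1] - w0                      # pixels above each cut
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)           # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return edges[1:][np.argmax(between)]  # right edge of the best cut bin


def remove_small_components(mask, min_size=4):
    """Drop 8-connected foreground components with fewer than min_size pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=bool)
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        seen[y, x] = True
        stack, comp = [(y, x)], []
        while stack:                      # iterative flood fill
            cy, cx = stack.pop()
            comp.append((cy, cx))
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
        if len(comp) >= min_size:
            ys, xs = zip(*comp)
            out[list(ys), list(xs)] = True
    return out


def binarize(gray, k=15, min_size=4):
    """Return a boolean mask that is True where text is detected."""
    gray = np.asarray(gray, dtype=np.float64)
    background = estimate_background(gray, k)
    diff = np.clip(background - gray, 0, None)  # ink appears as positive difference
    text = diff >= otsu_threshold(diff.ravel())
    return remove_small_components(text, min_size)
```

Calling `binarize(img)` on a 2-D grayscale array with dark text on a lighter background returns a boolean text mask. Subtracting the estimated background before thresholding is what lets a single global Otsu threshold cope with uneven illumination or stains. The window size `k` should exceed the stroke width so that the local maximum sees paper rather than ink.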
