Degraded Document Image Binarization using Novel Background Estimation Technique

Abstract

Over the past few decades, the use of scanned historical document images has increased dramatically, especially with the emergence of online libraries and standard benchmark datasets such as DIBCO. Historical documents are usually in very poor condition and contain degradations such as large ink stains, bleed-through, liquid spills, uneven background, spots, faded ink, and weak or thin text, all of which make binarization very difficult. In this paper, we propose an effective binarization algorithm for degraded document images that performs accurate text segmentation. Our method first estimates the background using information from neighboring pixels and filter smoothing. Background subtraction then compensates for background distortions. The document is segmented using Otsu thresholding, and the result is post-processed with labelled connected components to remove the remaining noise and maximize text content. When evaluated on the H-DIBCO 2016 and H-DIBCO 2018 datasets, our method outperforms several existing and widely used binarization algorithms on F-measure, PSNR, DRD, and pseudo F-measure, and it detects faint characters in a document image very effectively.
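
The abstract names the stages of the pipeline but not their parameters, so the following is only a minimal pure-Python sketch of that sequence, not the authors' implementation. The background estimator used here (a neighborhood maximum followed by mean smoothing), the window `radius`, the `min_size` cutoff, and all function names are assumptions chosen for illustration. The sketch also assumes dark text on a lighter background and an 8-bit grayscale image stored as nested lists.

```python
from collections import deque


def window_filter(img, radius, reduce_fn):
    """Apply reduce_fn to the square neighborhood of every pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = reduce_fn(vals)
    return out


def mean(vals):
    return sum(vals) / len(vals)


def estimate_background(img, radius=2):
    """Assumed estimator: local maximum (erases thin dark strokes), then smoothing."""
    return window_filter(window_filter(img, radius, max), radius, mean)


def subtract_background(img, bg):
    """Return how much darker each pixel is than its background, clamped to 0..255."""
    return [[max(0, min(255, int(round(b - p)))) for p, b in zip(row, bg_row)]
            for row, bg_row in zip(img, bg)]


def otsu_threshold(img):
    """Return the threshold t that maximizes between-class variance (class 0: v <= t)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def remove_small_components(mask, min_size):
    """Drop 8-connected foreground components with fewer than min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) >= min_size:
                for cy, cx in comp:
                    out[cy][cx] = 1
    return out


def binarize(img, radius=2, min_size=3):
    """Return a mask with 1 for text pixels and 0 for background."""
    diff = subtract_background(img, estimate_background(img, radius))
    t = otsu_threshold(diff)
    mask = [[1 if v > t else 0 for v in row] for row in diff]
    return remove_small_components(mask, min_size)
```

Calling `binarize(img)` on a grayscale page returns a 0/1 text mask. The nested loops keep the sketch dependency-free but cost time proportional to the window area per pixel, so a practical version would likely swap in vectorized filters from an image-processing library.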

Bibliographic record

Source: International Conference for Convergence in Technology, 2021, pp. 1-8 (8 pages)
Authors: Harshit Jindal; Manoj Kumar; Akhil Tomar; Ayush Malik
Original format: PDF
Keywords: Image segmentation; Liquids; Estimation; Ink; Filtering algorithms; Approximation algorithms; Libraries

Similar literature

1. Improved Degraded Document Image Binarization Using Median Filter for Background Estimation [J]. Khitas Mehdi, Ziet Lahcene, Bouguezel Saad. Elektronika ir Elektrotechnika, 2018, no. 3.
2. Structural feature-based evaluation method of binarization techniques for word retrieval in the degraded Arabic document images [J]. Sari Toufik, Kefali Abderrahmane, Bahi Halima. International Journal on Document Analysis and Recognition, 2016, no. 1.
3. Fast Binarization of Unevenly Illuminated Document Images Based on Background Estimation for Optical Character Recognition Purposes [J]. Hubert Michalak, Krzysztof Okarma. Journal of Universal Computer Science, 2019, no. 6.
4. Multi-spectral document image binarization using image fusion and background subtraction techniques [C]. Mitianoudis Nikolaos, Papamarkos Nikolaos. IEEE International Conference on Image Processing, 2014.
5. Effective and efficient binarization of degraded document images [D]. Parker, Jon Ivan. 2016.
6. Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition [O]. Hubert Michalak, Krzysztof Okarma. 2020.
7. Document Image Binarization Technique for Degraded Document Images [O]. Supriya Lokhande. 2015.
