A Robust and Binarization-Free Approach for Text Line Detection in Historical Documents


Abstract

Text line extraction from complex handwritten documents, especially for historical collections, is still an unsolved problem. There is a strong demand for reliable and robust approaches since text line extraction is a crucial pre-processing step for modern text recognition and keyword spotting systems. We propose a binarization-free system which employs a newly developed clustering approach based on so-called 'superpixels'. Although multiple ways of generating superpixels were developed in the past, we demonstrate that even a standard method yields impressive results. Our clustering approach is applicable to various scenarios by making use of general characteristics of text lines (e.g., curvilinearity, interline spacings, local homogeneity), and without adapting its parametrization. State-of-the-art results are achieved by the same parametrization for 8 different well-established benchmarking datasets. These datasets cover historical and modern texts as well as images with diverse resolutions and fonts. The system is developed for detecting text lines in complex scenarios. It is not tuned to assign foreground pixels to detected text lines. Thus, superior performance is achieved for the historical datasets for which no pixel hit accuracy of 95% is required. Remarkably, for the dataset of the ICDAR 2015 Competition on Text Line Detection in Historical Documents, the average cost per text line was reduced from 9.77 (winning team) to 8.19.

Bibliographic details

Source
IAPR International Conference on Document Analysis and Recognition, 2017, pp. 236-241 (6 pages)
Authors
Tobias Grüning; Gundram Leifert; Tobias Strauss; Roger Labahn
Original format
PDF
Keywords
Robustness; Image segmentation; Standards; Text recognition; Clustering algorithms; State estimation
