Towards Document Image Quality Assessment: A Text Line Based Framework and a Synthetic Text Line Image Dataset

Abstract

Low-quality document images greatly reduce the chances of success in automatic text recognition and analysis, so document images uploaded in online business processes need to be assessed and those of low quality rejected. In this paper, we address document image quality assessment, and our contributions are twofold. First, since document image quality assessment is chiefly concerned with text, we propose a text line based framework to estimate document image quality, composed of three stages: text line detection, text line quality prediction, and overall quality assessment. Text line detection finds potential text lines with a detector. In the text line quality prediction stage, a quality score is computed for each text line with a CNN-based prediction model. The overall quality of the document image is then assessed by combining the quality scores of all text lines. Second, to train the prediction model, we synthesize a large-scale dataset of 52,094 text line images with diverse attributes. For each text line image, a quality label is computed with a piecewise function. To demonstrate the effectiveness of the proposed framework, we conduct comprehensive experiments on two popular document image quality assessment benchmarks. Our framework outperforms state-of-the-art methods by large margins on the large and complicated dataset.
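
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `assess_document`, `detect_lines`, `predict_quality`, and the `Box` layout are hypothetical names introduced here, the callbacks stand in for the detector and the CNN-based model, and the area-weighted mean in the last stage is an assumption, since the abstract does not say how the text line scores are combined.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical box layout: (x, y, width, height) of a detected text line.
Box = Tuple[int, int, int, int]


@dataclass
class LineScore:
    box: Box
    score: float  # predicted quality of one text line


def assess_document(
    image: object,
    detect_lines: Callable[[object], Sequence[Box]],
    predict_quality: Callable[[object, Box], float],
) -> float:
    """Detect text lines, score each line, then combine the scores."""
    # Stage 1: text line detection (hypothetical detector callback).
    boxes = detect_lines(image)
    if not boxes:
        # Assumption: a page with no detected text gets the lowest score.
        return 0.0

    # Stage 2: per-line quality prediction (stands in for the CNN model).
    scored = [LineScore(box, predict_quality(image, box)) for box in boxes]

    # Stage 3: overall assessment. The area-weighted mean is an assumption;
    # the abstract does not state the combination rule.
    total_area = sum(s.box[2] * s.box[3] for s in scored)
    if total_area == 0:
        return sum(s.score for s in scored) / len(scored)
    return sum(s.score * s.box[2] * s.box[3] for s in scored) / total_area
```

Passing the detector and predictor as callbacks keeps the sketch independent of any particular model; a rejection rule for uploaded images would then be a threshold on the returned score, with the threshold chosen per application.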

Bibliographic record

Source
International Conference on Document Analysis and Recognition | 2019 | pp. 551-558 | 8 pages
Authors
Hongyu Li; Fan Zhu; Junhua Qiu
Original format
PDF
Keywords
Image quality; Detectors; Predictive models; Quality assessment; Feature extraction; Image recognition
Date indexed
2022-08-26 14:34:50
