Script and Language Identification in Noisy and Degraded Document Images

Shijian Lu; Chew Lim Tan

Abstract

This paper reports an identification technique that detects the scripts and languages of noisy and degraded document images. In the proposed technique, scripts and languages are identified through document vectorization, which converts each document image into a document vector that characterizes the shape and frequency of the contained character or word images. Document images are vectorized using vertical component cuts and character extremum points, both of which are tolerant to variations in text font and style, to noise, and to various types of document degradation. For each script or language under study, a script or language template is first constructed through a training process. The script and language of a document image are then determined from the distances between its document vector and the pre-constructed script and language templates. Experimental results show that the proposed technique is accurate, easy to extend, and tolerant to noise and various types of document degradation.
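As a rough illustration of the pipeline the abstract outlines (vectorize each document, build one template per script or language from training data, then pick the nearest template), the Python sketch below may help. It is not the authors' implementation. The feature is a deliberately simplified, hypothetical stand-in for vertical component cuts: it counts ink runs along a single vertical scan line through the middle of each character image, and it ignores character extremum points and word shapes entirely. Averaging training vectors into a template and using Euclidean distance are also assumptions, since the abstract does not specify either. All names (`vertical_cuts`, `vectorize`, `train_templates`, `identify`, `MAX_CUTS`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# A character image is a binary grid: rows of 0 (background) / 1 (ink).
Image = Sequence[Sequence[int]]

MAX_CUTS = 4  # hypothetical cap: counts of 4 or more share the last bin


def vertical_cuts(img: Image) -> int:
    """Count ink runs crossed by one vertical scan line through the middle
    column of the image (a simplified stand-in for vertical component cuts)."""
    col = len(img[0]) // 2
    runs, prev = 0, 0
    for row in img:
        cur = row[col]
        if cur and not prev:  # a new run of ink starts here
            runs += 1
        prev = cur
    return runs


def vectorize(chars: List[Image]) -> List[float]:
    """Map a document (its character images) to a normalized histogram of
    cut counts, so documents of different lengths are comparable."""
    counts = Counter(min(vertical_cuts(c), MAX_CUTS) for c in chars)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[k] / total for k in range(MAX_CUTS + 1)]


def train_templates(samples: Dict[str, List[List[Image]]]) -> Dict[str, List[float]]:
    """Build one template per script/language: the mean training vector."""
    templates = {}
    for label, docs in samples.items():
        vecs = [vectorize(d) for d in docs]
        templates[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return templates


def identify(chars: List[Image], templates: Dict[str, List[float]]) -> str:
    """Return the label whose template is nearest (Euclidean) to the document."""
    v = vectorize(chars)
    return min(templates, key=lambda label: math.dist(v, templates[label]))


if __name__ == "__main__":
    bar = [[0, 1, 0]] * 3                            # one ink run on the centre line
    comb = [[1, 1, 1], [0, 0, 0]] * 2 + [[1, 1, 1]]  # three ink runs
    templates = train_templates({
        "A": [[bar, bar, bar], [bar, bar, comb]],
        "B": [[comb, comb, comb], [comb, comb, bar]],
    })
    print(identify([bar, bar], templates))  # → A
```

In this sketch, supporting another script or language only means adding its training documents to the `samples` dictionary and rebuilding the templates; a richer shape feature would change `vectorize` alone, leaving the template and nearest-distance steps untouched.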

Bibliographic details

Source: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, No. 1, pp. 14-24 (11 pages)
Authors: Shijian Lu; Chew Lim Tan
Original format: PDF
Language: English
Chinese Library Classification: Computing technology; computer technology
Keywords: Document analysis; association rules; classification; clustering; language identification; script identification; shape