Language Identification in Degraded and Distorted Document Images


Abstract

This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Unlike previously reported methods, which transform word images through a character shape coding process, our method captures word shapes directly from the local extremum points and the horizontal intersection numbers, both of which tolerate noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and a word frequency template are first constructed using the proposed word shape coding scheme. Identification is then performed using the Bray-Curtis or Hamming distance between the word shape codes of the query image and the constructed word shape and frequency templates. Experiments show that the average identification rate on eight Latin-based languages exceeds 99%. ...
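The matching step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the word shape coding scheme or the template layout, so the fixed-length frequency vectors, the code strings, and the helper names `bray_curtis`, `hamming`, and `identify` are all hypothetical, chosen only to show how the two distances behave and how a nearest-template decision could be made.

```python
from typing import Dict, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis distance: sum(|u_i - v_i|) / sum(|u_i + v_i|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    # Two all-zero vectors are treated as identical (assumption of this sketch).
    return num / den if den else 0.0


def hamming(a: str, b: str) -> int:
    """Hamming distance: number of positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def identify(query: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the language whose (hypothetical) frequency template is nearest."""
    return min(templates, key=lambda lang: bray_curtis(query, templates[lang]))


# Toy data, invented for illustration only; not taken from the paper.
templates = {
    "english": [0.5, 0.3, 0.2],
    "german": [0.2, 0.3, 0.5],
}
query = [0.45, 0.35, 0.2]
best = identify(query, templates)
```

With these toy vectors the query is nearer to the "english" template. Bray-Curtis suits frequency-like vectors because it normalizes by the total mass of both vectors, whereas Hamming distance compares codes position by position and therefore requires equal-length codes.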

Bibliographic details

Source: International Workshop on Document Analysis Systems, 2006, 11 pages
Authors: Shijian Lu; Chew Lim Tan; Weihua Huang
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. Script and Language Identification in Noisy and Degraded Document Images [J]. Shijian Lu, Chew Lim Tan. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2008, no. 1.
2. Language Identification in Document Images [J]. Barlas P., Hebert D., Chatelain C. Journal of Imaging Science and Technology. 2016, no. 1.
3. Language identification for handwritten document images using a shape codebook [J]. Zhu GY, Yu XD, Li Y. Pattern Recognition: The Journal of the Pattern Recognition Society. 2009, no. 12.
4. Script and Language Identification in Degraded and Distorted Document Images [C]. Shijian Lu, Chew Lim Tan. National Conference on Artificial Intelligence (AAAI-06); Innovative Applications of Artificial Intelligence Conference (IAAI-06). 2006.
5. Effective and efficient binarization of degraded document images [D]. Parker, Jon Ivan. 2016.
6. Natural Language Processing Versus Content-Based Image Analysis for Medical Document Retrieval [O]. Aurélie Névéol, Thomas M. Deserno, Stéfan J. Darmoni.
7. Language Identification in Degraded and Distorted Document Images [O]. Shijian Lu, Chew Lim Tan, Weihua Huang. 2006.
8. Handwriting Identification, Matching, and Indexing in Noisy Document Images [R]. Zheng, Y. 2006.
