A Robust Script Identification System for Historical Indian Document Images

S. Kavitha; P. Shivakumara; G. Hemantha Kumar; C. L. Tan

Abstract

Automatic script identification in document archives is essential when searching for a specific document, so that an appropriate Optical Character Recognizer (OCR) can be chosen for recognition. Identifying the script of some of the oldest historical documents, such as those written in Indus script, is especially challenging and interesting because of inter-script similarities. In this work, we propose a new robust script identification system for Indian scripts that covers Indus documents as well as English, Kannada, Tamil, Telugu, Hindi and Gujarati, which helps in selecting an appropriate OCR for recognition. The proposed system explores the spatial relationship between dominant points (namely intersection points, end points and junction points) of the connected components in the documents to extract the structure of the components. The degree of similarity between the scripts is studied by computing the variances of the proximity matrices of the dominant points of the respective scripts. The method is evaluated on 700 scanned document images. Experimental results show that the proposed system outperforms the existing methods in terms of classification rate.
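The abstract does not spell out how dominant points are detected or how the proximity matrix is defined, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a one-pixel-wide binary skeleton, uses the classical crossing-number test to label end points and junction points (intersection points are not modeled separately), takes the proximity matrix to be the pairwise Euclidean distances, and uses the population variance of its upper triangle as the feature. The function names `dominant_points`, `proximity_matrix` and `proximity_variance` are hypothetical and are not taken from the paper.

```python
import math
from statistics import pvariance

# 8-neighbour offsets in circular order: N, NE, E, SE, S, SW, W, NW.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def dominant_points(skeleton):
    """Return (end_points, junction_points) of a binary skeleton.

    Assumption: a foreground pixel is an end point when its crossing
    number (count of 0->1 transitions around the neighbour ring) is 1,
    and a junction point when it is 3 or more.
    """
    rows, cols = len(skeleton), len(skeleton[0])

    def pixel(r, c):
        # Pixels outside the image are treated as background.
        return 1 if 0 <= r < rows and 0 <= c < cols and skeleton[r][c] else 0

    ends, junctions = [], []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            ring = [pixel(r + dr, c + dc) for dr, dc in RING]
            crossings = sum(
                1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1
            )
            if crossings == 1:
                ends.append((r, c))
            elif crossings >= 3:
                junctions.append((r, c))
    return ends, junctions


def proximity_matrix(points):
    """Pairwise Euclidean distances between points (assumed definition)."""
    return [
        [math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points
    ]


def proximity_variance(points):
    """Population variance of the distinct pairwise distances."""
    m = proximity_matrix(points)
    n = len(points)
    upper = [m[i][j] for i in range(n) for j in range(i + 1, n)]
    return pvariance(upper) if upper else 0.0


if __name__ == "__main__":
    # A plus-shaped stroke: four end points and one junction at the centre.
    plus = [
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    ends, junctions = dominant_points(plus)
    print("end points:", ends)
    print("junction points:", junctions)
    print("variance over all dominant points:", proximity_variance(ends + junctions))
```

In a setup like this, one variance value per connected component (or per document) would be fed to a classifier to separate scripts. Whether the authors aggregate the values this way is not stated in the abstract.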

Bibliographic details

Source: Malaysian Journal of Computer Science, 2015, Issue 4
Authors: S. Kavitha; P. Shivakumara; G. Hemantha Kumar; C. L. Tan
Original format: PDF
Chinese Library Classification: Information science and information work

Similar literature

1. AUTOMATIC LINE-LEVEL SCRIPT IDENTIFICATION FROM HANDWRITTEN DOCUMENT IMAGES - A REGION-WISE CLASSIFICATION FRAMEWORK FOR INDIAN SUBCONTINENT [J]. Sk Md Obaidullah, Chayan Halder, K. C. Santosh. Malaysian Journal of Computer Science, 2018, Issue 1.
2. Script Identification from Printed Indian Document Images and Performance Evaluation Using Different Classifiers [J]. Sk Md Obaidullah, Anamika Mondal, Nibaran Das. Applied Computational Intelligence and Soft Computing, 2014, Issue 1.
3. HVS Inspired System for Script Identification in Indian Multi-script Documents [C]. Peeta Basa Pati, A. G. Ramakrishnan. International Workshop on Document Analysis Systems, 2006.
4. Visual Information Retrieval from Historical Document Images = La recherche d’information visuelle à partir d’images de documents historiques [D]. Zhalehpour, Sara. 2018.
5. Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition [O]. Hubert Michalak, Krzysztof Okarma. 2020.
6. AUTOMATIC LINE-LEVEL SCRIPT IDENTIFICATION FROM HANDWRITTEN DOCUMENT IMAGES - A REGION-WISE CLASSIFICATION FRAMEWORK FOR INDIAN SUBCONTINENT [O]. Sk Md Obaidullah, Chayan Halder, K. C. Santosh. 2018.
7. Automatic script identification from images using cluster-based templates [R]. Hochberg, J., Kerns, L., Kelly, P. 1995.
