Source: Malaysian Journal of Computer Science

A Robust Script Identification System for Historical Indian Document Images

Abstract

Automatic script identification in document archives is essential for finding a specific document and for choosing an appropriate Optical Character Recognizer (OCR) to recognize it. Moreover, identifying some of the oldest historical documents, such as those written in Indus script, is both challenging and interesting because of inter-script similarities. In this work, we propose a new robust script identification system for Indian scripts that covers Indus documents as well as English, Kannada, Tamil, Telugu, Hindi and Gujarati, and thereby helps in selecting an appropriate OCR. The proposed system explores the spatial relationship between dominant points, namely the intersection points, end points and junction points of the connected components in a document, to extract the structure of those components. The degree of similarity between scripts is studied by computing the variances of the proximity matrices of the dominant points of the respective scripts. The method is evaluated on 700 scanned document images. Experimental results show that the proposed system outperforms existing methods in terms of classification rate.
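The abstract does not define how dominant points are detected or how the proximity matrix is built, so the following is only a minimal sketch under stated assumptions. It assumes the connected component has already been thinned to a one-pixel-wide skeleton, detects end points and junction points with the common crossing-number heuristic (treating intersection points and junction points alike), uses Euclidean distance for the proximity matrix, and takes the population variance of the pairwise distances. The function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import hypot
from statistics import pvariance

# 8-neighborhood offsets in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skeleton, r, c):
    """Count 0 -> 1 transitions while walking once around pixel (r, c)."""
    rows, cols = len(skeleton), len(skeleton[0])
    ring = [
        1 if 0 <= r + dr < rows and 0 <= c + dc < cols and skeleton[r + dr][c + dc] else 0
        for dr, dc in RING
    ]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def dominant_points(skeleton):
    """Assumed heuristic: crossing number 1 = end point, 3 or more = junction."""
    points = []
    for r, row in enumerate(skeleton):
        for c, pixel in enumerate(row):
            if pixel:
                cn = crossing_number(skeleton, r, c)
                if cn == 1 or cn >= 3:
                    points.append((r, c))
    return points


def proximity_matrix(points):
    """Symmetric matrix of Euclidean distances between dominant points."""
    return [[hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]


def proximity_variance(points):
    """Population variance of the distinct pairwise distances (upper triangle)."""
    dists = [hypot(p[0] - q[0], p[1] - q[1]) for p, q in combinations(points, 2)]
    return pvariance(dists) if dists else 0.0


# Toy skeleton of a plus sign: four end points and one central crossing.
PLUS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

points = dominant_points(PLUS)
print(points)
print(proximity_variance(points))
```

Under these assumptions, each connected component yields one variance value. A script-level descriptor could then aggregate those values over all components of a document, but how the paper combines them for classification is not stated in the abstract.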
