Conference: International Conference on Advanced Computing and Communication Systems

A Robust and Automated Approach for Multilingual Indian Document Indexing

Abstract

Several Indian government offices currently lack robust software for searching for words in scanned multilingual Indian documents. Searching such documents manually is tedious and time-consuming, and the number of documents to be searched is large. There is therefore a pressing need for robust automatic search software for aged multilingual Indian documents, for which no single robust Optical Character Recognition (OCR) system exists that can recognize the complex Indian scripts. To this end, we propose a new geometrical approach for grouping the components that belong to a text line in documents with multiple orientations, together with an extended profile feature extraction technique for character recognition in printed Indian documents. The performance of the proposed approach is evaluated on a variety of Indian documents containing English characters and Devanagari script. Experimental results suggest that the proposed approach generates accurate index words for most of the document images used in this study. Moreover, the proposed technique saves both time and effort compared with manual indexing of document images.
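The abstract does not spell out the extended profile feature, so the following is only a minimal sketch of a plain (non-extended) profile feature as commonly used in character recognition: for each row or column of a binarized glyph, the distance from the image border to the first ink pixel. The names `left_profile`, `profile_features`, and `GLYPH_L`, and the toy 4x4 bitmap, are assumptions made for illustration and are not taken from the paper.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink pixel, 0 = background


def left_profile(glyph: Bitmap) -> List[int]:
    """For each row, distance from the left edge to the first ink pixel.

    Rows that contain no ink are assigned the full row width.
    """
    width = len(glyph[0])
    return [row.index(1) if 1 in row else width for row in glyph]


def profile_features(glyph: Bitmap) -> List[int]:
    """Concatenate left, right, top, and bottom profiles into one vector."""
    mirrored = [row[::-1] for row in glyph]           # flip horizontally
    transposed = [list(col) for col in zip(*glyph)]   # columns become rows
    transposed_mirrored = [row[::-1] for row in transposed]
    return (
        left_profile(glyph)
        + left_profile(mirrored)             # right profile
        + left_profile(transposed)           # top profile
        + left_profile(transposed_mirrored)  # bottom profile
    )


# Hypothetical 4x4 toy glyph shaped like the letter "L".
GLYPH_L = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]

if __name__ == "__main__":
    print(profile_features(GLYPH_L))
```

A feature vector of this kind could be passed to any classifier to label a segmented character; how the paper extends the profiles, and which classifier it uses, is not stated in the abstract.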
