A Robust and Automated Approach for Multilingual Indian Document Indexing


Abstract

Several Indian government offices currently lack robust software for searching for words in scanned multilingual Indian documents. Searching such documents manually is tedious and time-consuming, and the number of documents to be searched for the desired content is large. There is therefore a pressing need for robust automatic search software for aged multilingual Indian documents, for which no single robust Optical Character Recognition (OCR) system exists that can recognize the complex Indian scripts. To this end, we propose a new geometrical approach for grouping the components that belong to a text line in a document with multiple orientations, and an extended profile feature extraction technique for character recognition in printed Indian documents. The performance of the proposed approach is evaluated on a variety of Indian documents containing English characters and Devanagari script. Experimental results suggest that the proposed approach generates accurate index words for most of the document images used in this study. Moreover, the proposed technique saves both time and effort compared with manual indexing of document images.
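The abstract does not spell out what the "extended" profile features are, so the following is only a minimal sketch of the classic four-sided distance profile that such techniques typically build on, not the authors' method. The function name `profile_features`, the `Glyph` type, and the example glyph are hypothetical and chosen for illustration; the input is assumed to be an already segmented, binarized character image.

```python
from typing import List

# Assumption: a glyph is a binary image, 1 = ink, 0 = background.
Glyph = List[List[int]]


def profile_features(glyph: Glyph) -> List[float]:
    """Return the left, right, top and bottom distance profiles, scaled to [0, 1]."""
    height, width = len(glyph), len(glyph[0])

    def first_ink(line: List[int]) -> int:
        # Distance from the start of the line to its first ink pixel;
        # the full line length when the line contains no ink.
        for i, pixel in enumerate(line):
            if pixel:
                return i
        return len(line)

    # Transpose the image so columns can be scanned like rows.
    columns = [[glyph[r][c] for r in range(height)] for c in range(width)]

    left = [first_ink(row) / width for row in glyph]
    right = [first_ink(row[::-1]) / width for row in glyph]
    top = [first_ink(col) / height for col in columns]
    bottom = [first_ink(col[::-1]) / height for col in columns]
    return left + right + top + bottom


# Hypothetical 3x3 glyph shaped like the letter "L".
EXAMPLE_GLYPH: Glyph = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

features = profile_features(EXAMPLE_GLYPH)
print(len(features))  # 12 values: 3 rows x 2 sides + 3 columns x 2 sides
```

A feature vector of this kind could be fed to any classifier. How the paper extends the profile, and which classifier it uses, is not stated in the abstract.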


Bibliographic details

Source
International Conference on Advanced Computing and Communication Systems | 2019 | pp. 457-462 | 6 pages
Authors
Parnika Paranjape; Nitesh Funde; Mayank Thakur; Meera Dhabu; Parag Deshpande;
Original format: PDF
Keywords
Optical character recognition software; Character recognition; Image segmentation; Indexing; Feature extraction; Communication systems; Computer science;


Similar documents


1. A Latent Semantic Indexing-based approach to multilingual document clustering [J]. Chih-Ping Wei, Christopher C. Yang, Chia-Min Lin. Decision Support Systems. 2008, No. 3
2. An automatic indexing and neural network approach to concept retrieval and classification of multilingual (Chinese-English) documents [J]. Chung-Hsin Lin, Hsinchun Chen. IEEE Transactions on Systems, Man, and Cybernetics, Part B. 1996, No. 1
3. Robust Character Segmentation and Recognition Schemes for Multilingual Indian Document Images [J]. Sahare Parul, Dhok Sanjay B. IETE Technical Review. 2019, No. 2
4. A Robust and Automated Approach for Multilingual Indian Document Indexing [C]. Parnika Paranjape, Nitesh Funde, Mayank Thakur. International Conference on Advanced Computing and Communication Systems. 2019
5. Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing [D]. Csomai, Andras. 2008
6. A Robust and Affordable Table Indexing Approach for Multi-isocenter Dosimetrically Matched Fields [O]. Amy Yu, Benjamin Fahimian, Lynn Million
7. A Latent Semantic Indexing-based approach to multilingual document clustering [O]. Chih-ping Wei A, Christopher C. Yang, Chia-min Lin. 2007
