Script Identification of Text Words from a Tri-Lingual Document Using Voting Technique

M C Padma; P. A. Vijaya



Abstract

In a multi-script environment, most documents may contain text printed in more than one script or language. To process such documents automatically through Optical Character Recognition (OCR), the different script regions of the document must first be identified. This paper proposes a model that identifies and separates text words of the Kannada, Hindi and English scripts in a printed tri-lingual document. The proposed method is trained to learn the distinct features of each script, and a binary tree classifier is used to classify the input text image. The experiments used 1500 text words for training and 1200 text words for testing, and were carried out on both a manually created data set and a scanned data set. The results are encouraging and demonstrate the efficacy of the proposed model: the average success rate is 99% for the manually created data set and 98.5% for the data set constructed from scanned document images.
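The abstract names a binary tree classifier and the title names a voting technique, but this page gives neither the features nor the tree layout. The following is only a minimal sketch of how such a word-level decision could be structured. The feature names (`headline_ratio`, `vertical_stroke_density`), the thresholds, and the helper names `Node`, `classify` and `majority_vote` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# A word image is assumed to be already reduced to numeric features.
Features = Dict[str, float]


@dataclass
class Node:
    """Internal node of a binary decision tree: one yes/no test."""
    test: Callable[[Features], bool]
    if_true: Union["Node", str]   # subtree or script label
    if_false: Union["Node", str]  # subtree or script label


def classify(node: Union[Node, str], features: Features) -> str:
    """Walk the tree from the root until a leaf (a script label) is reached."""
    while isinstance(node, Node):
        node = node.if_true if node.test(features) else node.if_false
    return node


def majority_vote(labels: List[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical tree: the features and thresholds are illustrative only.
tree = Node(
    test=lambda f: f["headline_ratio"] > 0.7,  # assumed cue for Hindi
    if_true="Hindi",
    if_false=Node(
        test=lambda f: f["vertical_stroke_density"] > 0.5,  # assumed cue
        if_true="English",
        if_false="Kannada",
    ),
)

word = {"headline_ratio": 0.2, "vertical_stroke_density": 0.8}
label = classify(tree, word)

# Several independent decisions for the same word can be combined by voting.
combined = majority_vote(["Kannada", "English", "English"])
```

With two binary tests the tree separates three classes, which is the smallest layout that fits a tri-lingual task. The vote is shown separately because the page does not say how the two ideas are combined in the paper.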


Bibliographic record

Source: International Journal of Image Processing, 2010, Issue 1
Authors: M C Padma; P. A. Vijaya
Original format: PDF
CLC classification: Automation technology, computer technology

Similar literature

1. Word level identification of Kannada, Hindi and English scripts from a tri-lingual document [J]. M.C. Padma, P.A. Vijaya. International Journal of Computational Vision and Robotics. 2010, Issue 2.
2. Word-Level Multi-Script Indic Document Image Dataset and Baseline Results on Script Identification [J]. Chayan Halder, Nibaran Das, Kaushik Roy. International Journal of Computer Vision and Image Processing. 2017, Issue 2.
3. Technique for Conversion of Text Document into Grade 2 Braille Script [J]. G. Gayathri Devi. International Journal of Applied Engineering Research. 2018, Issue 11aPta2.
4. Text line script identification for a tri-lingual document [C]. Aithal Prakash K., Rajesh G., Acharya Dinesh U. 2010 Second International Conference on Computing Communication and Networking Technologies. 2010.
5. Document image analysis techniques for handwritten text segmentation, document image rectification and digital collation [D]. Salvi, Dhaval. 2014.
6. BoB, a best-of-breed automated text de-identification system for VHA clinical documents [O]. Oscar Ferrández, Brett R South, Shuying Shen.
7. Segmentation of Text Lines and Characters in Ancient Tamil Script Documents using Computational Intelligence Techniques [O]. N. Sridevi, P. Subashini, PhD. 2013.
8. Pictures from Words, Pictures from Text: Constructing Pictorial Representations of Meaning from Text [R]. Cowie, J., Helmreich, S., Dang, H. H. 2009.
