International Journal of Image Processing

Script Identification of Text Words from a Tri-Lingual Document Using Voting Technique



Abstract

In a multi-script environment, the majority of documents may contain text printed in more than one script or language. To process such documents automatically with Optical Character Recognition (OCR), the different script regions of each document must first be identified. To this end, this paper proposes a model that identifies and separates text words in Kannada, Hindi and English scripts from a printed tri-lingual document. The proposed method is trained to learn the distinct features of each script thoroughly, and a binary tree classifier is used to classify each input text image. The experiments used 1500 text words for training and 1200 text words for testing. Extensive experiments were carried out on both a manually created data set and a scanned data set. The results are very encouraging and demonstrate the efficacy of the proposed model: the average success rate is 99% on the manually created data set and 98.5% on the data set constructed from scanned document images.
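The abstract does not state which features the binary tree classifier uses, so the following is only a hypothetical sketch of what a two-node binary tree over Kannada, Hindi and English word images could look like. It assumes binarized word images (ink = nonzero). The first node tests for a Devanagari-style headline (shirorekha), a long horizontal stroke across the top of Hindi words; the second node, an assumption of this sketch, separates English from Kannada by the density of long vertical strokes. The functions `headline_coverage`, `vertical_stroke_density`, `classify_word` and `success_rate`, and the thresholds `HEADLINE_RATIO` and `VERTICAL_RATIO`, are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical thresholds, NOT from the paper (the abstract gives no feature values).
HEADLINE_RATIO = 0.8   # assumed min fraction of word width covered by a headline row
VERTICAL_RATIO = 0.25  # assumed min fraction of columns holding a long vertical stroke


def headline_coverage(word: np.ndarray) -> float:
    """Largest fraction of columns inked in any single row of the word's top third."""
    h, w = word.shape
    top = word[: max(1, h // 3)]
    return float(top.sum(axis=1).max()) / w


def longest_vertical_run(column: np.ndarray) -> int:
    """Length of the longest run of consecutive ink pixels in one column."""
    best = run = 0
    for pixel in column:
        run = run + 1 if pixel else 0
        best = max(best, run)
    return best


def vertical_stroke_density(word: np.ndarray) -> float:
    """Fraction of columns whose longest ink run spans at least half the word height."""
    h, w = word.shape
    min_run = max(1, h // 2)
    hits = sum(longest_vertical_run(word[:, c]) >= min_run for c in range(w))
    return hits / w


def classify_word(word) -> str:
    """Hypothetical two-node binary tree: headline -> Hindi, else vertical strokes -> English."""
    word = np.asarray(word) > 0  # binarize: any nonzero pixel counts as ink
    if word.size == 0:
        raise ValueError("empty word image")
    if headline_coverage(word) >= HEADLINE_RATIO:      # node 1: Hindi vs. the rest
        return "Hindi"
    if vertical_stroke_density(word) >= VERTICAL_RATIO:  # node 2: English vs. Kannada
        return "English"
    return "Kannada"


def success_rate(predictions, labels) -> float:
    """Percentage of test words whose predicted script matches the true script."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```

Each internal node makes a single yes/no decision on one feature, which is what makes the tree binary. `success_rate` computes the percentage of correctly labelled test words, the kind of figure the abstract reports (99% and 98.5%).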
