Journal: International Journal of Computational Vision and Robotics

Word level identification of Kannada, Hindi and English scripts from a tri-lingual document


Abstract

In a multi-script environment, the majority of documents may contain text printed in more than one script or language. Automatic processing of such documents through optical character recognition (OCR) requires first identifying the different script regions of the document. In this context, this paper proposes a model to identify and separate words of Kannada, Hindi and English scripts in a printed tri-lingual document. The proposed method is trained to learn the distinguishing features of each script, and a binary tree classifier is used to classify the input text image. Experiments were conducted on manually created document images of size 600 × 600 pixels. The results are encouraging and support the efficacy of the proposed model: the average success rate is 98.8% for the manually created dataset and 98.5% for the dataset constructed from scanned document images.
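The abstract names a binary tree classifier but does not list the features it tests. The following is a minimal sketch of how such a classifier could route a word image through two yes/no decisions, not the authors' method. The two features and both thresholds are illustrative assumptions: the first node checks for a long horizontal headline in the upper part of the word (Devanagari, the script of Hindi, typically joins characters with such a line), and the second node uses a hypothetical vertical-stroke measure to separate English from Kannada.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def headline_ratio(img: Image) -> float:
    """Longest horizontal ink run in the top third, as a fraction of word width."""
    width = len(img[0])
    top_rows = img[: max(1, len(img) // 3)]
    best = 0
    for row in top_rows:
        run = 0
        for px in row:
            run = run + 1 if px else 0
            best = max(best, run)
    return best / width


def vertical_stroke_ratio(img: Image) -> float:
    """Fraction of columns whose ink covers at least 80% of the word height."""
    height, width = len(img), len(img[0])
    tall = sum(
        1 for c in range(width) if sum(row[c] for row in img) >= 0.8 * height
    )
    return tall / width


@dataclass
class Node:
    feature: Callable[[Image], float]  # feature tested at this node
    threshold: float                   # assumed value, not from the paper
    above: Union["Node", str]          # branch taken when feature >= threshold
    below: Union["Node", str]          # branch taken otherwise


def classify(node: Union[Node, str], img: Image) -> str:
    """Walk the tree from the root until a leaf (a script label) is reached."""
    while isinstance(node, Node):
        node = node.above if node.feature(img) >= node.threshold else node.below
    return node


# Hypothetical two-level tree: headline test first, then vertical strokes.
TREE = Node(
    headline_ratio, 0.8, "Hindi",
    Node(vertical_stroke_ratio, 0.15, "English", "Kannada"),
)
```

Calling `classify(TREE, word_image)` on a binarized word image returns one of the three script labels. In practice the thresholds would be learned from training words rather than fixed by hand.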
