Journal: ACM Transactions on Asian Language Information Processing

Word-Wise Thai and Roman Script Identification



Abstract

In some Thai documents, a single text line of a printed page may contain words in both Thai and Roman scripts. For Optical Character Recognition (OCR) of such a page, it is better to first identify the Thai and Roman portions and then apply the OCR system for the respective script to each identified portion. This article proposes an SVM-based method for word-wise identification of printed Roman and Thai scripts within a single line of a document page. The document is first segmented into lines, and the lines are then segmented into character groups (words). The proposed scheme identifies the script of a character group by combining character features derived from structural shape, profile behavior, component overlapping information, topological properties, and the water reservoir concept, among others. In an experiment on 10,000 words, the proposed scheme achieved 99.62% script identification accuracy.
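The two segmentation steps (page into lines, lines into character groups) can be illustrated with a minimal sketch. The abstract does not say how the authors segment, so the projection-profile approach below is an assumption, chosen because it is a common technique for printed text. The names `ink_runs`, `segment_lines`, `segment_words`, and the `min_gap` parameter are hypothetical, and the input is assumed to be an already binarized image given as nested lists.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed input format)


def ink_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans where the profile is non-zero.

    Spans separated by fewer than `min_gap` empty positions are merged.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends just before i
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], span[1])  # gap too small: same group
        else:
            merged.append(span)
    return merged


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in image]
    return ink_runs(row_profile)


def segment_words(image: Image, line: Tuple[int, int],
                  min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split one text line into character groups using the vertical profile."""
    top, bottom = line
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return ink_runs(col_profile, min_gap=min_gap)


page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
]
lines = segment_lines(page)
words_in_first_line = segment_words(page, lines[0], min_gap=2)
```

Each character group found this way would then go to feature extraction and the SVM classifier. In this sketch `min_gap` is the smallest blank-column width treated as a word boundary; a real value would have to be tuned to the font size and scan resolution.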
