2009 IEEE International Conference on Signal and Image Processing Applications

Autonomously normalized horizontal differentials as features for HMM-based Omni font-written OCR systems for cursively scripted languages



Abstract

Automatic font-written Optical Character Recognition (OCR) is highly desirable for numerous modern information technology (IT) applications. Reliable font-written OCR systems for Latin scripts have long been in use. For cursively scripted languages, which are the mother tongues of over one fourth of the world's population, such systems are not yet available with robust and reliable performance. The main challenge is the mandatory connectivity of characters/ligatures (i.e., graphemes), which must be resolved at the same time as the graphemes are recognized. Among the various approaches tried over decades, Hidden Markov Model (HMM)-based OCR systems seem the most promising, because they capitalize on the ability of HMM decoders to perform segmentation and recognition simultaneously, as in the widely used HMM-based automatic speech recognition (ASR). Unlike in ASR, what is missing in HMM-based OCR is a rigorously founded feature vector capable of robustly achieving minimal “font type/size-independent” (omnifont) word error rates comparable to those achieved with Latin scripts. This paper introduces such a feature vector design and experimentally shows its superiority in this regard.
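
To illustrate what simultaneous segmentation and recognition by an HMM decoder means, here is a minimal Viterbi sketch. It is not the paper's system: the two graphemes "A" and "B", the one-state-per-grapheme models, the probabilities, and the binary frame symbols are all hypothetical stand-ins for real grapheme models and for the feature vectors the paper proposes. The point is only that a single best-path search yields both the labels and the frame boundaries of each grapheme.

```python
import math

# Hypothetical toy setup: two graphemes, each modeled by one HMM state.
# All probabilities are invented for illustration only.
STATES = ["A", "B"]
LOG_INIT = {"A": math.log(0.5), "B": math.log(0.5)}
LOG_TRANS = {
    "A": {"A": math.log(0.8), "B": math.log(0.2)},
    "B": {"A": math.log(0.2), "B": math.log(0.8)},
}
# Each frame is a discrete symbol (0 or 1) standing in for a feature vector.
LOG_EMIT = {
    "A": {0: math.log(0.9), 1: math.log(0.1)},
    "B": {0: math.log(0.1), 1: math.log(0.9)},
}


def viterbi(frames):
    """Return the most likely state label for each frame (log domain)."""
    score = {s: LOG_INIT[s] + LOG_EMIT[s][frames[0]] for s in STATES}
    backpointers = []
    for obs in frames[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this frame.
            best_prev = max(STATES, key=lambda p: score[p] + LOG_TRANS[p][s])
            new_score[s] = (
                score[best_prev] + LOG_TRANS[best_prev][s] + LOG_EMIT[s][obs]
            )
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a per-frame path into (label, first_frame, last_frame) runs."""
    runs = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i - 1))
            start = i
    return runs


frames = [0, 0, 0, 1, 1, 1, 0, 0]
print(segments(viterbi(frames)))
# -> [('A', 0, 2), ('B', 3, 5), ('A', 6, 7)]
```

With one state per grapheme, two adjacent occurrences of the same grapheme would merge into a single run; a practical recognizer would presumably use multi-state left-to-right models per grapheme to avoid this.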
