Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence

HMM-Based Lexicon-Driven and Lexicon-Free Word Recognition for Online Handwritten Indic Scripts

Abstract

Research on recognizing online handwritten words in Indic scripts is at an early stage compared with work on Latin and Oriental scripts. In this paper, we address this problem specifically for two major Indic scripts: Devanagari and Tamil. In contrast to previous approaches, the techniques we propose are largely data driven and script independent. We propose two different techniques for word recognition based on Hidden Markov Models (HMMs): lexicon driven and lexicon free. The lexicon-driven technique models each word in the lexicon as a sequence of symbol HMMs according to a standard symbol writing order derived from the phonetic representation. The lexicon-free technique uses a novel Bag-of-Symbols representation of the handwritten word that is independent of symbol order and allows rapid pruning of the lexicon. On handwritten Devanagari word samples featuring both standard and nonstandard symbol writing orders, a combination of the lexicon-driven and lexicon-free recognizers significantly outperforms either of them used in isolation. In contrast, most Tamil word samples feature the standard symbol order, and the lexicon-driven recognizer outperforms both the lexicon-free one and their combination. The best recognition accuracies obtained for 20,000-word lexicons are 87.13 percent for Devanagari when the two recognizers are combined, and 91.8 percent for Tamil using the lexicon-driven technique.
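
To make the idea of order-independent lexicon pruning concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not say how the Bag-of-Symbols is extracted from the handwriting or how it is matched against the lexicon. The symbol labels, the toy lexicon, the distance measure (size of the multiset symmetric difference), and the names `bag_of_symbols`, `bag_distance`, and `prune_lexicon` are all assumptions made for illustration.

```python
from collections import Counter


def bag_of_symbols(symbols):
    """Return an order-independent multiset of symbol labels."""
    return Counter(symbols)


def bag_distance(a, b):
    """Count the symbols to add or remove to turn bag a into bag b.

    Assumed measure for illustration; the abstract does not specify one.
    """
    return sum(((a - b) + (b - a)).values())


def prune_lexicon(observed, lexicon, keep=2):
    """Keep the `keep` lexicon words whose symbol bags best match the input."""
    obs_bag = bag_of_symbols(observed)
    ranked = sorted(
        lexicon,
        key=lambda word: bag_distance(obs_bag, bag_of_symbols(lexicon[word])),
    )
    return ranked[:keep]


# Hypothetical lexicon: each word maps to its symbols in standard writing order.
lexicon = {
    "word_a": ["ka", "ma", "la"],
    "word_b": ["ka", "la", "ma", "i"],
    "word_c": ["na", "ga", "ra"],
}

# Symbols recognized from a sample written in a nonstandard order.
observed = ["ma", "ka", "la"]

shortlist = prune_lexicon(observed, lexicon, keep=2)
print(shortlist)
```

Because the comparison ignores order, a word written in a nonstandard symbol order still matches its lexicon entry exactly. The shortlist could then be passed to a more expensive recognizer, for example one that scores each remaining word as a sequence of symbol HMMs.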
