Published in: 中文信息学报 (Journal of Chinese Information Processing)

Mongolian Handwriting Recognition Based on Grapheme Segmentation

     

Abstract

Hidden Markov models (HMMs) model sequence data well and are widely used in speech recognition and handwriting recognition. To apply HMMs to Mongolian handwriting recognition, the first problem to solve is serializing the handwritten text. In terms of word formation and writing style, a Mongolian word consists of graphemes connected seamlessly from top to bottom. Selecting the grapheme set and segmenting words into graphemes are therefore the foundation of handwriting recognition and a key factor in recognition accuracy. Based on knowledge of Mongolian syllables and character encoding, this paper determines a Mongolian letter set containing 1171 letters. Correlation processing followed by HMM-based sorting and filtering yields a long grapheme set of 378 graphemes. Manually decomposing the long graphemes yields 50 short graphemes. Finally, the paper presents an algorithm that converts a word into a grapheme sequence through a two-layer mapping. To compare long and short graphemes in handwriting recognition, we implemented a recognition system in the HTK (Hidden Markov Model Toolkit) environment using a small-scale character library. Experimental results show that short graphemes perform better than long graphemes. The grapheme sets and the word-to-grapheme algorithm presented here lay a foundation for subsequent HMM-based Mongolian handwriting recognition research.
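The abstract does not spell out the two layers of the word-to-grapheme mapping. One plausible reading, sketched below, is that the first layer maps each letter of a word to long graphemes and the second layer maps each long grapheme to short graphemes. Everything in this sketch is an assumption made for illustration: the table names LETTER_TO_LONG and LONG_TO_SHORT, the function word_to_graphemes, and the placeholder tokens (L1, G_a, s1, ...) are hypothetical and are not taken from the paper's 1171-letter, 378-grapheme, or 50-grapheme sets.

```python
from typing import Dict, List

# Layer 1 (hypothetical table): letter ID -> sequence of long graphemes.
# The tokens are placeholders, not real Mongolian letters.
LETTER_TO_LONG: Dict[str, List[str]] = {
    "L1": ["G_a"],
    "L2": ["G_b", "G_c"],
    "L3": ["G_d"],
}

# Layer 2 (hypothetical table): long grapheme -> sequence of short graphemes,
# standing in for the manual decomposition mentioned in the abstract.
LONG_TO_SHORT: Dict[str, List[str]] = {
    "G_a": ["s1"],
    "G_b": ["s2", "s3"],
    "G_c": ["s1"],
    "G_d": ["s4", "s2"],
}


def word_to_graphemes(letters: List[str], short: bool = True) -> List[str]:
    """Convert a word, given as letter IDs in writing order, to graphemes.

    With short=False only the first layer is applied (long graphemes);
    with short=True both layers are applied (short graphemes).
    """
    # First layer: expand every letter into its long graphemes, keeping order.
    long_seq = [g for letter in letters for g in LETTER_TO_LONG[letter]]
    if not short:
        return long_seq
    # Second layer: expand every long grapheme into its short graphemes.
    return [s for g in long_seq for s in LONG_TO_SHORT[g]]


if __name__ == "__main__":
    word = ["L1", "L2", "L3"]
    print(word_to_graphemes(word, short=False))
    print(word_to_graphemes(word, short=True))
```

In an HMM setup of this kind, each grapheme in the resulting sequence would typically correspond to one model unit, so the choice between long and short output changes how many distinct models must be trained.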
