Conference: Electronic Imaging Science and Technology Symposium

A word language model based contextual language processing on Chinese character recognition

Abstract

This paper studies the design and implementation of language models. Unlike previous research, we emphasize the importance of word-based n-gram models in language modeling. We build a word-based language model with the SRILM toolkit and apply it to contextual language processing of Chinese documents. A modified Absolute Discount smoothing algorithm is proposed to reduce the perplexity of the language model. Compared with a character-based language model, the word-based model improves the post-processing performance of an online handwritten character recognition system, but it also greatly increases computation and storage costs. In addition to quantizing the model data non-uniformly, we design a new tree storage structure that compresses the model and also improves search efficiency. We evaluate this set of approaches on a test corpus of recognition results for online handwritten Chinese characters, and propose a modified confidence measure for recognition candidate characters that yields accurate posterior probabilities at reduced complexity. The weighted combination of linguistic knowledge and candidate confidence information proves successful and can be developed further to improve recognition accuracy.
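
The abstract does not spell out the modified Absolute Discount algorithm, the modified confidence measure, or the exact weighting scheme, so the following is only a rough sketch under stated assumptions. It uses the standard interpolated absolute-discount estimate for a bigram model and a plain log-linear weighting of language model probability and candidate confidence, decoded with a Viterbi search over the candidate lists. The function names, the `discount` and `lm_weight` parameters, and the treatment of each candidate as a single token (no word segmentation) are all hypothetical simplifications, not the paper's method.

```python
import math
from collections import Counter


def train_bigram_absolute_discount(sentences, discount=0.75):
    """Return prob(word, prev) for an interpolated absolute-discount bigram
    model. This is the textbook estimate, NOT the paper's modified variant."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    history = Counter()    # how often each token occurs as a bigram history
    followers = Counter()  # how many distinct tokens follow each history
    for (prev, _word), count in bigrams.items():
        history[prev] += count
        followers[prev] += 1

    def prob(word, prev):
        p_unigram = unigrams[word] / total
        if history[prev] == 0:
            return p_unigram  # unseen history: fall back to the unigram
        discounted = max(bigrams[(prev, word)] - discount, 0.0) / history[prev]
        # Probability mass freed by discounting, redistributed via the unigram.
        freed_mass = discount * followers[prev] / history[prev]
        return discounted + freed_mass * p_unigram

    return prob


def rescore(candidates, prob, lm_weight=0.5, floor=1e-9):
    """Pick the best token sequence from per-position candidate lists.

    candidates: list of positions, each a list of (token, confidence) pairs
    with confidence > 0. The score is an assumed log-linear mix:
    lm_weight * log P_lm + (1 - lm_weight) * log confidence.
    The first position has no history, so only its confidence is scored.
    """
    # best[token] = (score, path) of the best path ending in that token
    best = {
        tok: ((1 - lm_weight) * math.log(conf), [tok])
        for tok, conf in candidates[0]
    }
    for position in candidates[1:]:
        new_best = {}
        for tok, conf in position:
            score, path = max(
                (s + lm_weight * math.log(max(prob(tok, prev), floor)), p)
                for prev, (s, p) in best.items()
            )
            new_best[tok] = (score + (1 - lm_weight) * math.log(conf), path + [tok])
        best = new_best
    return max(best.values())[1]
```

With `lm_weight=0.0` the search reduces to picking the recognizer's most confident candidate at every position, and raising the weight lets the language model override a slightly more confident but linguistically unlikely candidate. In practice the weight would be tuned on held-out recognition results.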