
A hybrid post-processing system for handwritten Chinese character recognition

Abstract

This paper presents a hybrid post-processing system for improving the performance of handwritten Chinese character recognition (HCCR). Recognition results frequently contain two kinds of errors: misrecognized characters and unrecognized characters. To remove both, the three-stage system combines the confusing-character characteristics of the recognizer with contextual linguistic information. In the first stage, given a candidate sequence, the confusing character set and a statistical noisy-channel model are used to identify the most promising candidate character and to append possibly unrecognized similar-shaped characters to the candidate character set. In the second stage, dictionary-based approximate word matching appends further characters suggested by the linguistic context to the candidate character set and binds the candidate characters into a word lattice. In the third stage, a Chinese word bigram Markov model identifies the most promising sentence by selecting plausible words from the word lattice. On average, the system improves the first-candidate recognition rate by 5.1% when the underlying online HCCR engine has an original character recognition rate of 90% for the first candidate and 95% for the top-10 candidates.
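
The three stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: every name (`CONFUSION_SET`, `LEXICON`, `BIGRAM`, `expand_candidates`, `best_sentence`) and every probability below is a hypothetical toy value chosen for illustration. The sketch also simplifies the method: stage 1 is reduced to confusion-set expansion without noisy-channel scoring, and stage 2 uses exact lexicon lookup over the candidate sets rather than approximate word matching.

```python
import math

# Hypothetical confusion set: recognizer output -> similar-shaped characters.
CONFUSION_SET = {"大": ["天", "太"], "令": ["今"]}

# Hypothetical lexicon and word-bigram probabilities (toy values).
LEXICON = ["今天", "天气", "今", "令", "大", "天", "气", "汽"]
BIGRAM = {("<s>", "今天"): 0.2, ("今天", "天气"): 0.3}


def expand_candidates(candidates):
    """Stage 1 (simplified): append similar-shaped characters per position."""
    expanded = []
    for cands in candidates:
        out = list(cands)
        for c in cands:
            for similar in CONFUSION_SET.get(c, []):
                if similar not in out:
                    out.append(similar)
        expanded.append(out)
    return expanded


def best_sentence(candidates, lexicon, bigram, floor=1e-6):
    """Stages 2 and 3 (simplified): build a word lattice, then run a
    Viterbi search under a word-bigram model. Returns a word list or None."""
    n = len(candidates)
    # Lattice: edges[i] holds every lexicon word that can start at position i,
    # i.e. each of its characters appears in the matching candidate set.
    edges = [[] for _ in range(n)]
    for i in range(n):
        for w in lexicon:
            if w and i + len(w) <= n and all(
                w[k] in candidates[i + k] for k in range(len(w))
            ):
                edges[i].append(w)
    # best[i] maps the last word ending at position i to (log-prob, path).
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(n):
        for prev, (score, path) in best[i].items():
            for w in edges[i]:
                # Unseen bigrams get a small floor probability.
                s = score + math.log(bigram.get((prev, w), floor))
                j = i + len(w)
                if w not in best[j] or s > best[j][w][0]:
                    best[j][w] = (s, path + [w])
    if not best[n]:
        return None  # no lexicon segmentation covers the whole input
    return max(best[n].values(), key=lambda t: t[0])[1]


# Toy recognizer output: the correct characters at positions 0 and 1
# are missing from the candidate lists and must be recovered.
raw = [["令"], ["大", "夫"], ["天"], ["气", "汽"]]
expanded = expand_candidates(raw)
sentence = best_sentence(expanded, LEXICON, BIGRAM)
print(sentence)
```

In this toy run the confusion set adds 今 and 天 to the first two positions, which lets the lattice contain the word 今天, and the bigram scores then favour the segmentation 今天 / 天气 over any sequence of single characters. A fuller version would weight each candidate by a prior and a confusion probability, in the spirit of the noisy-channel model the abstract mentions.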