Venue: IAPR International Workshop on Document Analysis Systems

Improving Handwritten Chinese Text Recognition by Unsupervised Language Model Adaptation

Abstract

This paper investigates the effects of unsupervised language model adaptation (LMA) on handwritten Chinese text recognition. Since no prior information about the text to be recognized is available, we use a two-pass recognition strategy. In the first pass, a generic language model (LM) produces a preliminary result, which is used to select the best-matched LMs from a set of predefined domains; the matched LMs are then used in the second recognition pass. Each LM is compressed to a moderate size via entropy-based pruning, tree-structure formatting, and fewer-byte quantization. We evaluated LMA for five LM types, including both character-level and word-level models. Experiments on the CASIA-HWDB database show that language model adaptation improves performance for every LM type in all domains. Documents in the ancient domain gained the largest improvement: the character-level correct rate increased by 5.87 percent and the accurate rate by 6.05 percent.
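
The abstract does not specify how the "best-matched" domain LMs are chosen or how they are combined in the second pass. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes three things: perplexity of the first-pass text is the matching criterion (lower means a better match); toy character bigram LMs with add-one smoothing stand in for the paper's compressed character- and word-level LMs; and the selected LMs are combined by equal-weight linear interpolation. It also simplifies the second pass to rescoring a fixed list of candidates with recognizer scores, whereas a real system would re-decode. All names (`CharBigramLM`, `select_domains`, `second_pass`), corpora, and scores are hypothetical.

```python
import math
from collections import Counter


class CharBigramLM:
    """Toy character bigram LM with add-one smoothing (hypothetical stand-in
    for the paper's compressed character- and word-level LMs)."""

    def __init__(self, corpus: str):
        self.bigrams = Counter(zip(corpus, corpus[1:]))
        self.contexts = Counter(corpus[:-1])     # how often each char starts a bigram
        self.vocab_size = len(set(corpus)) + 1   # +1 bucket for unseen characters

    def prob(self, prev: str, cur: str) -> float:
        # Laplace-smoothed P(cur | prev)
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + self.vocab_size)

    def perplexity(self, text: str) -> float:
        pairs = list(zip(text, text[1:]))
        log_prob = sum(math.log(self.prob(p, c)) for p, c in pairs)
        return math.exp(-log_prob / max(len(pairs), 1))


def select_domains(first_pass_text: str, domain_lms: dict, k: int = 1) -> list:
    """Assumed criterion: rank domains by perplexity of the first-pass result."""
    return sorted(domain_lms, key=lambda d: domain_lms[d].perplexity(first_pass_text))[:k]


def interpolated_log_prob(text: str, lms: list) -> float:
    """Log probability under an equal-weight linear interpolation of the LMs."""
    return sum(
        math.log(sum(lm.prob(p, c) for lm in lms) / len(lms))
        for p, c in zip(text, text[1:])
    )


def second_pass(candidates: list, lms: list, lm_weight: float = 1.0) -> str:
    """Pick the candidate maximizing recognizer log score + weighted LM log prob."""
    return max(
        candidates,
        key=lambda c: c[1] + lm_weight * interpolated_log_prob(c[0], lms),
    )[0]


# Hypothetical domain corpora (real systems train on large domain text collections)
domain_lms = {
    "ancient": CharBigramLM("子曰学而时习之不亦说乎有朋自远方来不亦乐乎"),
    "news": CharBigramLM("记者报道今天经济市场增长政府发布新闻消息"),
}
first_pass_text = "学面时习之"  # pretend first-pass output, with one misrecognized char
selected = select_domains(first_pass_text, domain_lms, k=1)

# (text, recognizer log score): the recognizer alone slightly prefers the wrong "面"
candidates = [("学而时习之", -5.0), ("学面时习之", -4.8)]
best = second_pass(candidates, [domain_lms[d] for d in selected])
print(selected, best)  # ['ancient'] 学而时习之
```

In this toy run, the erroneous first-pass output still matches the ancient domain best, and that domain's LM then overturns the recognizer's slight preference for the misread character. Perplexity ranking is used here only because it is a simple, common way to compare how well LMs fit a text.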