Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence

Handwritten Chinese Text Recognition by Integrating Multiple Contexts


Abstract

This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese text. Within the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric contexts, and linguistic contexts) from a Bayesian decision perspective, and convert the classifier outputs to posterior probabilities via confidence transformation. For path search, we use a refined beam search algorithm to improve search efficiency and a candidate character augmentation strategy to improve recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning under a Maximum Character Accuracy criterion. We evaluated recognition performance on the Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten text. The experimental results show that confidence transformation and the combination of multiple contexts improve text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, far better than the best results previously reported in the literature.
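The abstract describes path evaluation and path search only at a high level. The Python sketch below shows one simplified reading, under stated assumptions: each candidate character on a segmentation path contributes a weighted sum of log-probabilities from the recognizer (assumed to already be posteriors produced by some confidence transformation), a geometric model, and a character bigram language model, and a plain beam search keeps the best few partial paths at each segment boundary. The names `step_score` and `beam_search`, the lattice layout, the `<s>` start symbol, and all probabilities are hypothetical. This is not the paper's refined beam search, its exact evaluation function, or its candidate character augmentation strategy.

```python
import math

# Hypothetical inputs (not the paper's data structures):
#   lattice[(s, e)] -> candidate classes for the pattern formed by merging
#                      primitive segments s..e-1, as (char, posterior) pairs
#   geo[(s, e)]     -> a geometric-context probability for that pattern
#   bigram[(a, b)]  -> P(b | a) from a character bigram language model


def step_score(char, prev, posterior, geo_prob, bigram, weights, floor=1e-6):
    """Weighted sum of log-probabilities for appending one character."""
    w_rec, w_geo, w_lm = weights
    lm_prob = bigram.get((prev, char), floor)  # unseen bigrams get a floor value
    return (w_rec * math.log(max(posterior, floor))
            + w_geo * math.log(max(geo_prob, floor))
            + w_lm * math.log(lm_prob))


def beam_search(lattice, geo, bigram, n_segments, weights, beam_width=3):
    """Return (score, chars) of the best path covering segments 0..n_segments."""
    # beams[b]: partial paths ending at segment boundary b, as (score, chars)
    beams = {0: [(0.0, [])]}
    for b in range(n_segments):
        # Prune to the beam_width best partial paths before expanding them.
        beams[b] = sorted(beams.get(b, []), key=lambda p: p[0],
                          reverse=True)[:beam_width]
        for score, chars in beams[b]:
            prev = chars[-1] if chars else "<s>"
            for (s, e), candidates in lattice.items():
                if s != b:
                    continue  # only expand patterns that start at boundary b
                for char, posterior in candidates:
                    new_score = score + step_score(
                        char, prev, posterior, geo[(s, e)], bigram, weights)
                    beams.setdefault(e, []).append((new_score, chars + [char]))
    complete = sorted(beams.get(n_segments, []), key=lambda p: p[0], reverse=True)
    return complete[0] if complete else None


# Toy example: two primitive segments that read either as "好" or as "女" + "子".
lattice = {(0, 1): [("女", 0.6)], (1, 2): [("子", 0.7)], (0, 2): [("好", 0.8)]}
geo = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.9}
bigram = {("<s>", "好"): 0.1, ("<s>", "女"): 0.05, ("女", "子"): 0.2}
best = beam_search(lattice, geo, bigram, n_segments=2, weights=(1.0, 1.0, 1.0))
print("".join(best[1]), round(best[0], 3))  # prints: 好 -2.631
```

In this toy lattice the merged candidate 好 wins because its recognition, geometric, and bigram scores together beat the two-character split. Summing raw log-probabilities also biases the search toward paths with fewer characters; a practical system has to compensate for that, for example through the learned weights or an extra per-character term, which this sketch leaves out. The weights here are fixed by hand, whereas the paper optimizes them by supervised learning under the Maximum Character Accuracy criterion.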

