IAPR International Conference on Document Analysis and Recognition

A Comprehensive Analysis of Misclassified Handwritten Chinese Character Samples by Incorporating Human Recognition



Abstract

The development of convolutional neural networks (CNNs) has led to revolutionary progress on the offline handwritten Chinese character recognition (HCCR) problem. Because the recognition rate on a standard offline HCCR testbed is now outstanding, the few samples that remain misclassified have attracted our interest. In this paper, with the help of human recognition results, we present a comprehensive analysis of the samples misclassified by a state-of-the-art CNN model. We based the analysis on top-1-votes, which are obtained from a statistical analysis of the human recognition results, and derived the following conclusions: (1) the majority of samples with high top-1-votes were mislabeled; in addition, comparing the human recognition results with those of the CNN reveals some limitations of the CNN that leave room for further improvement; (2) among samples with medium top-1-votes, samples at different confidence levels show different characteristics, and some of them can be regarded as multi-label samples; (3) samples with low top-1-votes are either written incorrectly or written in a heavily cursive style, which makes them difficult to match to their given ground-truth labels; (4) the paper also examines the relationship between writing styles and misclassifications. We believe this work provides insights and new clues for designing classification methods that can handle these challenging samples.
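To make the top-1-vote idea concrete, the following is a minimal sketch. It is not the authors' procedure. It assumes that each misclassified sample was read by several people, and that "top-1-votes" means the number of readers who gave the single most frequent answer. The function names and the 0.8 / 0.4 agreement cut-offs for the high, medium and low bands are hypothetical, because the abstract does not state them. The sketch also flags a high-agreement human answer that differs from the dataset label as a candidate mislabel, which mirrors finding (1).

```python
from collections import Counter


def top1_votes(human_labels):
    """Return (most-voted label, its vote count) for one sample.

    Assumption: top-1-votes = number of readers who gave the most frequent answer.
    Ties are broken by first occurrence (Counter.most_common ordering).
    """
    label, count = Counter(human_labels).most_common(1)[0]
    return label, count


def vote_band(votes, n_readers, high=0.8, low=0.4):
    """Bucket a sample by the fraction of readers agreeing on the top answer.

    The high/low cut-offs are hypothetical illustration values.
    """
    ratio = votes / n_readers
    if ratio >= high:
        return "high"
    if ratio >= low:
        return "medium"
    return "low"


def analyze(sample_id, human_labels, ground_truth):
    """Summarize one misclassified sample (hypothetical helper)."""
    label, votes = top1_votes(human_labels)
    band = vote_band(votes, len(human_labels))
    # Strong human consensus on a label other than the dataset label
    # suggests the sample may be mislabeled.
    suspect_mislabel = band == "high" and label != ground_truth
    return {
        "id": sample_id,
        "top1": label,
        "votes": votes,
        "band": band,
        "suspect_mislabel": suspect_mislabel,
    }


if __name__ == "__main__":
    # Nine of ten readers read the character as 未, but the dataset label is 末.
    print(analyze("s1", ["未"] * 9 + ["末"], "末"))
```

Under these assumptions, nine of ten readers agreeing on 未 puts the sample in the high band. Because the dataset label is 末, the sample is flagged as a possible mislabel. Samples whose readers scatter across many answers fall into the low band, which corresponds to the badly written or heavily cursive cases in finding (3).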
