Conference on character recognition technologies

Effectiveness of feature and classifier algorithms in character recognition systems

Abstract

At the first Census Optical Character Recognition Systems Conference, NIST generated accuracy data for more than [...] character recognition systems. Most systems were tested on the recognition of isolated digits and upper and lower case alphabetic characters. The recognition experiments were performed on sample sizes of 58,000 digits, and 12,000 upper and lower case alphabetic characters. The algorithms used by the 26 conference participants included rule-based methods, image-based methods, statistical methods, and neural networks. The neural network methods included Multi-Layer Perceptrons, Learning Vector Quantization, Neocognitrons, and cascaded neural networks. In this paper, 11 different systems are compared in three ways: by the correlations between the answers of different systems, by the decrease in error rate as a function of recognition confidence, and by the writer dependence of recognition. This comparison shows that systems that used different algorithms for feature extraction and recognition performed with very high levels of correlation. This is true for neural network systems, hybrid systems, and statistically based systems, and leads to the conclusion that neural networks have not yet demonstrated a clear superiority to more conventional statistical methods. Comparison of these results with the models of Vapnik (for estimation problems), MacKay (for Bayesian statistical models), Moody (for effective parameterization), and Boltzmann models (for information content) demonstrates that as the limits of training data variance are approached, all classifier systems have similar statistical properties. The limiting condition can only be approached for sufficiently rich feature sets because the accuracy limit is controlled by the available information content of the training set, which must pass through the feature extraction process prior to classification.
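Two of the comparisons named in the abstract, the correlation between the answers of different systems and the error rate as a function of recognition confidence, can be illustrated with a minimal Python sketch. It assumes each system reports one predicted label and one confidence value per character. The function names `error_correlation` and `error_vs_reject` are hypothetical, and the Pearson correlation of per-sample error indicators used here is one plausible statistic, not necessarily the one used in the paper.

```python
from math import sqrt


def error_correlation(pred_a, pred_b, truth):
    """Pearson correlation between the per-sample error indicators
    (1 = misclassified, 0 = correct) of two recognition systems.

    Assumption: this is an illustrative statistic, not the paper's own.
    """
    err_a = [int(p != t) for p, t in zip(pred_a, truth)]
    err_b = [int(p != t) for p, t in zip(pred_b, truth)]
    n = len(truth)
    mean_a = sum(err_a) / n
    mean_b = sum(err_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(err_a, err_b)) / n
    var_a = mean_a * (1 - mean_a)  # variance of a 0/1 indicator
    var_b = mean_b * (1 - mean_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a system makes no (or only) errors
    return cov / sqrt(var_a * var_b)


def error_vs_reject(preds, confs, truth, reject_fractions):
    """Error rate on the accepted samples after rejecting the given
    fraction of lowest-confidence samples."""
    n = len(truth)
    order = sorted(range(n), key=lambda i: confs[i])  # ascending confidence
    curve = []
    for frac in reject_fractions:
        kept = order[int(frac * n):]  # drop the least confident samples
        if not kept:
            curve.append((frac, float("nan")))
            continue
        errors = sum(preds[i] != truth[i] for i in kept)
        curve.append((frac, errors / len(kept)))
    return curve


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = [0, 1, 2, 3, 4, 5, 6, 7]
    system_a = [0, 1, 2, 3, 4, 5, 6, 1]  # one error
    system_b = [0, 1, 2, 3, 4, 5, 0, 1]  # two errors, one shared with A
    conf_a = [0.9, 0.9, 0.8, 0.9, 0.7, 0.9, 0.6, 0.2]
    print(error_correlation(system_a, system_b, truth))
    print(error_vs_reject(system_a, conf_a, truth, [0.0, 0.125, 0.25]))
```

A high error correlation between two systems built on different algorithms means they tend to fail on the same characters, which is the kind of agreement the abstract reports. A curve that falls quickly as the reject fraction grows indicates that a system's confidence values are informative about its own errors.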