Cross-validation comparison of NIST OCR databases

Abstract

lity of reference databases for optical character recognition is vital to the meaningful assessment of classification algorithms. NIST has produced two databases of segmented handprinted characters obtained from socially distinct writer populations. Two approaches to the comparison of the databases are described. The first uses the eigenvalue spectrum of the covariance matrix as an a priori measure of the variance intrinsic to the data. The second cross-validates the datasets, using classification error to quantify the difficulty of OCR. The eigenvalue spectra from the training partitions of the datasets are generated during the production of the Karhunen-Loeve (KL) transforms, the leading components of which are used as prototype features for a classifier. The eigenspectra are used to quantify the diversity of the character sets, and the Bhattacharyya distance is used to measure class separability. The digits, uppers and lowers from the two populations of 500 writers are partitioned into N disjoint sets. The KL transforms of each such set are used for testing, while the remaining N-1 sets form the training prototypes for a PNN nearest neighbor classifier. Recognition error rates and their variances are calculated over the N partitions for both databases independently. This quantifies intra-database diversity. The inter-database results, or "cross" terms, obtained by training and testing on different databases, indicate the generality of the training set. The results for digits suggest that the second NIST database (used nominally for testing) is significantly harder than the first (training) set; the testing images are 11% more variant. The NIST training data classifies partitions of itself with 1.7% error, and the test set with 6.8% error. Conversely, the test set generalizes to both itself and the training data with 3.5% error. This effect has also been reported using non-NIST classifiers.
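The partitioning scheme in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: the data are synthetic two-class 2-D points rather than KL features of character images, a plain 1-nearest-neighbor rule stands in for the PNN classifier, and the "cross" terms are formed by training on N-1 partitions of one database and testing on the held-out-index partition of the other. All function names (`make_database`, `partition`, `cross_validate`) are hypothetical.

```python
import random
import statistics


def make_database(n_per_class, spread, seed):
    """Synthetic stand-in for a character database (assumption):
    two classes of 2-D points; a larger spread means a more variable writer population."""
    rng = random.Random(seed)
    centers = {0: (0.0, 0.0), 1: (3.0, 3.0)}
    samples = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx, spread), rng.gauss(cy, spread))
            samples.append((point, label))
    rng.shuffle(samples)
    return samples


def partition(samples, n):
    """Split the samples into n disjoint sets by striding."""
    return [samples[i::n] for i in range(n)]


def nearest_label(train, point):
    """Label of the training prototype closest to `point` (plain 1-NN, not a PNN)."""
    def sq_dist(sample):
        (x, y), _ = sample
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    return min(train, key=sq_dist)[1]


def error_rate(train, test):
    """Fraction of test samples whose nearest prototype has the wrong label."""
    wrong = sum(1 for point, label in test if nearest_label(train, point) != label)
    return wrong / len(test)


def cross_validate(train_db, test_db, n):
    """For each i, train on all partitions of train_db except i and test on
    partition i of test_db. Returns the mean and variance of the n error rates.
    With train_db == test_db this is ordinary N-fold cross-validation."""
    train_parts = partition(train_db, n)
    test_parts = partition(test_db, n)
    errors = []
    for i in range(n):
        train = [s for j, part in enumerate(train_parts) if j != i for s in part]
        errors.append(error_rate(train, test_parts[i]))
    return statistics.mean(errors), statistics.pvariance(errors)


# Two hypothetical "databases": B is drawn with a larger spread than A.
db_a = make_database(100, 0.8, seed=1)
db_b = make_database(100, 1.4, seed=2)
pairs = [("train A, test A", db_a, db_a), ("train A, test B", db_a, db_b),
         ("train B, test B", db_b, db_b), ("train B, test A", db_b, db_a)]
for name, train_db, test_db in pairs:
    mean_err, var_err = cross_validate(train_db, test_db, 5)
    print(f"{name}: mean error {mean_err:.3f}, variance {var_err:.5f}")
```

The same-database rows correspond to the intra-database error rates described in the abstract, and the mixed rows to the cross terms. The numbers printed here describe only the synthetic data and say nothing about the NIST results.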

Bibliographic details

Source: Character Recognition Technologies | 1993 | pp. 296-307 | 12 pages
Conference location: San Jose, CA (US)
Author: Patrick J. Grother, National Institute of Standards and Technology, Gaithersburg, MD, USA
Original format: PDF
Language: English
Chinese Library Classification: Automation technology; computer technology
