Cross Validation Comparison of NIST OCR Databases.

Abstract

The quality of reference databases for Optical Character Recognition (OCR) is vital to the meaningful assessment of classification algorithms. The National Institute of Standards and Technology (NIST) has produced two databases of segmented handprinted characters obtained from socially distinct writer populations. Two approaches to comparing the databases are described. The first uses the eigenvalue spectrum of the covariance matrix as an a priori measure of the variance intrinsic to the data. The second cross-validates the datasets, using classification error to quantify the difficulty of OCR. The eigenvalue spectra of the training partitions are generated during production of the Karhunen-Loève (KL) transforms, whose leading components are used as prototype features for a classifier. The eigenspectra are used to quantify the diversity of the character sets, and the Bhattacharyya distance is used to measure class separability. The results for digits suggest that the second NIST database (used nominally for testing) is significantly harder than the first (training) set; the testing images are 11 percent more variable.
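
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names `kl_transform` and `bhattacharyya_gaussian` are hypothetical, the Bhattacharyya distance is computed under an assumed Gaussian model for each class, the synthetic arrays merely stand in for flattened character images, and comparing summed eigenvalues is only one plausible reading of "more variable".

```python
import numpy as np


def kl_transform(samples, n_components):
    """Karhunen-Loeve transform of row-vector samples.

    Returns the sample mean, the full eigenvalue spectrum of the covariance
    matrix in descending order, and the leading eigenvectors as columns.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / (samples.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort to descending
    return mean, eigvals[order], eigvecs[:, order[:n_components]]


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians (assumed class model)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(mean_term + cov_term)


# Synthetic stand-ins for two datasets (NOT NIST data): 500 samples, 16 "pixels".
rng = np.random.default_rng(0)
set_a = rng.normal(scale=1.0, size=(500, 16))
set_b = rng.normal(scale=1.1, size=(500, 16))

mean_a, eig_a, basis_a = kl_transform(set_a, n_components=4)
_, eig_b, _ = kl_transform(set_b, n_components=4)

# Total variance is the sum of the eigenvalues (trace of the covariance matrix).
variance_ratio = eig_b.sum() / eig_a.sum()

# Leading KL components used as features for a downstream classifier.
features_a = (set_a - mean_a) @ basis_a
```

In this sketch the feature basis is derived from one dataset only, which mirrors the abstract's statement that the eigenvalue spectra come from the training partitions. Class separability could then be estimated by calling `bhattacharyya_gaussian` on the per-class means and covariances of the projected features.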
