
Cold Case: the Lost MNIST Digits

Abstract

Although the popular MNIST dataset [LeCun et al., 1994] is derived from the NIST database [Grother and Hanaoka, 1995], the precise processing steps for this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and its rich metadata such as writer identifier, partition identifier, etc. We also reconstruct the complete MNIST test set with 60,000 samples instead of the usual 10,000. Since the remaining 50,000 were never distributed, they can be used to investigate the impact of twenty-five years of MNIST experiments on the reported testing performances. Our limited results unambiguously confirm the trends observed by Recht et al. [2018, 2019]: although the misclassification rates are slightly off, classifier ordering and model selection remain broadly reliable. We attribute this phenomenon to the pairing benefits of comparing classifiers on the same digits.
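
The pairing benefit mentioned in the last sentence can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the error probabilities, and the "hard digit" model are all hypothetical, chosen only to show why the difference between two error rates is estimated more precisely when both classifiers are scored on the same samples and their mistakes are positively correlated.

```python
import math
import random


def paired_vs_unpaired_se(err_a, err_b):
    """Compare two classifiers given 0/1 error indicators on the SAME samples.

    Returns (difference in error rate, paired standard error,
    standard error obtained if the two test sets were independent).
    """
    n = len(err_a)
    mean_a = sum(err_a) / n
    mean_b = sum(err_b) / n
    mean_d = mean_a - mean_b

    # Paired: use the variance of the per-sample difference a_i - b_i.
    diffs = [a - b for a, b in zip(err_a, err_b)]
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    se_paired = math.sqrt(var_d / n)

    # Unpaired: pretend each error rate came from its own independent sample,
    # so the Bernoulli variances simply add.
    var_a = mean_a * (1 - mean_a)
    var_b = mean_b * (1 - mean_b)
    se_unpaired = math.sqrt((var_a + var_b) / n)
    return mean_d, se_paired, se_unpaired


def simulate_errors(n, seed=0):
    """Hypothetical error model: a few 'hard' digits fool both classifiers."""
    rng = random.Random(seed)
    err_a, err_b = [], []
    for _ in range(n):
        hard = rng.random() < 0.02           # assumed share of hard digits
        p_a = 0.6 if hard else 0.004         # assumed error rates, classifier A
        p_b = 0.5 if hard else 0.003         # assumed error rates, classifier B
        err_a.append(1 if rng.random() < p_a else 0)
        err_b.append(1 if rng.random() < p_b else 0)
    return err_a, err_b


if __name__ == "__main__":
    a, b = simulate_errors(10000, seed=0)
    diff, se_p, se_u = paired_vs_unpaired_se(a, b)
    print(f"difference={diff:.4f} paired SE={se_p:.4f} unpaired SE={se_u:.4f}")
```

Under these assumptions the paired standard error comes out smaller than the unpaired one, because errors shared by both classifiers cancel in the per-sample difference. This is one plausible reading of why classifier ordering can stay reliable even when the absolute misclassification rates shift.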
