Conference on Neural Information Processing Systems

Model Similarity Mitigates Test Set Overuse



Abstract

Excessive reuse of test data has become commonplace in today's machine learning workflows. Popular benchmarks, competitions, and industrial-scale tuning, among other applications, all involve test data reuse beyond the guidance of statistical confidence bounds. Nonetheless, recent replication studies give evidence that popular benchmarks continue to support progress despite years of extensive reuse. We proffer a new explanation for the apparent longevity of test data: many proposed models are similar in their predictions, and we prove that this similarity mitigates overfitting. Specifically, we show empirically that models proposed for the ImageNet ILSVRC benchmark agree in their predictions well beyond what we can conclude from their accuracy levels alone. Likewise, models created by large-scale hyperparameter search enjoy high levels of similarity. Motivated by these empirical observations, we give a non-asymptotic generalization bound that takes similarity into account, leading to meaningful confidence bounds in practical settings.
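To make the agreement measurement concrete, here is a minimal Python sketch using synthetic predictions. The two "models", their accuracies, and the shared-error structure are illustrative assumptions only; the paper's experiments measure agreement between real ILSVRC submissions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 1000                    # test set size, number of classes
y = rng.integers(0, k, n)              # hypothetical ground-truth labels

# Two synthetic models that share failure modes: both answer correctly
# on "easy" examples and tend to give the same wrong label on a shared
# "hard" subset. (Purely illustrative, not the paper's actual models.)
hard = rng.random(n) < 0.25
shared_wrong = (y + rng.integers(1, k, n)) % k   # a common wrong label != y
pred_a = np.where(hard & (rng.random(n) < 0.8), shared_wrong, y)
pred_b = np.where(hard & (rng.random(n) < 0.8), shared_wrong, y)

acc_a = (pred_a == y).mean()
acc_b = (pred_b == y).mean()
observed = (pred_a == pred_b).mean()

# If the models erred independently, agreement would be roughly
# P(both correct) = acc_a * acc_b, since two independent wrong answers
# rarely coincide across 1000 classes. Observed agreement far above
# this baseline is the kind of similarity the paper documents.
baseline = acc_a * acc_b
print(f"acc A = {acc_a:.3f}, acc B = {acc_b:.3f}")
print(f"agreement: observed = {observed:.3f}, independence baseline = {baseline:.3f}")
```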
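For context on the final claim, the standard non-asymptotic baseline that a similarity-aware bound improves on is Hoeffding's inequality combined with a union bound over all k models evaluated against the same test set. The sketch below computes that textbook baseline at ILSVRC scale; the paper's own similarity-dependent bound is not reproduced here.

```python
import math

def naive_uniform_deviation(n: int, k: int, delta: float = 0.05) -> float:
    """Hoeffding + union bound over k models: with probability >= 1 - delta,
    every empirical accuracy lies within eps = sqrt(ln(2k/delta) / (2n))
    of its true value."""
    return math.sqrt(math.log(2 * k / delta) / (2 * n))

n = 50_000  # size of the ILSVRC validation set
for k in (10, 1_000, 100_000):
    eps = naive_uniform_deviation(n, k)
    print(f"k = {k:>7,} models -> eps = {eps:.4f}")
```

The union bound treats every evaluated model as a fresh chance to overfit, so its effective k is the total number of models ever tried; the abstract's observation is that highly similar models behave, informally, like far fewer distinct ones, which is what allows tighter and practically meaningful confidence bounds.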
