
Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms

Abstract

This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5 × 2 cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5 × 2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
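As a concrete illustration of the recommended procedure, the following is a minimal Python sketch of a 5 × 2 cv paired t test. It follows the abstract's description (five replications of twofold cross-validation) together with the commonly used form of the statistic, whose numerator is the error difference on the first fold of the first replication and whose denominator pools the per-replication variances, referred to a t distribution with 5 degrees of freedom. The classifiers clf_a and clf_b are assumed to be scikit-learn-style estimators with fit and score methods; the function name and other details are illustrative, not taken from the article.

```python
import numpy as np
from scipy import stats

def five_by_two_cv_t_test(clf_a, clf_b, X, y, random_state=0):
    """5 x 2 cv paired t test: five replications of twofold cross-validation.

    Returns the t statistic (referred to a t distribution with 5 degrees
    of freedom) and the two-sided p-value.
    """
    rng = np.random.RandomState(random_state)
    half = len(y) // 2
    variances = []
    first_diff = None

    for rep in range(5):
        # Randomly split the data into two equal halves for this replication.
        idx = rng.permutation(len(y))
        folds = [idx[:half], idx[half:2 * half]]

        # Train on one half and test on the other, in both directions,
        # recording the difference in error rates between the two classifiers.
        diffs = []
        for test_i, train_i in ((0, 1), (1, 0)):
            tr, te = folds[train_i], folds[test_i]
            err_a = 1.0 - clf_a.fit(X[tr], y[tr]).score(X[te], y[te])
            err_b = 1.0 - clf_b.fit(X[tr], y[tr]).score(X[te], y[te])
            diffs.append(err_a - err_b)

        if rep == 0:
            first_diff = diffs[0]  # numerator of the statistic
        mean = (diffs[0] + diffs[1]) / 2.0
        variances.append((diffs[0] - mean) ** 2 + (diffs[1] - mean) ** 2)

    t_stat = first_diff / np.sqrt(np.mean(variances))
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=5)
    return t_stat, p_value
```

Under the null hypothesis of no difference between the two algorithms, |t| exceeds roughly 2.571 (the 0.975 quantile of the t distribution with 5 degrees of freedom) about 5% of the time, so that value is the usual two-sided cutoff at the 5% level.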

Bibliographic Information

  • Source
    Neural Computation | 1998, No. 7 | pp. 1895-1923 | 29 pages
  • Author
    Dietterich T;
  • Author affiliation
    Department of Computer Science, Oregon State University, Corvallis, OR 97331, U.S.A.;
  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
  • Date added: 2022-08-18 02:12:33
