
Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms

Abstract

This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5 × 2 cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5 × 2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
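As a concrete illustration of the recommended procedure, the following is a minimal Python sketch of a 5 × 2 cv paired t test. It follows the abstract's description (five replications of twofold cross-validation) together with the commonly used form of the statistic, whose numerator is the error difference on the first fold of the first replication and whose denominator pools the per-replication variances, referred to a t distribution with 5 degrees of freedom. The classifiers clf_a and clf_b are assumed to be scikit-learn-style estimators with fit and score methods; the function name and other details are illustrative, not taken from the article.

```python
import numpy as np
from scipy import stats

def five_by_two_cv_t_test(clf_a, clf_b, X, y, random_state=0):
    """5 x 2 cv paired t test: five replications of twofold cross-validation.

    Returns the t statistic (referred to a t distribution with 5 degrees
    of freedom) and the two-sided p-value.
    """
    rng = np.random.RandomState(random_state)
    half = len(y) // 2
    variances = []
    first_diff = None

    for rep in range(5):
        # Randomly split the data into two equal halves for this replication.
        idx = rng.permutation(len(y))
        folds = [idx[:half], idx[half:2 * half]]

        # Train on one half and test on the other, in both directions,
        # recording the difference in error rates between the two classifiers.
        diffs = []
        for test_i, train_i in ((0, 1), (1, 0)):
            tr, te = folds[train_i], folds[test_i]
            err_a = 1.0 - clf_a.fit(X[tr], y[tr]).score(X[te], y[te])
            err_b = 1.0 - clf_b.fit(X[tr], y[tr]).score(X[te], y[te])
            diffs.append(err_a - err_b)

        if rep == 0:
            first_diff = diffs[0]  # numerator of the statistic
        mean = (diffs[0] + diffs[1]) / 2.0
        variances.append((diffs[0] - mean) ** 2 + (diffs[1] - mean) ** 2)

    t_stat = first_diff / np.sqrt(np.mean(variances))
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=5)
    return t_stat, p_value
```

Under the null hypothesis of no difference between the two algorithms, |t| exceeds roughly 2.571 (the 0.975 quantile of the t distribution with 5 degrees of freedom) about 5% of the time, so that value is the usual two-sided cutoff at the 5% level.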

Bibliographic Information

  • Source
    Neural Computation | 1998, No. 7 | pp. 1895-1923 | 29 pages
  • Author
    Dietterich T;
  • Author affiliation
    Department of Computer Science, Oregon State University, Corvallis, OR 97331, U.S.A.;
  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
  • Date added: 2022-08-18 02:12:33
