Journal of Machine Learning Research

No Unbiased Estimator of the Variance of K-Fold Cross-Validation


Abstract

Most machine learning researchers perform quantitative experiments to estimate generalization error and to compare the performance of different algorithms (in particular, their proposed algorithm). In order to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) unbiased estimator of the variance of K-fold cross-validation. The analysis that accompanies this result is based on the eigendecomposition of the covariance matrix of errors, which has only three distinct eigenvalues, corresponding to three degrees of freedom of the matrix and three components of the total variance. This analysis helps to better understand the nature of the problem and how naive estimators (which do not take into account the error correlations due to the overlap between training sets) can grossly underestimate the variance. This is confirmed by numerical experiments in which the three components of the variance are compared as the difficulty of the learning problem and the number of folds are varied.
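To illustrate the estimator the abstract refers to, here is a minimal sketch of K-fold cross-validation together with the naive variance estimator that treats the K fold means as independent. The helper names (`kfold_cv_errors`, `naive_variance`) and the trivial mean predictor are illustrative assumptions, not the paper's experimental setup; the point is only that dividing the sample variance of fold means by K ignores the between-fold correlations that the paper shows cannot be corrected for without bias.

```python
import random
import statistics

def kfold_cv_errors(ys, k):
    """Run K-fold cross-validation with a trivial predictor.

    For each fold, "train" by taking the mean of y on the other
    folds, then record the squared error of that prediction on
    each held-out example.  Returns a list of k per-fold error lists.
    """
    n = len(ys)
    idx = list(range(n))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        y_hat = statistics.fmean(ys[i] for i in train)  # "fitted" model
        fold_errors.append([(ys[i] - y_hat) ** 2 for i in test])
    return fold_errors

def naive_variance(fold_errors):
    """Naive variance estimate of the CV mean: s^2 / k over fold means.

    This treats the k fold means as i.i.d.  It ignores the correlations
    induced by overlapping training sets, which is why such estimators
    can grossly underestimate the true variance.
    """
    fold_means = [statistics.fmean(errs) for errs in fold_errors]
    return statistics.variance(fold_means) / len(fold_means)
```

A quick usage example: `naive_variance(kfold_cv_errors(ys, 5))` yields a nonnegative number, but by the paper's main theorem no such statistic can be an unbiased estimator of the CV variance under all data distributions.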
