Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

A general framework for the statistical analysis of the sources of variance for classification error estimators

Abstract

Estimating the prediction error of classifiers induced by supervised learning algorithms is important not only for predicting their future error, but also for choosing a classifier from a given set (model selection). If the goal is to estimate the prediction error of a particular classifier, the desired estimator should have low bias and low variance. However, if the goal is model selection, the chosen estimator should have low variance in order to allow fair comparisons, assuming that the bias term is independent of the considered classifier. This paper follows the analysis proposed in [1] of the statistical properties of k-fold cross-validation estimators and extends it to the most popular error estimators: resubstitution, holdout, repeated holdout, simple bootstrap and 0.632 bootstrap, both with and without stratification. We present a general framework for analyzing the decomposition of the variance of different error estimators, considering the nature of the variance (irreducible/reducible variance) and the different sources of sensitivity (internal/external sensitivity). An extensive empirical study has been performed for the aforementioned estimators with naive Bayes and C4.5 classifiers over training sets drawn from assorted probability distributions. The empirical analysis consists of decomposing the variances following the proposed framework and checking the independence assumption between the bias and the considered classifier. Based on the results obtained, we propose the most appropriate error estimators for model selection under different experimental conditions.
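The estimators compared in the abstract can be illustrated with a small, self-contained sketch. The snippet below is a toy, not the paper's experimental setup: it uses a 1-D two-class Gaussian problem and a class-mean-midpoint threshold classifier as a hypothetical stand-in for naive Bayes, then computes resubstitution, holdout and 0.632 bootstrap error estimates over many independently drawn training sets so that the mean (bias) and variance of each estimator can be compared.

```python
import random
import statistics

def sample_data(n, rng):
    """Balanced 1-D two-class sample: class 0 ~ N(0,1), class 1 ~ N(1,1)."""
    return [(rng.gauss(float(y), 1.0), y) for y in (0, 1) for _ in range(n // 2)]

def train(data):
    """Threshold at the midpoint of the class means (toy stand-in for naive Bayes)."""
    m0 = statistics.mean(x for x, y in data if y == 0)
    m1 = statistics.mean(x for x, y in data if y == 1)
    t = (m0 + m1) / 2.0
    return lambda x: int(x >= t)

def err(clf, data):
    return sum(clf(x) != y for x, y in data) / len(data)

def resubstitution(data, rng=None):
    # Train and test on the same sample: optimistically biased.
    return err(train(data), data)

def holdout(data, rng, frac=0.7):
    # Single random train/test split.
    d = data[:]
    rng.shuffle(d)
    cut = int(len(d) * frac)
    return err(train(d[:cut]), d[cut:])

def bootstrap_632(data, rng, b=30):
    # 0.632 estimator: weighted mix of resubstitution and out-of-bag error.
    n = len(data)
    oob_errs = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:
            oob_errs.append(err(train([data[i] for i in idx]), oob))
    return 0.368 * resubstitution(data) + 0.632 * statistics.mean(oob_errs)

def summarize(estimator, reps=100, n=60, seed=0):
    """Mean and variance of an estimator across independently drawn training sets."""
    rng = random.Random(seed)
    vals = [estimator(sample_data(n, rng), rng) for _ in range(reps)]
    return statistics.mean(vals), statistics.variance(vals)

if __name__ == "__main__":
    for name, est in [("resubstitution", resubstitution),
                      ("holdout", holdout),
                      ("0.632 bootstrap", bootstrap_632)]:
        m, v = summarize(est)
        print(f"{name:>16}: mean={m:.3f}  variance={v:.5f}")
```

The variance reported by `summarize` mixes the sources the paper separates: repeating the estimator on a *fixed* training set with different internal randomness (split or resamples) would isolate internal sensitivity, while varying the training set itself, as above, captures external sensitivity as well.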