Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

A general framework for the statistical analysis of the sources of variance for classification error estimators

Abstract

Estimating the prediction error of classifiers induced by supervised learning algorithms is important not only for predicting their future error, but also for choosing a classifier from a given set (model selection). If the goal is to estimate the prediction error of a particular classifier, the desired estimator should have low bias and low variance. However, if the goal is model selection, the chosen estimator should have low variance in order to allow fair comparisons, assuming that the bias term is independent of the considered classifier. This paper follows the analysis proposed in [1] of the statistical properties of k-fold cross-validation estimators and extends it to the most popular error estimators: resubstitution, holdout, repeated holdout, simple bootstrap and 0.632 bootstrap, both with and without stratification. We present a general framework for analyzing the decomposition of the variance of different error estimators, considering the nature of the variance (irreducible/reducible variance) and the different sources of sensitivity (internal/external sensitivity). An extensive empirical study has been performed for the aforementioned estimators with naive Bayes and C4.5 classifiers over training sets drawn from assorted probability distributions. The empirical analysis consists of decomposing the variances following the proposed framework and checking the independence assumption between the bias and the considered classifier. Based on the results obtained, we propose the most appropriate error estimators for model selection under different experimental conditions.
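The estimators compared in the abstract can be illustrated with a small, self-contained sketch. The snippet below is a toy, not the paper's experimental setup: it uses a 1-D two-class Gaussian problem and a class-mean-midpoint threshold classifier as a hypothetical stand-in for naive Bayes, then computes resubstitution, holdout and 0.632 bootstrap error estimates over many independently drawn training sets so that the mean (bias) and variance of each estimator can be compared.

```python
import random
import statistics

def sample_data(n, rng):
    """Balanced 1-D two-class sample: class 0 ~ N(0,1), class 1 ~ N(1,1)."""
    return [(rng.gauss(float(y), 1.0), y) for y in (0, 1) for _ in range(n // 2)]

def train(data):
    """Threshold at the midpoint of the class means (toy stand-in for naive Bayes)."""
    m0 = statistics.mean(x for x, y in data if y == 0)
    m1 = statistics.mean(x for x, y in data if y == 1)
    t = (m0 + m1) / 2.0
    return lambda x: int(x >= t)

def err(clf, data):
    return sum(clf(x) != y for x, y in data) / len(data)

def resubstitution(data, rng=None):
    # Train and test on the same sample: optimistically biased.
    return err(train(data), data)

def holdout(data, rng, frac=0.7):
    # Single random train/test split.
    d = data[:]
    rng.shuffle(d)
    cut = int(len(d) * frac)
    return err(train(d[:cut]), d[cut:])

def bootstrap_632(data, rng, b=30):
    # 0.632 estimator: weighted mix of resubstitution and out-of-bag error.
    n = len(data)
    oob_errs = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:
            oob_errs.append(err(train([data[i] for i in idx]), oob))
    return 0.368 * resubstitution(data) + 0.632 * statistics.mean(oob_errs)

def summarize(estimator, reps=100, n=60, seed=0):
    """Mean and variance of an estimator across independently drawn training sets."""
    rng = random.Random(seed)
    vals = [estimator(sample_data(n, rng), rng) for _ in range(reps)]
    return statistics.mean(vals), statistics.variance(vals)

if __name__ == "__main__":
    for name, est in [("resubstitution", resubstitution),
                      ("holdout", holdout),
                      ("0.632 bootstrap", bootstrap_632)]:
        m, v = summarize(est)
        print(f"{name:>16}: mean={m:.3f}  variance={v:.5f}")
```

The variance reported by `summarize` mixes the sources the paper separates: repeating the estimator on a *fixed* training set with different internal randomness (split or resamples) would isolate internal sensitivity, while varying the training set itself, as above, captures external sensitivity as well.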