Journal: Machine Learning

Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation


Abstract

Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and hyper-parameter values (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for this bias, called Bootstrap Bias Corrected CV (BBC-CV). The main idea of BBC-CV is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822-829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any performance metric (accuracy, AUC, concordance index, mean squared error). Subsequently, we again employ the idea of bootstrapping the out-of-sample predictions, this time to speed up the CV process. Specifically, using a bootstrap-based statistical criterion, we stop training models on new folds for configurations that are, with high probability, inferior. We call this method Bootstrap Bias Corrected with Dropping CV (BBCD-CV); it is both efficient and provides accurate performance estimates.
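
The abstract does not spell out the individual steps of BBC-CV, so the following Python sketch is only one plausible reading of its main idea, not the authors' implementation. It assumes that the out-of-sample predictions of all configurations are pooled into a single matrix (one row per sample, one column per configuration), that the best configuration is re-selected on each bootstrap sample of rows, and that the selected configuration is then scored on the rows left out of that sample. The names `bbc_cv`, `naive_best_score`, `oos_pred`, and `n_boot` are hypothetical, and accuracy is used only as an example metric.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


def naive_best_score(oos_pred, y, metric=accuracy):
    """Score of the best configuration on all rows (optimistically biased)."""
    return max(metric(y, oos_pred[:, j]) for j in range(oos_pred.shape[1]))


def bbc_cv(oos_pred, y, metric=accuracy, n_boot=200, seed=0):
    """Bootstrap the configuration-selection step on pooled CV predictions.

    oos_pred: array of shape (n_samples, n_configs); column j holds the
              out-of-sample predictions of configuration j (assumed layout).
    y:        array of shape (n_samples,) with the true labels.
    No model is retrained here; only stored predictions are resampled.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_configs = oos_pred.shape
    all_rows = np.arange(n_samples)
    scores = []
    for _ in range(n_boot):
        # Sample rows with replacement; the untouched rows are "out of bag".
        in_bag = rng.integers(0, n_samples, size=n_samples)
        out_bag = np.setdiff1d(all_rows, in_bag)
        if out_bag.size == 0:
            continue  # extremely unlikely, but nothing to evaluate on
        # Repeat the selection step on the bootstrap sample only.
        in_bag_scores = [metric(y[in_bag], oos_pred[in_bag, j])
                         for j in range(n_configs)]
        best = int(np.argmax(in_bag_scores))
        # Score the selected configuration on rows it was not selected on.
        scores.append(metric(y[out_bag], oos_pred[out_bag, best]))
    return float(np.mean(scores))


# Toy usage: 50 configurations that all guess at random on binary labels.
demo_rng = np.random.default_rng(42)
demo_y = demo_rng.integers(0, 2, size=200)
demo_pred = demo_rng.integers(0, 2, size=(200, 50))
print("naive best-configuration accuracy:", naive_best_score(demo_pred, demo_y))
print("bootstrap-corrected accuracy:", bbc_cv(demo_pred, demo_y))
```

In the toy usage every configuration is a coin flip, so any apparent advantage of the best one comes from selection alone, which is the optimism the abstract refers to. A metric such as AUC would additionally need a guard for out-of-bag sets that contain a single class, which this sketch omits.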
