Journal: Bioinformatics

Prediction error estimation: a comparison of resampling methods


Abstract

Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection.

Results: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of specimens available increases.

Contact: annette.molinaro@yale.edu

Supplementary Information: A complete compilation of results and R code for simulations and analyses are available in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).
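The contrast between resubstitution and cross-validation in the presence of feature selection can be illustrated with a small simulation. The sketch below is not the authors' R code. It assumes a nearest-centroid classifier and a simple class-mean-difference filter as stand-ins for the classifiers and selection procedure studied in the paper, and it uses pure-noise synthetic data, so the true error is 0.5. All function names (`select_features`, `cv_error`, `compare_estimates`, and so on) and all parameter values are hypothetical choices for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def select_features(X, y, k):
    """Return indices of the k features with the largest gap between class means."""
    rows0 = [x for x, t in zip(X, y) if t == 0]
    rows1 = [x for x, t in zip(X, y) if t == 1]

    def gap(j):
        return abs(mean([r[j] for r in rows0]) - mean([r[j] for r in rows1]))

    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def fit_centroids(X, y, feats):
    """Per-class mean vector restricted to the selected features."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [mean([r[j] for r in rows]) for j in feats]
    return cents


def predict(cents, x, feats):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c]))
    return min((0, 1), key=dist)


def error_rate(X_train, y_train, X_test, y_test, k):
    """Select features and fit on the training part only, then score the test part."""
    feats = select_features(X_train, y_train, k)
    cents = fit_centroids(X_train, y_train, feats)
    wrong = sum(predict(cents, x, feats) != t for x, t in zip(X_test, y_test))
    return wrong / len(y_test)


def resubstitution_error(X, y, k):
    """Train and test on the same samples (the optimistic estimate)."""
    return error_rate(X, y, X, y, k)


def cv_error(X, y, k, n_folds, rng):
    """k-fold CV with feature selection repeated inside every fold.
    Setting n_folds equal to the sample size gives LOOCV."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    wrong = 0.0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held = set(test)
        train = [i for i in idx if i not in held]
        err = error_rate([X[i] for i in train], [y[i] for i in train],
                         [X[i] for i in test], [y[i] for i in test], k)
        wrong += err * len(test)
    return wrong / len(y)


def compare_estimates(n_reps=5, n=20, p=300, k=10, seed=0):
    """Average each estimate over n_reps noise datasets (labels carry no signal)."""
    rng = random.Random(seed)
    y = [0] * (n // 2) + [1] * (n // 2)  # assumes n is even
    totals = {"resubstitution": 0.0, "10-fold CV": 0.0, "LOOCV": 0.0}
    for _ in range(n_reps):
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        totals["resubstitution"] += resubstitution_error(X, y, k)
        totals["10-fold CV"] += cv_error(X, y, k, 10, rng)
        totals["LOOCV"] += cv_error(X, y, k, n, rng)
    return {name: total / n_reps for name, total in totals.items()}


if __name__ == "__main__":
    for name, err in compare_estimates().items():
        print(f"{name}: {err:.2f}")
```

Because the features are noise, any estimate well below 0.5 reflects optimism rather than real predictive ability. The key design choice is that `error_rate` reruns feature selection on each training fold. Selecting features once on the full data and then cross-validating only the classifier would leak information from the held-out samples into the estimate.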
