Journal: Statistical Methodology

Asymptotics of cross-validated risk estimation in estimator selection and performance assessment


Abstract

Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold cross-validation, Monte Carlo cross-validation, and bootstrap procedures. For estimator selection, finite sample risk bounds are derived and applied to establish the asymptotic optimality of cross-validation, in the sense that a selector based on a cross-validated risk estimator performs asymptotically as well as an optimal oracle selector based on the risk under the true, unknown data generating distribution. The asymptotic results are derived under the assumption that the size of the validation sets converges to infinity and hence do not cover leave-one-out cross-validation. For performance assessment, cross-validated risk estimators are shown to be consistent and asymptotically linear for the risk under the true data generating distribution and confidence intervals are derived for this unknown risk. Unlike previously published results, the theorems derived in this and our related articles apply to general data generating distributions, loss functions (i.e., parameters), estimators, and cross-validation procedures.
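
As an illustration of the two uses of cross-validation described in the abstract, the following is a minimal sketch of V-fold cross-validated risk estimation under squared error loss. It is not the article's procedure: the function names (`v_fold_cv_risk`, `poly_fit`, `squared_error`), the polynomial candidate estimators, the simulated data, and the plug-in standard error based on the pooled validation losses are all assumptions made for this example.

```python
import numpy as np


def v_fold_cv_risk(x, y, fit, loss, n_folds=5, seed=0):
    """Return (risk estimate, standard error) from V-fold cross-validation.

    `fit(x_train, y_train)` must return a prediction function;
    `loss(y_true, y_pred)` must return one loss value per observation.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    # Randomly partition the indices into n_folds validation sets.
    folds = np.array_split(rng.permutation(n), n_folds)
    losses = np.empty(n)
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        # Train on the other folds, evaluate the loss on the held-out fold.
        predict = fit(x[train_mask], y[train_mask])
        losses[val_idx] = loss(y[val_idx], predict(x[val_idx]))
    risk = losses.mean()
    # Assumption: a simple plug-in standard error from the pooled losses.
    se = losses.std(ddof=1) / np.sqrt(n)
    return risk, se


def poly_fit(degree):
    """Hypothetical candidate estimator: least-squares polynomial fit."""
    def fit(x_train, y_train):
        coefs = np.polyfit(x_train, y_train, degree)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit


def squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2


# Simulated data, for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=200)

# Estimator selection: choose the candidate with the smallest CV risk.
results = {d: v_fold_cv_risk(x, y, poly_fit(d), squared_error) for d in (1, 3, 9)}
selected = min(results, key=lambda d: results[d][0])

# Performance assessment: approximate 95% interval for the risk.
risk, se = results[selected]
ci = (risk - 1.96 * se, risk + 1.96 * se)
print(f"selected degree={selected}, CV risk={risk:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

With 5 folds and 200 observations, each validation set holds 40 points; the asymptotic results summarized above require the validation set size to grow with the sample size. The interval printed for the selected degree ignores the selection step, so it should be read as a rough guide rather than a calibrated confidence interval.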
