
Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples



Abstract

In Part I of this article we propose a general cross-validation criterion for selecting among a collection of estimators of a particular parameter of interest based on n i.i.d. observations. It is assumed that the parameter of interest minimizes the expectation (w.r.t. the distribution of the observed data structure) of a particular loss function of a candidate parameter value and the observed data structure, possibly indexed by a nuisance parameter. The proposed cross-validation criterion is defined as the empirical mean over the validation sample of the loss function at the parameter estimate based on the training sample, averaged over random splits of the observed sample. The cross-validation selector is then the estimator which minimizes this cross-validation criterion. We illustrate that this general methodology covers, in particular, the selection problems in the current literature, and yields a wide range of new selection methods. We prove a finite sample oracle inequality, and asymptotic optimality of the cross-validation selector under general conditions. The asymptotic optimality states that the cross-validation selector performs asymptotically exactly as well as the selector which for each given data set makes the best choice (knowing the true data generating distribution). Our general framework allows, in particular, the situation in which the observed data structure is a censored version of the full data structure of interest, and where the parameter of interest is a parameter of the full data structure distribution. As examples of parameters of the full data distribution we consider a density of (a part of) the full data structure, a conditional expectation of an outcome given explanatory variables, a marginal survival function of a failure time, and a multivariate conditional expectation of an outcome vector given covariates.
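The selection procedure of Part I can be sketched in a few lines. The data-generating law, the squared-error loss, and the polynomial candidate estimators below are illustrative assumptions for the sketch, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical i.i.d. sample: Y = sin(2*pi*X) + noise.
n = 200
X = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(n)

# Candidate estimators: polynomial regressions of increasing degree.
def fit_poly(x, y, degree):
    coefs = np.polyfit(x, y, degree)
    return lambda x_new: np.polyval(coefs, x_new)

degrees = [1, 3, 5, 9]

# Cross-validation criterion: empirical mean over the validation sample of
# the loss at the estimate fit on the training sample, averaged over
# random splits of the observed sample.
def cv_risk(degree, n_splits=20, val_frac=0.2):
    risks = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        n_val = int(val_frac * n)
        val, train = idx[:n_val], idx[n_val:]
        fhat = fit_poly(X[train], Y[train], degree)
        risks.append(np.mean((Y[val] - fhat(X[val])) ** 2))
    return np.mean(risks)

# Cross-validation selector: the candidate minimizing the criterion.
risks = {d: cv_risk(d) for d in degrees}
best_degree = min(risks, key=risks.get)
```

Here the linear candidate is badly misspecified, so the selector moves to a higher-degree fit; with a different loss function (and a nuisance-parameter-indexed loss in the censored-data setting) the same template applies unchanged.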
In Part II of this article we show that the general estimating function methodology for censored data structures provided in van der Laan and Robins (2002) yields the desired loss functions for selection among estimators of a full-data distribution parameter of interest based on censored data. The corresponding cross-validation selector generalizes any of the existing selection methods in regression and density estimation (including model selection) to the censored data case. Under general conditions, our optimality results show that the corresponding cross-validation selector performs asymptotically exactly as well as the selector which for each given data set makes the best choice (knowing the true full data distribution).

In Part III of this article we propose a general estimator defined as follows. For a collection of subspaces and the complete parameter space, one defines an epsilon-net (i.e., a finite set of points whose epsilon-spheres cover the complete parameter space). For each epsilon and subspace one then defines a corresponding minimum cross-validated empirical risk estimator as the minimizer of cross-validated risk over the subspace-specific epsilon-net. In the special case that the loss function has no nuisance parameter, which thus covers the classical regression and density estimation cases, this epsilon- and subspace-specific minimum risk estimator reduces to the minimizer of the empirical risk over the corresponding epsilon-net. Finally, one selects epsilon and the subspace with the cross-validation selector. We refer to the resulting estimator as the cross-validated adaptive epsilon-net estimator. We prove an oracle inequality for this estimator which implies that the estimator is minimax adaptive, in the sense that it achieves the minimax optimal rate of convergence for the smallest of the guessed subspaces containing the true parameter value.
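The mechanics of the epsilon-net construction in Part III (not its minimax theory) can be illustrated on a deliberately simple toy problem: estimating a scalar mean under squared-error loss, with a hypothetical parameter space [-2, 2] discretized at several resolutions and the resolution chosen by the cross-validation selector:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical i.i.d. sample with true mean 0.7.
n = 100
data = 0.7 + rng.standard_normal(n)

def epsilon_net(eps, lo=-2.0, hi=2.0):
    """Finite grid whose eps-spheres cover the parameter space [lo, hi]."""
    return np.arange(lo, hi + eps, eps)

def empirical_risk(mu, sample):
    return np.mean((sample - mu) ** 2)

def net_minimizer(sample, eps):
    """Minimum empirical risk estimator over the eps-net."""
    net = epsilon_net(eps)
    return net[np.argmin([empirical_risk(mu, sample) for mu in net])]

# Select epsilon with the cross-validation selector: fit on training
# splits, evaluate the loss on validation splits, average over splits.
def cv_risk(eps, n_splits=10, val_frac=0.2):
    risks = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        n_val = int(val_frac * n)
        val, train = idx[:n_val], idx[n_val:]
        mu_hat = net_minimizer(data[train], eps)
        risks.append(empirical_risk(mu_hat, data[val]))
    return np.mean(risks)

eps_grid = [1.0, 0.25, 0.05]
best_eps = min(eps_grid, key=cv_risk)
mu_final = net_minimizer(data, best_eps)
```

In the paper's full construction one would additionally index the nets by candidate subspaces and select the (epsilon, subspace) pair jointly; this sketch keeps a single subspace to expose the two-stage structure.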
