Investigation of topics in U-statistics and their applications in risk estimation and cross-validation

Abstract

The primary goal of my dissertation has been to develop new methods, including theory and practical implementation, in the area of U-statistics. This area is quite old, with many important results first appearing in Hoeffding (1948). U-statistics have many applications in nonparametric statistics. One modern and active area is cross-validation and risk estimation, although it has not traditionally been thought of as a U-statistic area. The applications of my research focus on this area.

The first objective of my research is to devise the best unbiased variance estimator for a general U-statistic. It can be written as a quadratic form of the kernel function and is applicable as long as the kernel size k satisfies k ≤ n/2, where n is the sample size. In addition, it can be represented in a familiar ANOVA form, as a contrast of between-class and within-class variation. As a further step to make the proposed variance estimator more practical, we developed a partition resampling scheme that can be used to realize the U-statistic and its variance estimator simultaneously with high computational efficiency.

We then turn our attention to the implementation of U-statistics in risk estimation in the context of the nonparametric kernel density estimator. We propose to construct U-statistic form estimates for the risks that arise from the L2 and Kullback-Leibler distances, respectively. In addition, we consider a two-stage "subsampling + extrapolation" bandwidth selection procedure, which can dramatically reduce the variability of the conventional cross-validation bandwidth selector. It is equivalent to Hall and Robinson's (2009) [27] rescaled "bagging cross-validation" bandwidth selector if one sets the fictional sample size equal to the bootstrap size. However, the simple form of our U-statistic risk estimator enables us to calculate the aggregated risk much more efficiently than bootstrapping.

Moreover, a real data example in the context of model selection is considered. We construct a U-statistic cross-validation tool, akin to the BIC criterion, for model selection. The U-estimator for the likelihood risk is more generally applicable than the AIC and BIC methods. In addition, with our proposed variance estimator for a general U-statistic, we can test which model has the smallest risk.

Finally, we will explore extrapolation and interpolation techniques with applications in bandwidth selection, variance estimation, and quantile estimation. Some preliminary results will be discussed at the end of the dissertation.
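To make the idea of an unbiased variance estimator concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the dissertation's implementation: the function names `u_statistic` and `unbiased_variance_estimate` are hypothetical, and the construction used (the squared U-statistic minus the average kernel product over disjoint subset pairs) is one natural unbiased estimator that is consistent with the k ≤ n/2 condition in the abstract. It enumerates all subsets, so it is only practical for small samples; the partition resampling scheme mentioned above is not reproduced here.

```python
from itertools import combinations
from statistics import variance


def u_statistic(data, kernel, k):
    """Average a symmetric kernel over all size-k subsets of the data."""
    values = [kernel(*subset) for subset in combinations(data, k)]
    return sum(values) / len(values)


def unbiased_variance_estimate(data, kernel, k):
    """Estimate Var(U_n) as U_n^2 minus an unbiased estimate of theta^2.

    Assumption (not taken from the source): theta^2 is estimated by the
    average of kernel(S1) * kernel(S2) over disjoint size-k subsets
    S1 and S2, which requires 2k <= n.
    """
    n = len(data)
    if 2 * k > n:
        raise ValueError("need kernel size k <= n/2")
    index_subsets = list(combinations(range(n), k))
    # Evaluate the kernel once per subset and reuse the values.
    phi = {s: kernel(*(data[i] for i in s)) for s in index_subsets}
    u = sum(phi.values()) / len(phi)
    total, count = 0.0, 0
    for s1, s2 in combinations(index_subsets, 2):
        if set(s1).isdisjoint(s2):
            total += phi[s1] * phi[s2]
            count += 1
    return u * u - total / count


# Toy data and kernels, chosen only for illustration.
data = [1.0, 2.0, 4.0, 7.0, 11.0]


def half_squared_difference(a, b):
    """Kernel of size 2 whose U-statistic is the sample variance."""
    return (a - b) ** 2 / 2


print(u_statistic(data, half_squared_difference, 2), variance(data))
print(unbiased_variance_estimate(data, half_squared_difference, 2))
```

With the size-1 identity kernel the U-statistic is the sample mean, and this estimator reduces algebraically to the usual s²/n, which gives a quick sanity check on the construction.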

Bibliographic record

  • Author

    Wang, Qing

  • Author affiliation

    The Pennsylvania State University

  • Degree-granting institution: The Pennsylvania State University
  • Subjects: Statistics; Applied mathematics
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 188
  • Original format: PDF
  • Language of text: English
