
Model selection when the number of variables exceeds the number of observations.



Abstract

The classical multivariate linear regression problem assumes $p$ predictor variables $X_1, X_2, \ldots, X_p$ and a response vector $y$, each with $n$ observations, and a linear relationship between the two: $y = X\beta + z$, where $z \sim N(0, \sigma^2)$. This thesis finds that when $p > n$, standard model selection schemes have a breakdown point: model selection works well only below a critical complexity level that depends on $n/p$. This notion is applied to several standard model selection algorithms (Classical Forward Stepwise, Forward Stepwise with False Discovery Rate thresholding, Lasso, LARS, and Stagewise Orthogonal Pursuit) in the case $p \gg n$.

The notion of the Phase Diagram is borrowed from signal processing and statistical physics to discover that (1) the breakdown point is well defined for random $X$-models and low noise; (2) increasing noise shifts the breakdown point to lower levels of sparsity and systematically reduces the algorithm's ability to recover the model; and (3) below breakdown, the size of the coefficient errors follows the theoretical error distribution for the classical linear model.

Our results are exhibited in a chemometric application using the near-infrared spectroscopy data of P. J. Brown, T. Fearn, and M. Vannucci.
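The phase-diagram experiment described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's code: it assumes a Gaussian random design with columns scaled to roughly unit norm, ±1 nonzero coefficients, and an oracle rule that stops selection after exactly k steps. The function names `simulate_problem`, `forward_stepwise`, and `success_rate` are hypothetical, and the symbols δ = n/p and ρ = k/n are this sketch's own notation. The selection step is a correlation-based forward stepwise variant (essentially orthogonal matching pursuit), not the FDR-thresholded, Lasso, LARS, or Stagewise Orthogonal Pursuit procedures the abstract lists. Fixing δ and sweeping ρ traces one column of a phase diagram.

```python
import numpy as np


def simulate_problem(n, p, k, sigma, rng):
    """Draw one instance of y = X beta + z with a k-sparse beta.

    Assumed setup (not from the thesis): X has i.i.d. N(0, 1/n) entries, so
    columns have roughly unit norm; nonzero coefficients are +/-1 at random
    positions; z is i.i.d. N(0, sigma^2).
    """
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    beta[support] = rng.choice([-1.0, 1.0], size=k)
    z = sigma * rng.standard_normal(n)
    return X, X @ beta + z, beta


def forward_stepwise(X, y, k):
    """Greedy forward selection (OMP-style variant, an assumption).

    At each step, add the column most correlated with the current residual,
    then refit least squares on the active set. Runs for exactly k steps,
    i.e. an oracle stopping rule that knows the true sparsity.
    """
    n, p = X.shape
    active = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(X.T @ residual)
        scores[active] = -np.inf  # never re-select a column already chosen
        active.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ coef
    beta_hat = np.zeros(p)
    beta_hat[active] = coef
    return beta_hat


def success_rate(n, p, k, sigma=0.0, trials=20, tol=1e-3, seed=0):
    """Fraction of trials whose relative coefficient error is below tol."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        X, y, beta = simulate_problem(n, p, k, sigma, rng)
        beta_hat = forward_stepwise(X, y, k)
        rel_err = np.linalg.norm(beta_hat - beta) / np.linalg.norm(beta)
        hits += int(rel_err < tol)
    return hits / trials


if __name__ == "__main__":
    n, p = 100, 200  # delta = n/p = 0.5
    for k in (5, 20, 40, 60, 80):  # rho = k/n from 0.05 to 0.8
        print(f"rho={k / n:.2f}  success={success_rate(n, p, k):.2f}")
```

In a sweep like this the success rate is expected to fall from near 1 to near 0 as ρ grows. Raising `sigma` (and loosening `tol`, since exact recovery is impossible with noise) is one way to probe how noise moves the transition, as in point (2) of the abstract.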

Bibliographic details

  • Author: Stodden, Victoria
  • Affiliation: Stanford University
  • Degree-granting institution: Stanford University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 93
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Statistics
