Variable selection in high dimensional complex data and Bayesian estimation of reduction subspace

Abstract

Researchers now routinely collect large amounts of data in which the number of predictors p is too large to allow a thorough graphical visualization of the data for regression modeling. Regression data are commonly collected jointly on (Y, X), where X = (X_1, ..., X_p)^T is a random p-dimensional predictor and Y is a univariate response. In high-dimensional settings, the problems most frequently encountered in variable selection or estimation for regression analysis are (i) a nonlinear relationship between the predictors and the response, (ii) a number of predictors much larger than the sample size, and (iii) the presence of sparsity.

In such situations, it can be useful to reduce the predictor space to a lower-dimensional one that best captures the information needed to explain the response under consideration. Principal fitted component (PFC; Cook, 2007) models are likelihood-based inverse regression methods that yield a sufficient reduction of the random p-vector of predictors X given the response Y. Three methodologies based on PFC models are presented: (1) consistency of a p-value guided variable selection method (PFC-pv; Adragni & Xi, 2015), (2) variable selection based on PFC (PFC-lrt) in the presence of nonlinearity and sparsity, and (3) Bayesian estimation of the dimension reduction subspace under the PFC model.

PFC-pv is a "p-value guided hard-thresholding" approach to variable selection based on PFC, introduced by Adragni & Xi (2015). Their encouraging simulation studies suggest that the procedure may be selection consistent. Our theoretical proof of consistency is based primarily on a fixed sequence of significance levels (alpha), in contrast to the data-driven choice of alpha in the PFC-pv method. In addition, we explore how sample size, number of predictors, and significance level jointly affect variable selection.

When a nonlinear relationship is suspected and a possibly large number of predictors are irrelevant, the accuracy of the sufficient reduction is hindered. The proposed PFC-lrt method is a novel approach to variable selection in high dimensions when the relationship between the active predictors and the response is nonlinear. PFC-lrt adapts a sequential likelihood ratio test to the PFC model to obtain a "pruned" sufficient reduction. The resulting reduction is more accurate, allows accurate identification of the important predictors, and provides a sparse estimate of the reduction matrix.

In the third part, we develop a fully Bayesian estimation of the parameters in the PFC model, using proper prior distributions on both the Stiefel and Grassmann manifolds for the reduction matrix. Efficient Gibbs samplers are developed, and the efficacy of the Bayes estimate is illustrated through simulations.
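The idea of p-value guided hard thresholding in an inverse regression setting can be illustrated with a minimal sketch. This is not the PFC-pv procedure of Adragni & Xi (2015) as specified in the thesis. It assumes, for illustration only, that each predictor is regressed on a centered cubic polynomial basis of the response, that a standard F-test is used for each inverse regression, and that predictors with p-values below a fixed significance level alpha are kept. The function names, the basis choice, and the simulated data are all hypothetical.

```python
import numpy as np
from scipy import stats


def polynomial_basis(y, degree=3):
    """Centered polynomial basis f(y) = (y, y^2, ..., y^degree) (assumed choice)."""
    F = np.column_stack([y ** k for k in range(1, degree + 1)])
    return F - F.mean(axis=0)


def inverse_regression_pvalues(X, y, degree=3):
    """F-test p-value for each predictor X_j regressed on the basis f(y)."""
    n, p = X.shape
    F = polynomial_basis(y, degree)
    r = F.shape[1]
    Xc = X - X.mean(axis=0)
    # Least-squares fit of all p inverse regressions at once.
    coef, *_ = np.linalg.lstsq(F, Xc, rcond=None)
    resid = Xc - F @ coef
    rss = (resid ** 2).sum(axis=0)   # residual sum of squares per predictor
    tss = (Xc ** 2).sum(axis=0)      # total sum of squares per predictor
    f_stat = ((tss - rss) / r) / (rss / (n - r - 1))
    return stats.f.sf(f_stat, r, n - r - 1)


def select_predictors(X, y, alpha=0.01, degree=3):
    """Hard thresholding: keep predictors whose p-value is below alpha."""
    pvals = inverse_regression_pvalues(X, y, degree)
    return np.flatnonzero(pvals < alpha)


# Simulated example: 50 predictors, only the first two depend on y,
# one linearly and one nonlinearly.
rng = np.random.default_rng(0)
n, p = 200, 50
y = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, 0] += 2.0 * y
X[:, 1] += 2.0 * (y ** 2 - 1.0)

pvals = inverse_regression_pvalues(X, y)
selected = select_predictors(X, y, alpha=0.01)
print("selected predictor indices:", selected)
```

With a fixed alpha, some irrelevant predictors can still pass the threshold by chance, which is one reason the behavior of the significance level as the sample size and the number of predictors grow matters for selection consistency.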

Bibliographic record

  • Author: Karmakar, Moumita
  • Author affiliation: University of Maryland, Baltimore County
  • Degree-granting institution: University of Maryland, Baltimore County
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 101 p.
  • Total pages: 101
  • Original format: PDF
  • Language: eng
