A sparse regression method for group-wise feature selection with false discovery rate control

Abstract

The method of Sorted L-One Penalized Estimation, or SLOPE, is a sparse regression method recently introduced by Bogdan et al. []. It can be used to identify significant predictor variables in a linear model that may have more unknown parameters than observations. When the correlations between predictor variables are small, SLOPE has been shown to control the false discovery rate (the expected proportion of irrelevant predictors among all selected predictors) at a user-specified level. However, the requirement of nearly uncorrelated predictors is too restrictive for genomic data, as demonstrated in our recent study [] by applying SLOPE to realistic simulated DNA sequence data. A possible solution is to divide the predictor variables into nearly uncorrelated groups and to modify the procedure so that it selects entire groups with an overall significant group effect, rather than individual predictors. Following this motivation, we extend SLOPE, in the spirit of Group LASSO, to Group SLOPE, a method that can handle group structures among the predictor variables, which are ubiquitous in real genomic data. Our theoretical results show that Group SLOPE controls the group-wise false discovery rate (gFDR) when groups are orthogonal to each other. For use in non-orthogonal settings, we propose two types of Monte Carlo-based heuristics, which lead to gFDR control with Group SLOPE in simulations based on real SNP data. As an illustration of the merits of this method, an application of Group SLOPE to a dataset from the Framingham Heart Study results in the identification of some known DNA sequence regions associated with bone health, as well as some new candidate regions. The novel methods are implemented in the R package grpSLOPEMC, which is publicly available at .
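The abstract names the ingredients of SLOPE and Group SLOPE without stating any formulas. The following Python sketch makes them concrete. It is not the authors' grpSLOPEMC code, and it does not reproduce the paper's exact Group SLOPE penalty or its group-wise regularizing sequence. The sketch covers four pieces. The first is the sorted L1 penalty of SLOPE with the Benjamini-Hochberg-type sequence lambda_i = Phi^{-1}(1 - i*q/(2p)) from Bogdan et al. The second is its proximal operator, computed by a standard isotonic-regression (pool-adjacent-violators) step. The third is a simplified group variant that applies the same penalty to the Euclidean norms of the groups. The fourth is the false discovery proportion, whose expectation is the FDR, or the gFDR when the items are groups. All function names are hypothetical. The group penalty ignores group-size weights and assumes orthonormal columns within each group, so that the norm of X_I beta_I equals the norm of beta_I.

```python
import math
from statistics import NormalDist


def lambda_bh(p, q):
    """BH-type SLOPE sequence lambda_i = Phi^{-1}(1 - i*q/(2p)), i = 1..p (nonincreasing)."""
    z = NormalDist()  # standard normal distribution
    return [z.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1(b, lam):
    """SLOPE penalty: sum_i lam_i * |b|_(i), where |b|_(1) >= |b|_(2) >= ..."""
    mags = sorted((abs(v) for v in b), reverse=True)
    return sum(lam_i * m for lam_i, m in zip(lam, mags))


def prox_sorted_l1(y, lam):
    """argmin_x 0.5*||x - y||^2 + sorted_l1(x, lam), for nonincreasing lam >= 0.

    Sort |y| in decreasing order, subtract lam, project onto nonincreasing
    sequences with pool-adjacent-violators, clip at zero, then undo the sort
    and restore the signs of y.
    """
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))
    blocks = []  # each block: [sum of pooled values, number of pooled values]
    for k, i in enumerate(order):
        blocks.append([abs(y[i]) - lam[k], 1])
        # Pool while the previous block's mean does not exceed the newest one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    x_sorted = []
    for s, c in blocks:
        x_sorted.extend([max(s / c, 0.0)] * c)
    x = [0.0] * len(y)
    for k, i in enumerate(order):
        v = x_sorted[k]
        x[i] = math.copysign(v, y[i]) if v > 0 else 0.0
    return x


def group_norms(beta, groups):
    """Euclidean norm of each group of coefficients (groups are index lists)."""
    return [math.sqrt(sum(beta[j] ** 2 for j in g)) for g in groups]


def group_slope_penalty(beta, groups, lam):
    """Simplified Group SLOPE penalty: sorted L1 norm of the group norms."""
    return sorted_l1(group_norms(beta, groups), lam)


def prox_group_slope(y, groups, lam):
    """Shrink each group's norm with prox_sorted_l1 while keeping its direction."""
    norms = group_norms(y, groups)
    shrunk = prox_sorted_l1(norms, lam)
    x = list(y)  # coordinates outside every group are left unpenalized
    for g, c, c_new in zip(groups, norms, shrunk):
        scale = c_new / c if c > 0 else 0.0
        for j in g:
            x[j] = y[j] * scale
    return x


def false_discovery_proportion(selected, relevant):
    """Share of selected items (predictors or groups) that are not truly relevant.

    The FDR (gFDR for groups) is the expected value of this proportion.
    """
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected - set(relevant)) / len(selected)
```

With equal weights, the proximal step reduces to ordinary soft-thresholding. For example, `prox_sorted_l1([3.0, -1.0, 2.0], [1.0, 1.0, 1.0])` returns `[2.0, 0.0, 1.0]`. With `y = [4.0, 3.0]` and `lam = [3.0, 1.0]`, the pooling step ties both coordinates at `1.5`. Groups are selected or dropped as whole units because the shrinkage acts only on each group's norm.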
