Journal: Statistical modeling: applications in contemporary issues

Dirichlet Lasso: A Bayesian approach to variable selection

Abstract

Selecting the most important predictor variables in regression analysis is one of the key problems that statistical research has long been concerned with. In this article, we propose a methodology, the Dirichlet Lasso (DLASSO), to address this problem in a Bayesian framework. In many modern regression settings, large sets of predictor variables are grouped, and the coefficients belonging to any one group are either all redundant or all important for predicting the response; in such cases we say that the predictors exhibit a group structure. We show that DLASSO is particularly useful when the group structure is not fully known. We exploit the clustering property of Dirichlet Process priors to infer the possibly missing group information. The Dirichlet Process has the advantage of simultaneously clustering the variable coefficients and selecting the best set of predictor variables. We compare the predictive performance of DLASSO with that of the Group Lasso and the ordinary Lasso in real-data and simulation studies. Our results show that the predictive performance of DLASSO is almost as good as that of the Group Lasso when group labels are given, and superior to that of the ordinary Lasso when group information is missing. For high-dimensional data (e.g., genetic data) with missing group information, DLASSO can be a powerful approach to variable selection, since it provides superior predictive performance and higher statistical accuracy.
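The abstract does not spell out the DLASSO model or its sampler, so the sketch below is not DLASSO itself. It only illustrates, under stated assumptions, the three ingredients being compared: the ordinary Lasso penalty, the Group Lasso penalty in its standard sqrt(group size)-weighted form (which needs known group labels), and the clustering property of a Dirichlet Process prior, shown through the Blackwell-MacQueen Pólya urn. Draws from a DP share exact values with positive probability, so coefficients drawn this way fall into clusters that can be read off as inferred groups. The helper names, the standard-normal base measure, and the "group by exact equality" rule are hypothetical choices made for illustration.

```python
import numpy as np


def lasso_penalty(beta, lam):
    """Ordinary Lasso penalty: lam * sum_j |beta_j| (no group information)."""
    beta = np.asarray(beta, dtype=float)
    return lam * np.sum(np.abs(beta))


def group_lasso_penalty(beta, groups, lam):
    """Group Lasso penalty: lam * sum_g sqrt(p_g) * ||beta_g||_2.

    `groups` assigns a known label to each coefficient; this is exactly the
    information that may be missing in practice.
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        beta_g = beta[groups == g]
        total += np.sqrt(beta_g.size) * np.linalg.norm(beta_g)
    return lam * total


def polya_urn_draws(n, alpha, base_sampler, rng):
    """Draw n values from G ~ DP(alpha, G0) with G integrated out.

    Blackwell-MacQueen urn: draw i (0-based) is a fresh sample from G0 with
    probability alpha / (alpha + i); otherwise it copies a uniformly chosen
    earlier draw. Copies create ties, i.e. clusters of equal values.
    """
    draws = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):  # always true for i == 0
            draws.append(base_sampler(rng))
        else:
            draws.append(draws[rng.integers(i)])
    return np.array(draws)


def infer_groups(beta):
    """Hypothetical rule: coefficients with identical values form one group."""
    _, labels = np.unique(np.asarray(beta), return_inverse=True)
    return labels
```

A possible usage, with a standard normal as the (assumed) base measure:

```python
rng = np.random.default_rng(0)
beta = polya_urn_draws(10, alpha=1.0, base_sampler=lambda r: r.normal(), rng=rng)
groups = infer_groups(beta)
print(groups)
print(lasso_penalty(beta, 0.1), group_lasso_penalty(beta, groups, 0.1))
```

Smaller `alpha` yields fewer, larger clusters; larger `alpha` yields more distinct values. The labels recovered from the ties can then stand in for the missing group labels that the Group Lasso penalty would otherwise require.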