MCEN: a method of simultaneous variable selection and clustering for high-dimensional multinomial regression

Ren Sheng; Kang Emily L.; Lu Jason L.

Abstract

Multinomial regression is often used to investigate the association between potential independent variables and multi-class nominal responses such as multiple disease subtypes. However, it cannot identify groups of variables that have similar effects on predicting the same disease subtypes, which is an important problem in biomedical research. Clustering variables in this setting is not trivial, because correlated variables may have distinct predictive effects on the multi-class nominal response. For example, a group of genes with moderately to highly correlated expression may be associated with different subtypes of a disease. This paper presents a new data-driven method for simultaneous variable selection and clustering in high-dimensional multinomial regression. By using a novel penalty function that incorporates both regression coefficients and pairwise correlations to define clusters of variables, the proposed method selects important variables and groups them by their association with the different classes of the multinomial response in a single step. An alternating minimization algorithm, which combines convex optimization and clustering steps, is developed to solve the resulting optimization problem. Extensive simulation studies compare the proposed method with state-of-the-art methods in terms of prediction and variable clustering performance. In addition, three real-data examples demonstrate how to apply the method and further verify the findings of the simulation studies. The simulation and real-data results also shed light on the strengths and weaknesses of several penalized regression methods with respect to variable clustering and prediction in different scenarios.
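The abstract describes the fitting procedure only at a high level. The following Python sketch illustrates the general shape of an alternating minimization of this kind; it is an illustrative assumption, not the authors' MCEN algorithm or software. It assumes (1) a penalty of the form lam1 * sum_{j,k} |b_jk| + lam2 * sum_k sum_q (1/|C_qk|) * sum_{j != l in C_qk} (b_jk - r_jl * b_lk)^2, where C_qk is the q-th variable cluster for class k and r_jl is the sample correlation between variables j and l; (2) a convex step that, with clusters held fixed, runs proximal gradient descent (ISTA) on the penalized multinomial negative log-likelihood; and (3) a clustering step that, for simplicity, regroups variables by one-dimensional k-means on each class's coefficients. The function names (`alternating_fit`, `cluster_penalty_grad`, `kmeans_1d`) and the default tuning values are hypothetical.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def cluster_penalty_grad(B, R, labels):
    """Gradient of the ASSUMED cluster penalty
    sum_k sum_q (1/|C_qk|) sum_{j != l in C_qk} (B[j,k] - R[j,l] * B[l,k])**2."""
    p, K = B.shape
    G = np.zeros_like(B)
    for k in range(K):
        for q in np.unique(labels[:, k]):
            idx = np.flatnonzero(labels[:, k] == q)
            w = 1.0 / len(idx)
            for j in idx:
                for l in idx:
                    if j == l:
                        continue  # r_jj = 1, so the diagonal term is identically zero
                    d = B[j, k] - R[j, l] * B[l, k]
                    G[j, k] += 2.0 * w * d
                    G[l, k] -= 2.0 * w * d * R[j, l]
    return G


def kmeans_1d(v, Q, iters=20):
    """Simplified clustering step (an assumption): 1-D k-means on one coefficient column."""
    centers = np.quantile(v, np.linspace(0.0, 1.0, Q))
    labels = np.zeros(len(v), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        for q in range(Q):
            if np.any(labels == q):  # keep the old center if a cluster empties
                centers[q] = v[labels == q].mean()
    return labels


def alternating_fit(X, y, K, Q=2, lam1=0.01, lam2=0.1, step=0.1, outer=5, inner=200):
    """Hypothetical alternating scheme: convex step with clusters fixed, then regroup."""
    n, p = X.shape
    Y = np.eye(K)[y]                      # one-hot responses, shape (n, K)
    R = np.corrcoef(X, rowvar=False)      # pairwise sample correlations, shape (p, p)
    B = np.zeros((p, K))
    labels = np.zeros((p, K), dtype=int)  # start with a single cluster per class
    for _ in range(outer):
        # Convex step: proximal gradient (ISTA) on NLL + cluster penalty + lasso.
        for _ in range(inner):
            P = softmax(X @ B)
            grad = X.T @ (P - Y) / n + lam2 * cluster_penalty_grad(B, R, labels)
            Z = B - step * grad
            B = np.sign(Z) * np.maximum(np.abs(Z) - step * lam1, 0.0)  # soft-threshold
        # Clustering step: regroup the variables separately for each class.
        labels = np.column_stack([kmeans_1d(B[:, k], Q) for k in range(K)])
    return B, labels
```

With the clusters held fixed, the quadratic term is a sum of squares of linear functions of the coefficients, so each inner problem is convex; only the regrouping step is non-convex, which is why the two steps are alternated rather than solved jointly.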

Bibliographic details

Source
Statistics and Computing, 2020, No. 2, pp. 291-304 (14 pages)
Authors
Ren Sheng; Kang Emily L.; Lu Jason L.
Affiliations

UnitedHealth Group R&D, Minnetonka, MN, USA

University of Cincinnati, Department of Mathematical Sciences, Division of Statistics and Data Science, Cincinnati, OH, USA

Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, Cincinnati, OH 45229, USA

Indexing information
Original format: PDF
Language: English
Keywords
Classification; Clustering; High dimensional; Multinomial regression; Optimization; Pairwise correlation;
