Journal: Statistics and Computing

MCEN: a method of simultaneous variable selection and clustering for high-dimensional multinomial regression

Abstract

Multinomial regression is often used to investigate the association between potential independent variables and multi-class nominal responses, such as multiple disease subtypes. However, it cannot identify groups of variables that have similar effects on predicting the same disease subtypes, which is an important problem in biomedical research. Clustering variables in this setting is not trivial, since correlated variables may have distinct predictive effects on the multi-class nominal responses. For example, a group of moderately to highly correlated expressed genes may be associated with different subtypes of a disease. This paper presents a new data-driven simultaneous variable selection and clustering method for high-dimensional multinomial regression. By using a novel penalty function that incorporates both regression coefficients and pairwise correlation to define clusters of variables, the proposed method provides a one-stop solution that selects and groups important variables associated with different classes of the multinomial response at the same time. An alternating minimization algorithm, which incorporates both convex optimization and clustering steps, is developed to solve the resulting optimization problem. The proposed method is compared with the state of the art in terms of prediction and variable clustering performance through extensive simulation studies. In addition, three real data examples are presented to demonstrate how to apply our method and to further verify the findings of the simulation studies. The results of the simulation and real data studies also shed light on the strengths and weaknesses of several penalized regression methods with respect to variable clustering and prediction in different scenarios.
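The abstract names the ingredients of the algorithm (a penalized multinomial fit alternating with a clustering step) but does not state the penalty itself. The sketch below only illustrates that alternating pattern and is not the MCEN objective. As stand-in assumptions, it uses an L1 penalty for selection plus a quadratic term that pulls each variable's coefficient row toward its cluster mean, and a plain k-means step on the coefficient rows. The pairwise-correlation part of the paper's penalty is omitted. All names (`alternating_fit`, `lam`, `gamma`, and so on) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_threshold(b, t):
    # Proximal operator of the L1 penalty (drives small coefficients to zero).
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

def cluster_means(B, labels, n_clusters):
    # Replace each coefficient row by the mean row of its cluster.
    M = np.zeros_like(B)
    for c in range(n_clusters):
        idx = labels == c
        if idx.any():
            M[idx] = B[idx].mean(axis=0)
    return M

def kmeans_rows(B, n_clusters, n_iter=20, seed=0):
    # Plain k-means on the rows of B (one row per variable).
    rng = np.random.default_rng(seed)
    centers = B[rng.choice(B.shape[0], n_clusters, replace=False)]
    for _ in range(n_iter):
        d = ((B[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = B[labels == c].mean(axis=0)
    return labels

def alternating_fit(X, Y, n_clusters=2, lam=0.05, gamma=0.1,
                    step=0.1, n_outer=10, n_inner=100):
    """X: (n, p) predictors; Y: (n, K) one-hot responses.
    Returns coefficients B of shape (p, K) and a cluster label per variable."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    labels = np.zeros(p, dtype=int)
    for outer in range(n_outer):
        # First pass is a plain L1 fit; the cluster term starts once labels exist.
        g = 0.0 if outer == 0 else gamma
        # Convex step: proximal gradient on the penalized multinomial loss,
        # with the cluster assignment held fixed.
        for _ in range(n_inner):
            M = cluster_means(B, labels, n_clusters)
            grad = X.T @ (softmax(X @ B) - Y) / n + g * (B - M)
            B = soft_threshold(B - step * grad, step * lam)
        # Clustering step: regroup variables by their current coefficient rows.
        labels = kmeans_rows(B, n_clusters)
    return B, labels
```

With the labels held fixed, the stand-in objective is convex in `B`, so each half of the loop is a standard subproblem. That is the main appeal of an alternating scheme in this kind of setting.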
