BMC Medical Research Methodology

Graphical modeling of binary data using the LASSO: a simulation study

Abstract

Background: Graphical models were identified as a promising new approach to modeling high-dimensional clinical data. They provide a probabilistic tool to display, analyze and visualize net-like dependence structures by drawing a graph that describes the conditional dependencies between the variables. Until now, research has focused mainly on Gaussian graphical models for continuous data following a multivariate normal distribution; satisfactory solutions for binary data were missing. We adapted the method of Meinshausen and Bühlmann to binary data and used the LASSO for logistic regression. The objective of this paper was to examine the performance of the Bolasso in the development of graphical models for high-dimensional binary data. We hypothesized that the Bolasso is superior to competing LASSO methods in identifying graphical models.

Methods: We analyzed the Bolasso for deriving graphical models in comparison with other LASSO-based methods. Model performance was assessed in a simulation study with random data generated via symmetric local logistic regression models and Gibbs sampling. The main outcome variables were the Structural Hamming Distance and the Youden Index. We applied the results of the simulation study to a real-life data set containing functioning data of patients with head and neck cancer.

Results: Bootstrap aggregating, as incorporated in the Bolasso algorithm, greatly improved performance at larger sample sizes. The number of bootstrap samples had minimal impact on performance. The Bolasso performed reasonably well with a cutpoint of 0.90 and a small penalty term. Optimal prediction for the Bolasso led to very conservative models in comparison with AIC, BIC or cross-validated optimal penalty terms.

Conclusions: Bootstrap aggregating may improve variable selection if the underlying selection process is not too unstable due to small sample size and if one is mainly interested in reducing the false discovery rate. We propose using the Bolasso for graphical modeling in large sample sizes.
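To make the aggregation and evaluation steps concrete, here is a minimal Python sketch. It is not the authors' code and it skips the LASSO fits: it assumes each bootstrap sample has already produced a set of selected undirected edges. It keeps an edge when its selection frequency reaches the cutpoint (0.90, as in the abstract), then scores the result with the Structural Hamming Distance and the Youden Index. The helper names (`edge`, `bolasso_edges`, `structural_hamming_distance`, `youden_index`) and the toy data are hypothetical. Treating the Structural Hamming Distance for undirected graphs as the count of edges present in one graph but not the other is also an assumption.

```python
def edge(a, b):
    """Undirected edge between nodes a and b (order does not matter)."""
    return frozenset((a, b))


def bolasso_edges(bootstrap_graphs, cutpoint=0.90):
    """Keep edges selected in at least a `cutpoint` share of bootstrap fits."""
    n_boot = len(bootstrap_graphs)
    counts = {}
    for graph in bootstrap_graphs:
        for e in set(graph):  # count each edge at most once per bootstrap fit
            counts[e] = counts.get(e, 0) + 1
    return {e for e, c in counts.items() if c / n_boot >= cutpoint}


def structural_hamming_distance(true_edges, est_edges):
    """Number of edges present in exactly one of the two undirected graphs."""
    return len(true_edges ^ est_edges)


def youden_index(true_edges, est_edges, n_nodes):
    """Sensitivity + specificity - 1, computed over all possible node pairs."""
    n_pairs = n_nodes * (n_nodes - 1) // 2
    tp = len(true_edges & est_edges)
    fp = len(est_edges - true_edges)
    fn = len(true_edges - est_edges)
    tn = n_pairs - tp - fp - fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


# Toy example (made-up data): 4 binary variables, 10 bootstrap fits.
true_edges = {edge(0, 1), edge(1, 2)}
boot = []
for b in range(10):
    selected = {edge(0, 1)}          # selected in all 10 fits
    if b < 8:
        selected.add(edge(1, 2))     # selected in 8 of 10 fits
    if b < 3:
        selected.add(edge(2, 3))     # selected in 3 of 10 fits
    boot.append(selected)

estimated = bolasso_edges(boot, cutpoint=0.90)
shd = structural_hamming_distance(true_edges, estimated)
youden = youden_index(true_edges, estimated, n_nodes=4)
print(sorted(sorted(e) for e in estimated), shd, youden)
```

In this toy run the true edge between nodes 1 and 2 is dropped because it reaches only 80% selection frequency. A high cutpoint trades sensitivity for fewer false edges, which fits the abstract's remark that the approach suits readers mainly interested in reducing the false discovery rate.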
