BMC Medical Research Methodology

Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia



Abstract

Background: Automatic stepwise subset selection methods in linear regression often perform poorly, both in variable selection and in the estimation of coefficients and standard errors, especially when the number of independent variables is large and multicollinearity is present. Yet stepwise algorithms remain the dominant method in medical and epidemiological research.

Methods: The performance of stepwise methods (backward elimination and forward selection algorithms using AIC, BIC, and the Likelihood Ratio Test (LRT) at p = 0.05) and of alternative subset selection methods in linear regression, including Bayesian model averaging (BMA) and penalized regression (lasso, adaptive lasso, and adaptive elastic net), was investigated in a dataset from a cross-sectional study of drug users in St. Petersburg, Russia, in 2012–2013. The dependent variable measured health-related quality of life, and the independent correlates comprised 44 variables measuring demographic, behavioral, and structural factors.

Results: In our case study, all methods returned models of different size and composition, ranging from 41 to 11 variables. The percentage of significant variables among those selected in the final model varied from 100% to 27%. Model selection with stepwise methods was highly unstable, with most of the selected variables being significant, that is, the 95% confidence interval for the coefficient did not include zero (all selected variables were significant for backward elimination with BIC, forward selection with BIC, and backward elimination with LRT). Adaptive elastic net demonstrated improved stability and more conservative estimates of coefficients and standard errors compared to stepwise methods. By incorporating model uncertainty into subset selection and into the estimation of coefficients and their standard deviations, BMA returned a parsimonious model with the most conservative results in terms of covariate significance.

Conclusions: BMA and adaptive elastic net performed best in our analysis. Based on our results and previous theoretical studies, stepwise methods in medical and epidemiological research may be outperformed by alternative methods in cases such as ours. In situations of high uncertainty, it is beneficial to apply several methodologically sound subset selection methods and to explore where their outputs do and do not agree. We recommend that researchers, at a minimum, explore model uncertainty and stability as part of their analyses and report these details in epidemiological papers.
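To make the stability concern concrete, the following is a minimal sketch, not the authors' code, of backward elimination by BIC followed by a bootstrap check of how often each variable is retained. It assumes synthetic data with 6 predictors (the study used 44 real covariates), and every name in it (`solve`, `rss`, `bic`, `backward_bic`, `simulate`, `selection_frequency`) is hypothetical. It uses only the Python standard library; BIC is computed up to an additive constant as n·ln(RSS/n) + k·ln(n).

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on columns `cols`."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve(XtX, Xty)  # normal equations
    return sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))

def bic(X, y, cols):
    """Gaussian BIC up to an additive constant; k counts the intercept."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + (len(cols) + 1) * math.log(n)

def backward_bic(X, y):
    """Drop, one at a time, the variable whose removal lowers BIC the most."""
    cols = list(range(len(X[0])))
    best = bic(X, y, cols)
    while cols:
        score, drop = min((bic(X, y, [c for c in cols if c != d]), d) for d in cols)
        if score >= best:  # no single removal improves BIC: stop
            break
        best = score
        cols.remove(drop)
    return sorted(cols)

def simulate(n=120, p=6, seed=0):
    """Synthetic data (an assumption): y depends on columns 0 and 2 only;
    column 1 is strongly correlated with column 0 to mimic multicollinearity."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(p)]
        row[1] = 0.9 * row[0] + 0.44 * row[1]
        X.append(row)
        y.append(2.0 * row[0] + 1.5 * row[2] + rng.gauss(0, 1))
    return X, y

def selection_frequency(X, y, n_boot=30, seed=1):
    """Share of bootstrap resamples in which each column survives selection."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        for c in backward_bic([X[i] for i in idx], [y[i] for i in idx]):
            counts[c] += 1
    return [c / n_boot for c in counts]

if __name__ == "__main__":
    X, y = simulate()
    print("selected on full sample:", backward_bic(X, y))
    print("bootstrap selection frequency:", selection_frequency(X, y))
```

Variables whose bootstrap selection frequency sits well away from 0 or 1 are the ones for which a single stepwise run is least trustworthy. The same resampling loop could wrap any other selector, such as a penalized regression, to compare stability across methods.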
