BMC Research Notes
Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever

Abstract

Background: The choice of variable selection method for binary classification modeling is critical to producing stable, interpretable models that generate accurate predictions with minimal bias. This work is motivated by data on clinical and laboratory features of severe dengue infections (dengue hemorrhagic fever, DHF) obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections.

Results: We carried out a comprehensive performance comparison of several classification models for DHF on the dengue data set. We compared variable selection results from Multivariate Adaptive Regression Splines, Learning Ensemble, Random Forest, Bayesian Model Averaging, Stochastic Search Variable Selection, and Generalized Regularized Logistic Regression. Model averaging methods (bagging, boosting and ensemble learners) have higher accuracy, but the generalized regularized regression model has the highest predictive power because the linearity assumptions for the candidate predictors are strongly satisfied, as assessed by deviance chi-square testing. Bootstrapping was used to evaluate the predictive regression coefficients of the regularized regression model.

Conclusions: Feature reduction methods introduce inherent biases and are therefore data-type dependent. We propose that these limitations can be overcome by using an exhaustive approach to searching the feature space. Using this approach, our results suggest that IL-10, platelet and lymphocyte counts are the major features for predicting DHF on the basis of blood chemistries and cytokine measurements.
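The bootstrap evaluation of regularized regression coefficients mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an L1 (lasso) penalty fitted by plain proximal gradient descent, uses synthetic stand-in data with the same sample size (51) rather than the dengue data set, and all function names (`fit_l1_logistic`, `bootstrap_selection`, `make_synthetic`) and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, step=0.5, n_iter=150):
    """L1-penalized logistic regression via proximal gradient descent.

    Returns (intercept, coefficients). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed label.
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0 / n
        b = [soft_threshold(b[j] - step * g[j] / n, step * lam)
             for j in range(p)]
    return b0, b


def bootstrap_selection(X, y, n_boot=20, seed=0, **fit_kwargs):
    """Fraction of bootstrap resamples in which each coefficient is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        _, b = fit_l1_logistic([X[i] for i in idx],
                               [y[i] for i in idx], **fit_kwargs)
        for j in range(p):
            if b[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_synthetic(n=51, p=5, seed=1):
    # Synthetic stand-in data (assumption): only feature 0 drives the outcome.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.5 * x[0]) else 0 for x in X]
    return X, y


X, y = make_synthetic()
freq = bootstrap_selection(X, y)
print("selection frequency per feature:", freq)
```

A feature whose coefficient stays nonzero in most resamples is a more stable candidate than one that is selected only occasionally. With real data the predictors would normally be standardized first and the penalty strength chosen by cross-validation, neither of which this sketch attempts.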