Journal: BMC Systems Biology

Network-based logistic regression integration method for biomarker identification


Abstract

Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, biomarkers inferred from different datasets are often not reproducible, because data generated on different platforms or in different laboratories are heterogeneous. This motivates us to develop robust biomarker identification methods that integrate multiple datasets.

In this paper, we develop an integrative classification method based on logistic regression. Different constant terms are set in the logistic regression model to measure the heterogeneity of the samples. By minimizing the differences between the constant terms within the same dataset, the model preserves both the homogeneity within each dataset and the heterogeneity across datasets. The model is formulated as an optimization problem with a network penalty that measures the differences between the constant terms. The L1 penalty, the elastic net penalty and network-related penalties are added to the objective function for biomarker discovery. Algorithms based on the proximal Newton method are proposed to solve the optimization problem.

We first applied the proposed method to simulated datasets, where both the prediction AUC and the biomarker identification accuracy improved. We then applied the method to two breast cancer gene expression datasets. Integrating both datasets gave a higher prediction AUC than either directly merging the datasets or MetaLasso, and the result is comparable to the best AUC obtained when biomarkers are identified in an individual dataset. The biomarkers identified with the network-related penalty on the variables were analyzed further, and meaningful subnetworks enriched for breast cancer were identified.

In summary, the paper proposes a network-based integrative logistic regression model that improves both prediction accuracy and biomarker identification accuracy.
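
The abstract does not give the objective function explicitly, so the following is only a minimal sketch of the idea under stated assumptions. It assumes one constant term per sample, a squared-difference network penalty linking every pair of samples from the same dataset, and an L1 penalty on the coefficients. For brevity it uses plain proximal gradient descent rather than the proximal Newton method named in the abstract. All function names and parameters (`fit`, `within_dataset_laplacian`, `lam1`, `lam2`) are hypothetical and do not come from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def within_dataset_laplacian(dataset_ids):
    # Graph Laplacian of the (assumed) network that links every pair of
    # samples belonging to the same dataset.
    ids = np.asarray(dataset_ids)
    A = (ids[:, None] == ids[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A

def objective(X, y, beta, b, L, lam1, lam2):
    # Mean logistic loss with per-sample constant terms b, plus an L1
    # penalty on beta and the network penalty b^T L b, which equals the
    # sum of (b_i - b_j)^2 over within-dataset sample pairs.
    z = X @ beta + b
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    return loss + lam1 * np.abs(beta).sum() + lam2 * (b @ L @ b)

def fit(X, y, dataset_ids, lam1=0.05, lam2=0.1, n_iter=500):
    # Proximal gradient descent (an assumption; not the paper's solver).
    n, p = X.shape
    L = within_dataset_laplacian(dataset_ids)
    # Step size from an upper bound on the Lipschitz constant of the
    # smooth part (logistic loss plus network penalty).
    lipschitz = (np.linalg.norm(X, 2) ** 2 + 1.0) / (4.0 * n) \
        + 2.0 * lam2 * np.linalg.eigvalsh(L)[-1]
    step = 1.0 / lipschitz
    beta, b = np.zeros(p), np.zeros(n)
    for _ in range(n_iter):
        z = X @ beta + b
        prob = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
        r = (prob - y) / n                     # gradient of loss w.r.t. z
        beta = soft_threshold(beta - step * (X.T @ r), step * lam1)
        b = b - step * (r + 2.0 * lam2 * (L @ b))
    return beta, b
```

With two datasets stacked row-wise in `X` and a matching `dataset_ids` vector, `fit(X, y, dataset_ids)` returns a sparse coefficient vector shared by all samples and one constant term per sample. Larger values of `lam2` pull the constant terms within each dataset closer together, while constant terms in different datasets remain free to differ.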
