Genomics, Proteomics & Bioinformatics

GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting


Abstract

Protein–protein interactions (PPIs) are important for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
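
The three-stage pipeline described in the abstract (feature fusion, L1-RLR feature selection, GTB classification, five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, replaces the fused PseAAC/PsePSSM/RSIV/AD features with synthetic data, and uses hypothetical hyperparameters (`C=0.5`, `n_estimators=100`) chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the fused feature vectors of protein
# pairs (label 1 = interacting, 0 = non-interacting). Real inputs would be
# the concatenated PseAAC, PsePSSM, RSIV, and AD descriptors.
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=8,
    n_redundant=10, random_state=0,
)

# Stage 2: L1-regularized logistic regression drives the coefficients of
# uninformative features to zero; SelectFromModel keeps the features whose
# absolute coefficient exceeds the threshold.
l1_selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    threshold=1e-5,
)

# Stage 3: gradient tree boosting trained on the selected feature subset.
pipeline = make_pipeline(
    StandardScaler(),
    l1_selector,
    GradientBoostingClassifier(n_estimators=100, random_state=0),
)

# Five-fold cross-validation; the selector is refit inside every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

# Fit once on all data to inspect how many features survive selection.
pipeline.fit(X, y)
n_selected = int(pipeline.named_steps["selectfrommodel"].get_support().sum())

print(f"selected {n_selected} of {X.shape[1]} features")
print(f"mean 5-fold accuracy: {scores.mean():.3f}")
```

Placing the selector inside the pipeline means feature selection is repeated on each training fold, which keeps held-out data from influencing the chosen subset. That is a design choice of this sketch; the abstract does not say how selection and cross-validation were combined in GTB-PPI.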
