Journal of Integrative Bioinformatics

An assessment of machine and statistical learning approaches to inferring networks of protein-protein interactions

Abstract

Protein-protein interactions (PPI) play a key role in many biological systems. Over the past few years, there has been an explosion in the availability of functional biological data obtained from high-throughput technologies for inferring PPI. However, results obtained from such experiments show high rates of false-positive and false-negative predictions as well as systematic predictive bias. Recent research has revealed that several machine and statistical learning methods, applied to integrate relatively weak, diverse sources of large-scale functional data, may provide improved predictive accuracy and coverage of PPI. In this paper we describe the effects of applying different computational, integrative methods to predict PPI in Saccharomyces cerevisiae. We investigated the predictive ability of combining different sets of relatively strong and weak predictive datasets. We analysed several genomic datasets ranging from mRNA co-expression to marginal essentiality. Moreover, we expanded an existing multi-source dataset from S. cerevisiae by constructing a new set of putative interactions extracted from Gene Ontology (GO)-driven annotations in the Saccharomyces Genome Database. Three classification techniques were evaluated: Simple Naive Bayesian (SNB), Multilayer Perceptron (MLP) and K-Nearest Neighbors (KNN). Relatively simple classification methods (i.e. less computationally intensive and less mathematically complex), such as SNB, proved proficient at predicting PPI. SNB produced the highest predictive quality, obtaining an area under the Receiver Operating Characteristic (ROC) curve (AUC) of 0.99. The lowest AUC, 0.90, was obtained by the KNN classifier. This assessment also demonstrates the strong predictive power of GO-driven models, which offered predictive performance above 0.90 using the different machine learning and statistical techniques. As the predictive power of single-source datasets became weaker, MLP and SNB performed better than KNN. Moreover, predictive performance saturation may be reached independently of the classification models applied, which may be explained by predictive bias and incompleteness of existing Gold Standards. More comprehensive and accurate PPI maps will be produced for S. cerevisiae and beyond with the emergence of large-scale datasets of better predictive quality and the integration of intelligent classification methods.
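
The AUC figures quoted above can be read as the probability that a classifier scores a randomly chosen interacting pair above a randomly chosen non-interacting pair. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name roc_auc, the toy labels and the scores are all hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]  # every positive outranks every negative
scores_b = [0.9, 0.3, 0.7, 0.6, 0.2, 0.1]  # one positive falls below one negative

print(roc_auc(labels, scores_a))
print(roc_auc(labels, scores_b))
```

The pairwise loop is quadratic in the number of examples, which is fine for a toy illustration; for genome-scale sets of protein pairs a sort-based rank computation would be the more practical choice.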