PLoS Computational Biology

Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions


Abstract

Protein-protein interactions (PPIs) are essential to most fundamental cellular processes, and there has been increasing interest in reconstructing PPI networks. However, several critical difficulties stand in the way of reliable predictions; notably, false positive rates can exceed 80%. Correcting errors at each generating source can be both time-consuming and inefficient, because a single test can hardly cover errors introduced at multiple levels of data processing. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), that lowers the misclassification rate (both false positives and false negatives) by automatically up-weighting the most informative data sources while down-weighting less informative and biased ones. Extensive studies indicate that NBEL is significantly more robust than classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set, NBEL predicts many more PPIs than naïve Bayes, suggesting that previous studies may contain large numbers of not only false positives but also false negatives. Validation on two high-quality human PPI datasets supports these observations. Our experiments demonstrate that high-throughput PPIs can feasibly be predicted computationally with substantially fewer false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may encourage the use of computational approaches to correct data errors more generally, and may speed up high-quality PPI prediction. Such reliable predictions may provide a solid platform for other studies, such as protein function prediction and the role of PPIs in disease susceptibility.
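To make the contrast with naïve Bayes concrete, here is a minimal sketch, not the NBEL method itself. Naïve Bayes assumes the data sources are conditionally independent given the class, so it adds each source's log likelihood ratio to the prior log-odds with equal weight, and one biased source can swamp the others. The hypothetical `weighted_log_odds` function scales each source's contribution by a weight; NBEL learns its weighting automatically, whereas here the weights, the prior of 0.01 and the likelihood ratios are all hand-picked illustrative assumptions.

```python
import math


def naive_bayes_log_odds(prior, likelihood_ratios):
    """Posterior log-odds under naive Bayes: prior log-odds plus the sum of
    each source's log likelihood ratio (sources assumed conditionally
    independent given interacting / non-interacting)."""
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return log_odds


def weighted_log_odds(prior, likelihood_ratios, weights):
    """Hypothetical weighted variant: each source's log likelihood ratio is
    scaled by a fixed weight, so a down-weighted source moves the posterior
    less. With all weights equal to 1 this reduces to naive Bayes."""
    log_odds = math.log(prior / (1.0 - prior))
    for lr, w in zip(likelihood_ratios, weights):
        log_odds += w * math.log(lr)
    return log_odds


def to_probability(log_odds):
    """Convert log-odds back to a probability via the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    prior = 0.01                   # hypothetical prior probability of interaction
    lrs = [4.0, 3.0, 0.05]         # hypothetical: third source is biased/contaminated
    weights = [1.0, 1.0, 0.1]      # hypothetical: down-weight the biased source
    p_naive = to_probability(naive_bayes_log_odds(prior, lrs))
    p_weighted = to_probability(weighted_log_odds(prior, lrs, weights))
    print(f"naive: {p_naive:.3f}, weighted: {p_weighted:.3f}")
    # prints "naive: 0.006, weighted: 0.082"
```

With equal weights the biased third source pulls the posterior below the prior even though the other two sources favour an interaction; down-weighting it lets the consistent evidence dominate, which is the kind of false negative the abstract attributes to unweighted integration.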