Journal: Bioinformatics

Simple sequence-based kernels do not predict protein-protein interactions

Abstract

Motivation: A number of methods have been reported that predict protein-protein interactions (PPIs) with high accuracy using only simple sequence-based features such as amino acid 3mer content. This is surprising, given that many protein interactions have high specificity that depends on detailed atomic recognition between physicochemically complementary surfaces. Are the reported high accuracies realistic?

Results: We find that the reported accuracies of the predictions are significantly over-estimated and depend strongly on the structure of the training and testing datasets used. The choice of which protein pairs are deemed non-interacting in the training data has a variable impact on the accuracy estimates. The accuracies can also be artificially inflated by a bias towards dominant samples in the positive data, which results from the presence of hub proteins in the protein interaction network. To address this bias, we propose a positive set-specific method to create a 'balanced' negative set that maintains the degree distribution for each protein. With this correction, we conclude that simple sequence-based features contain insufficient information to be useful for predicting PPIs, but that protein domain-based features have some predictive value.
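The 'balanced' negative set described above can be illustrated with a small sketch. It is not the authors' published procedure: the function names `degree_counts` and `balanced_negatives`, the greedy random pairing with retries, and the toy protein labels are all assumptions made for illustration. The only property taken from the abstract is that each protein should appear in the negative set as often as it does in the positive set, so that hub proteins are not over-represented on one side.

```python
import random
from collections import Counter


def degree_counts(pairs):
    """Count how many pairs each protein appears in."""
    return Counter(protein for pair in pairs for protein in pair)


def balanced_negatives(positives, seed=0, max_tries=200):
    """Sample non-interacting pairs in which every protein appears exactly as
    often as it does in the positive set (hypothetical helper, not the
    authors' published procedure)."""
    rng = random.Random(seed)
    # Treat pairs as unordered and ignore self-interactions.
    pos = {frozenset(p) for p in positives if p[0] != p[1]}
    degree = degree_counts(pos)
    proteins = sorted(degree)  # fixed order so a given seed is reproducible

    for _ in range(max_tries):
        remaining = dict(degree)
        negatives = set()
        # Place hubs first: they have the fewest admissible partners.
        order = sorted(proteins, key=lambda p: (-degree[p], rng.random()))
        for a in order:
            need = remaining[a]
            if need == 0:
                continue
            candidates = [
                b for b in proteins
                if b != a
                and remaining[b] > 0
                and frozenset((a, b)) not in pos
                and frozenset((a, b)) not in negatives
            ]
            rng.shuffle(candidates)
            for b in candidates[:need]:
                negatives.add(frozenset((a, b)))
                remaining[a] -= 1
                remaining[b] -= 1
        # Accept the attempt only if every protein reached its target degree.
        if not any(remaining.values()):
            return sorted(tuple(sorted(pair)) for pair in negatives)
    raise RuntimeError("no degree-matched negative set found; "
                       "a hub may interact with too many proteins")


# Toy positive set: six proteins in a ring, each with degree 2.
positives = [("A", "B"), ("B", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("F", "A")]
negatives = balanced_negatives(positives)
print(negatives)
print(degree_counts(negatives) == degree_counts(positives))
```

A degree-matched negative set does not always exist. A hub that interacts with every other protein has no admissible non-partners, and the sketch raises an error in that case instead of silently returning an unbalanced set.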