Journal: Proteomics

Effect of training datasets on support vector machine prediction of protein-protein interactions

Abstract

Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the support vector machine (SVM), has recently been explored for predicting protein-protein interactions using artificially shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results obtained with real protein sequences against those obtained with shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis combined with subcellular localization information for interacting proteins found in the Database of Interacting Proteins. Prediction accuracy is 76.9% using real protein sequences, compared to 94.1% using artificially shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be harder than separating a set of real protein sequences from a set of artificial sequences. Training an SVM classification system on real protein sequences is nevertheless expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful for developing SVM classification systems that facilitate protein-protein interaction prediction.
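As a rough illustration of what a "shuffled sequence" negative looks like, the Python sketch below permutes the residues of a sequence and compares two simple descriptors before and after. It is not the feature encoding or the shuffling procedure used in the paper or by Bock and Gough. The toy sequence, the function names, and the choice of composition and dipeptide counts as descriptors are all assumptions made for this example.

```python
import random
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shuffle_sequence(seq, rng):
    """Return a random permutation of the residues in seq (hypothetical helper)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def composition(seq):
    """Fraction of each standard amino acid in seq, in AMINO_ACIDS order."""
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_counts(seq):
    """Counts of adjacent residue pairs, a simple order-sensitive descriptor."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


# Arbitrary toy string for illustration only, not a sequence from the study.
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"

rng = random.Random(0)  # fixed seed so the sketch is repeatable
shuffled = shuffle_sequence(toy_sequence, rng)

# Shuffling keeps length and amino acid composition unchanged ...
same_composition = composition(shuffled) == composition(toy_sequence)
# ... while order-sensitive descriptors such as dipeptide counts usually change.
same_dipeptides = dipeptide_counts(shuffled) == dipeptide_counts(toy_sequence)

print("original:", toy_sequence)
print("shuffled:", shuffled)
print("composition preserved:", same_composition)
print("dipeptide counts identical:", same_dipeptides)
```

Because a shuffled sequence keeps the composition of its source but loses residue order, any order-sensitive feature can in principle tell artificial negatives from real proteins. That is one plausible reading of why the abstract expects separating real from artificial sequences to be the easier task.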