
The development of machine learning based software for predicting protein-protein interactions and protein function from protein primary structure



Abstract

Understanding protein function is a major goal of the post-genomic era. Proteins usually work in the context of other proteins and rarely function alone, so studying the interaction partners of a protein is highly relevant to understanding its function. For this reason, the main objective of this thesis is to predict protein-protein interactions based only on protein primary structure. Using Support Vector Machines (SVMs), several protein features were studied and compared: protein domain structure, hydrophobicity, and amino acid composition. The results imply that protein domain structure is the most informative feature for predicting protein-protein interactions. It also requires much less running time than the other features. However, a standard binary SVM requires both positive and negative data samples. Although it is easy to obtain a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins that can serve as negative examples. Previous studies cope with this problem by artificially generating random sets of protein pairs that are not listed in the Database of Interacting Proteins (DIP) and treating them as negative examples. This approach is acceptable for comparing features, because the error it introduces affects all features uniformly. In this research, we instead treat the problem as a one-class classification problem and solve it using a One-Class SVM. Using only positive examples (interacting protein pairs) in the training phase, the One-Class SVM achieves an accuracy of 80%. These results imply that protein-protein interactions can be predicted using a one-class classifier with accuracy comparable to that of binary classifiers that use artificially constructed negative examples. Finally, a Bayesian kernel for the SVM was implemented to incorporate probabilistic information about protein-protein interactions compiled from different sources. The probabilistic output of the Bayesian kernel can help biologists decide which predicted interactions, namely those with the highest probability, merit further research.
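
As an illustration of two ideas mentioned in the abstract, the following is a minimal sketch, not the thesis's actual implementation. It shows one plausible way to turn a protein pair into an amino acid composition feature vector, and how random pairs absent from a positive set could be drawn as assumed negatives. The function names (`composition`, `pair_features`, `sample_negative_pairs`), the choice to concatenate the two 20-dimensional compositions, and the toy protein identifiers are all assumptions made for this example; the abstract does not specify the encoding used.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the 20-dimensional amino acid composition of one sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Fraction of each standard residue; non-standard symbols are ignored.
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate both compositions into one 40-dimensional vector.

    Assumption: simple concatenation, so (a, b) and (b, a) give
    different vectors. The thesis may encode pairs differently.
    """
    return composition(seq_a) + composition(seq_b)


def sample_negative_pairs(protein_ids, positive_pairs, n, seed=0):
    """Draw n distinct random pairs that are absent from the positive set.

    These are only *assumed* negatives: absence from a database is not
    experimental evidence that two proteins do not interact.
    """
    ids = sorted(set(protein_ids))
    id_set = set(ids)
    known = {frozenset(p) for p in positive_pairs}
    total_pairs = len(ids) * (len(ids) - 1) // 2
    listed = sum(1 for k in known if len(k) == 2 and k <= id_set)
    if n > total_pairs - listed:
        raise ValueError("not enough unlisted pairs to sample from")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(ids, 2))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    positives = [("P1", "P2"), ("P3", "P4")]
    print(sample_negative_pairs(["P1", "P2", "P3", "P4"], positives, 3))
    print(len(pair_features("AAC", "WWYY")))
```

In the one-class setting described in the abstract, only the feature vectors of the positive pairs would be used for training, and the sampled pairs would serve at most for evaluation. Such vectors could be passed to any one-class SVM implementation, for example `sklearn.svm.OneClassSVM`, although the abstract does not say which software was used.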
