Journal: Bioinformatics

Applying the Naive Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites

Abstract

Motivation: The limited availability of protein structures often restricts the functional annotation of proteins and the identification of their protein-protein interaction sites. Computational methods that identify interaction sites from protein sequences alone are therefore required to unravel the functions of many proteins. This article describes a new method (PSIVER) to predict interaction sites, i.e. residues binding to other proteins, in protein sequences. Only sequence features (position-specific scoring matrix and predicted accessibility) are used to train a Naive Bayes classifier (NBC), and the conditional probabilities of each sequence feature are estimated using kernel density estimation (KDE).

Results: Leave-one-out cross-validation of PSIVER achieved a Matthews correlation coefficient (MCC) of 0.151, an F-measure of 35.3%, a precision of 30.6% and a recall of 41.6% on a non-redundant set of 186 protein sequences extracted from 105 heterodimers in the Protein Data Bank (consisting of 36 219 residues, of which 15.2% were known interface residues). Even though the dataset used for training was highly imbalanced, a randomization test demonstrated that the proposed method managed to avoid overfitting. PSIVER was also tested on 72 sequences not used in training (consisting of 18 140 residues, of which 10.6% were known interface residues), and achieved an MCC of 0.135, an F-measure of 31.5%, a precision of 25.0% and a recall of 46.5%, outperforming other publicly available servers tested on the same dataset. PSIVER enables experimental biologists to identify potential interface residues in unknown proteins from sequence information alone, and to mutate those residues selectively in order to unravel protein functions.
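The general technique named in the abstract, a Naive Bayes classifier whose per-feature class-conditional densities are estimated by KDE, can be sketched roughly as follows. This is a minimal illustration and not the PSIVER implementation: the toy two-dimensional features stand in for the PSSM and predicted-accessibility features, the Gaussian kernel and the fixed bandwidth of 0.5 are assumptions, and every name here (`gaussian_kde`, `KdeNaiveBayes`, `model`) is hypothetical.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


class KdeNaiveBayes:
    """Naive Bayes with one 1-D Gaussian KDE per (class, feature) pair."""

    def __init__(self, bandwidth=0.5):  # bandwidth is an assumed, untuned value
        self.bandwidth = bandwidth

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.densities = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # Class prior taken from the (possibly imbalanced) class frequency.
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.densities[c] = [
                gaussian_kde([r[j] for r in rows], self.bandwidth)
                for j in range(n_features)
            ]
        return self

    def log_scores(self, x):
        """Unnormalized log posterior for each class (naive independence)."""
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for j, value in enumerate(x):
                # Small constant guards against log(0) far from all samples.
                s += math.log(self.densities[c][j](value) + 1e-300)
            scores[c] = s
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Toy, imbalanced data: label 1 ("interface") is the minority class.
rng = random.Random(0)
X = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(30)]
y = [1] * 30
X += [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(170)]
y += [0] * 170

model = KdeNaiveBayes(bandwidth=0.5).fit(X, y)
print(model.predict([1.2, 0.9]), model.predict([-1.0, -1.0]))
```

Because the class prior enters the score directly, a strongly imbalanced training set biases predictions toward the majority class unless the decision threshold is adjusted; how PSIVER handles this is not stated in the abstract.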
