Conference: International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Prediction of the Bonding State of Cysteine Residues in Proteins with Machine-Learning Methods

Abstract

In this paper we evaluate the performance of machine-learning methods in predicting the bonding state of cysteines from protein sequences. This task is the first step toward identifying disulfide bonds in proteins. We compare three approaches: 1) Hidden Support Vector Machines (HSVMs), which integrate SVM predictions with a hidden Markov model; 2) SVM-HMMs, which discriminatively train models that are isomorphic to a kth-order hidden Markov model; and 3) Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs), which we recently introduced. We evaluate two encoding schemes, based on sequence profiles and on position-specific scoring matrices (PSSMs) as computed with the PSI-BLAST program, and show that all methods perform better when evolutionary information is encoded with the PSSM than with the sequence profile. Among the methods, GRHCRFs appear to perform slightly better than the others, achieving a per-protein accuracy of 87% with a Matthews correlation coefficient (C) of 0.73. Finally, we investigate how disulfide bonding state predictions differ between Eukaryotes and Prokaryotes. Our analysis shows that per-protein accuracy is higher for Prokaryotic proteins than for Eukaryotic ones (0.88 vs 0.83). However, because bonded cysteines are scarce in Prokaryotes compared with Eukaryotes, the Matthews correlation coefficient is drastically reduced (0.48 vs 0.80).
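The gap between per-protein accuracy and the Matthews correlation coefficient reported for Prokaryotes can be made concrete with a small computation. The sketch below is a plain-Python illustration, not the authors' evaluation code. It assumes the common definition of per-protein accuracy (a protein counts as correct only if every one of its cysteines is labelled correctly) and the standard MCC formula, C = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def matthews_cc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = bonded, 0 = free)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is empty;
    # returning 0.0 in that case is a common convention (an assumption here).
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def per_protein_accuracy(
    true_states: Sequence[Sequence[int]], pred_states: Sequence[Sequence[int]]
) -> float:
    """Fraction of proteins in which every cysteine is predicted correctly."""
    correct = sum(1 for t, p in zip(true_states, pred_states) if list(t) == list(p))
    return correct / len(true_states)


def flatten(states: Sequence[Sequence[int]]) -> List[int]:
    """Pool the cysteines of all proteins for per-residue metrics such as MCC."""
    return [s for protein in states for s in protein]


if __name__ == "__main__":
    # Invented toy set in which bonded cysteines (1) are rare.
    # Each inner list holds one protein's cysteines in sequence order.
    true_states = [[0, 0], [0, 0, 0], [0], [0, 0], [1, 1]]
    # A trivial predictor that labels every cysteine as free.
    pred_states = [[0] * len(p) for p in true_states]
    print(per_protein_accuracy(true_states, pred_states))  # 0.8
    print(matthews_cc(flatten(true_states), flatten(pred_states)))  # 0.0
```

On this invented data, a predictor that calls every cysteine free still gets 4 of 5 proteins right, yet its MCC is 0. This is the extreme form of the effect described above: when positives are scarce, per-protein accuracy can stay high while the correlation coefficient collapses.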
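The abstract describes HSVMs only as integrating SVM predictions with a hidden Markov model. The HMM's actual topology and parameters are not given here, so the sketch below uses a hypothetical two-state (free/bonded) model over the cysteines of a single protein, decoded with the standard Viterbi algorithm in log space. The function name `viterbi_bonding_states`, the transition and start probabilities, and the use of SVM probabilities directly as emission scores are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # floor for probabilities, so log() never sees 0


def _log(p: float) -> float:
    return math.log(max(p, EPS))


def viterbi_bonding_states(
    p_bonded: Sequence[float],
    trans: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Most probable free (0) / bonded (1) labelling of one protein's cysteines.

    p_bonded[i]: per-cysteine probability of being bonded (e.g. a calibrated
        SVM output), used directly as the emission score. This is a
        simplification, not a properly normalised HMM likelihood.
    trans[r][s]: hypothetical probability of moving from state r to state s
        between consecutive cysteines.
    start[s]: hypothetical probability that the first cysteine is in state s.
    """
    n = len(p_bonded)
    if n == 0:
        return []

    def emit(i: int, s: int) -> float:
        return _log(p_bonded[i] if s == 1 else 1.0 - p_bonded[i])

    # score[i][s]: best log-score of any labelling of cysteines 0..i ending in s
    score = [[_log(start[s]) + emit(0, s) for s in (0, 1)]]
    back: List[List[int]] = [[0, 0]]  # back[i][s]: best predecessor state
    for i in range(1, n):
        row, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[i - 1][r] + _log(trans[r][s]))
            ptr.append(prev)
            row.append(score[i - 1][prev] + _log(trans[prev][s]) + emit(i, s))
        score.append(row)
        back.append(ptr)

    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[n - 1][s])
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    probs = [0.9, 0.4, 0.9]  # invented per-cysteine SVM outputs
    sticky = [[0.9, 0.1], [0.1, 0.9]]  # invented: neighbours tend to share a state
    print(viterbi_bonding_states(probs, sticky, [0.5, 0.5]))  # [1, 1, 1]
    flat = [[0.5, 0.5], [0.5, 0.5]]  # no coupling: same as thresholding at 0.5
    print(viterbi_bonding_states(probs, flat, [0.5, 0.5]))  # [1, 0, 1]
```

With the coupled transitions, the ambiguous middle cysteine (0.4) is relabelled as bonded to agree with its neighbours. With uniform transitions, the decoder reduces to thresholding each SVM output at 0.5. In a working system the HMM parameters would be estimated from data rather than set by hand.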
