Journal: Bioinformatics

Fold recognition by combining profile-profile alignment and support vector machine



Abstract

Motivation: Currently, the most accurate fold-recognition methods perform profile-profile alignments and estimate the statistical significance of those alignments by calculating a Z-score or E-value. Although this scheme reliably recognizes relatively close homologs related at the family level, it has difficulty finding remote homologs related at the superfamily or fold level.

Results: In this paper, we present an alternative method for estimating the significance of the alignments. The alignment between a query protein and a template of length n in the fold library is transformed into a feature vector of length n + 1, which is then evaluated by a support vector machine (SVM). The SVM output is converted to a posterior probability that the query sequence is related to the template. The new method performs significantly better than PSI-BLAST and than profile-profile alignment with the Z-score scheme. While PSI-BLAST and the Z-score scheme detect 16% and 20% of superfamily-related proteins, respectively, at 90% specificity, the new method detects 46% of these proteins, a more than 2-fold increase in sensitivity. More significantly, at the fold level, the new method detects 14% of remotely related proteins at 90% specificity, a remarkable result given that the other methods detect almost none at the same level of specificity.
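The pipeline described in the abstract (alignment, then a feature vector of length n + 1, then an SVM score, then a posterior probability) can be illustrated with a minimal Python sketch. The abstract does not say what the n + 1 features are, which kernel is used, or how the SVM output is mapped to a probability, so everything below is an assumption made for illustration: one alignment score per template position plus the total alignment score as features, a linear decision function, and a Platt-style sigmoid with hypothetical parameters a and b. All function names are hypothetical as well.

```python
import math
from typing import List, Optional, Sequence


def alignment_to_features(position_scores: Sequence[Optional[float]],
                          total_score: float) -> List[float]:
    """Turn an alignment against a template of length n into n + 1 features.

    ASSUMPTION: one score per template position (None where the position is
    unaligned, encoded as 0.0) plus the total alignment score. The abstract
    gives only the vector length, not its contents.
    """
    features = [0.0 if s is None else float(s) for s in position_scores]
    features.append(float(total_score))
    return features


def svm_decision_value(features: Sequence[float],
                       weights: Sequence[float],
                       bias: float) -> float:
    """Linear decision function f(x) = w . x + b (kernel choice is assumed)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def posterior_from_svm_output(f: float, a: float, b: float) -> float:
    """Platt-style sigmoid: P(related | f) = 1 / (1 + exp(a * f + b)).

    ASSUMPTION: a and b would be fitted on held-out data; with a < 0, larger
    SVM outputs give higher probabilities. Written to avoid overflow in exp.
    """
    z = a * f + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    # Toy template of length n = 3; the second position is unaligned.
    x = alignment_to_features([1.5, None, -0.5], total_score=1.0)
    f = svm_decision_value(x, weights=[1.0, 1.0, 1.0, 1.0], bias=0.0)
    print(len(x), f, posterior_from_svm_output(f, a=-1.0, b=0.0))
```

In this sketch each template would need its own weight vector of length n + 1, since the feature length depends on the template; whether the original method trains one classifier per template is not stated in the abstract.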
