Journal: Bioinformatics

How significant is a protein structure similarity with TM-score=0.5?


Abstract

Motivation: Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves do not indicate how significant the structural similarity is. There is also no quantitative relation between the scores and conventional fold classifications. This article aims to answer two questions: (i) What is the statistical significance of a TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score?

Results: We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. For a TM-score of 0.5, for instance, the P-value is 5.5 × 10^-7, which means we need to consider at least 1.8 million random protein pairs to acquire a TM-score of no less than 0.5. Second, we examine the posterior probability of same-fold proteins from three datasets: SCOP, CATH and the consensus of SCOP and CATH. The posterior probability from the different datasets shows a similar rapid phase transition around TM-score = 0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold, while those with a TM-score <0.5 are mainly not in the same fold.
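The step from a P-value to "1.8 million random protein pairs" is the reciprocal 1/P, and it can be checked with a short Python sketch. The function names are hypothetical. The tail function assumes the Gumbel (type I) form of the extreme value distribution, which the abstract does not specify, and its location and scale arguments are placeholders because the fitted values are not given here.

```python
import math


def expected_pairs(p_value: float) -> float:
    """Expected number of random pairs to examine before one reaches the score (1/P)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return 1.0 / p_value


def gumbel_tail(score: float, loc: float, scale: float) -> float:
    """P(X >= score) for a Gumbel (type I extreme value) distribution.

    ASSUMPTION: the Gumbel form is chosen for illustration only; loc and
    scale must come from a fit to data and are not stated in the abstract.
    """
    z = (score - loc) / scale
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny tails.
    return -math.expm1(-math.exp(-z))


# P-value quoted in the abstract for TM-score = 0.5.
print(f"{expected_pairs(5.5e-7):,.0f} random pairs")  # roughly 1.8 million
```

Given fitted parameters, `gumbel_tail(0.5, loc, scale)` would play the role of the P-value that the abstract reports for a TM-score of 0.5.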
