The evolutionary capacity of protein structures

Abstract

In nature, one finds large collections of different protein sequences that exhibit roughly the same three-dimensional structure, and this observation underpins the study of structural protein families. When studying such families at a global level, a natural question is how close to "optimal" the native sequences are in terms of their energy. We therefore define and compute the evolutionary capacity of a protein structure as the total number of sequences whose energy in that structure is below that of the native sequence. An important aspect of our definition is that we consider the space of all possible protein sequences, i.e. the exponentially large set of all strings over the 20-letter amino acid alphabet, rather than just the set of sequences found in nature.

To make our approach computationally feasible, we develop randomized algorithms that perform approximate enumeration in sequence space with provable performance guarantees. We draw on the area of rapidly mixing Markov chains by exhibiting a connection between the evolutionary capacity of proteins and the number of feasible solutions to the Knapsack problem. This connection allows us to design an algorithm for approximating the evolutionary capacity, extending a recent result of Morris and Sinclair on the Knapsack problem. We present computational experiments showing that the method is effective in practice on large collections of protein structures. In addition, we show how to use approximations to the evolutionary capacity to compute a statistical mechanics notion of "evolutionary temperature" on sequence space.
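To make the definition concrete, here is a minimal sketch of counting the evolutionary capacity exactly on toy inputs. It assumes a hypothetical site-independent energy model with small integer energies (a list `table` of per-position dicts mapping each amino acid letter to an energy), which the abstract does not specify. Under that assumption each position "chooses" one of 20 weighted items, so counting sequences below the native energy is a multiple-choice variant of Knapsack solution counting. The exact dynamic program here is pseudo-polynomial and only practical for small integer energy ranges; it is not the randomized Markov chain algorithm described above. The names `energy`, `evolutionary_capacity`, and `brute_force_capacity` are illustrative.

```python
from collections import defaultdict
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def energy(seq, table):
    """Energy of seq under an assumed site-independent model: sum of per-position terms."""
    return sum(table[i][a] for i, a in enumerate(seq))


def evolutionary_capacity(native, table):
    """Count sequences whose energy is strictly below that of the native sequence.

    Exact dynamic programming over positions: counts[e] holds the number of
    prefixes whose partial energy is e. Feasible only when energies are small
    integers (the Knapsack-style pseudo-polynomial regime).
    """
    threshold = energy(native, table)
    counts = {0: 1}
    for pos_table in table:
        nxt = defaultdict(int)
        for partial, ways in counts.items():
            for a in ALPHABET:
                nxt[partial + pos_table[a]] += ways
        counts = nxt
    return sum(ways for e, ways in counts.items() if e < threshold)


def brute_force_capacity(native, table):
    """Reference check: enumerate all 20**n sequences (tiny n only)."""
    threshold = energy(native, table)
    return sum(1 for s in product(ALPHABET, repeat=len(table))
               if energy(s, table) < threshold)


if __name__ == "__main__":
    # Toy model: a letter's energy is its index in ALPHABET at every position.
    toy = [{a: k for k, a in enumerate(ALPHABET)} for _ in range(2)]
    print(evolutionary_capacity("AC", toy))  # 1 (only "AA" has lower energy)
```

For realistic structures with contact-dependent energies and hundreds of residues, neither exact enumeration nor this table-based recursion applies directly, which is why approximate counting with provable guarantees is needed.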