
Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners

Abstract

In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully incorporated into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variables (amino acids), exact inference requires time exponential in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by the mean-field approximations to inference with discrete variables used in direct-coupling analysis. This holds for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website .
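
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a one-hot encoding of each alignment column, a simple shrinkage of the empirical covariance toward a scaled identity to make it invertible, and a Frobenius-norm score per residue pair. The function names, the `shrinkage` parameter, and the scoring choice are all illustrative assumptions. The point it illustrates is why the Gaussian model is exactly solvable: once the variables are continuous, the couplings follow from a single matrix inversion.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 amino acids; any other symbol (e.g. a gap) encodes as all zeros


def one_hot(msa):
    """Encode N aligned sequences of length L as an (N, L * 20) real matrix."""
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    n, length = len(msa), len(msa[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            if aa in index:
                x[s, i, index[aa]] = 1.0
    return x.reshape(n, length * len(ALPHABET))


def gaussian_contact_scores(msa, shrinkage=0.5):
    """Return an (L, L) matrix of coupling strengths between alignment columns.

    Assumption: the covariance is regularized by shrinking it toward
    identity / 20, which is a stand-in for whatever prior the paper uses.
    """
    q = len(ALPHABET)
    x = one_hot(msa)
    dim = x.shape[1]
    length = dim // q

    # Empirical covariance of the encoded sequences, treated as Gaussian data.
    cov = np.cov(x, rowvar=False, bias=True)
    cov = (1.0 - shrinkage) * cov + shrinkage * np.eye(dim) / q

    # For a multivariate Gaussian the couplings are minus the precision matrix.
    precision = np.linalg.inv(cov)
    couplings = -precision.reshape(length, q, length, q)

    # Collapse each 20 x 20 coupling block into one score (Frobenius norm).
    scores = np.sqrt((couplings ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores


# Toy alignment: columns 0 and 2 always agree, column 1 varies independently.
toy_msa = ["ACA", "ADA", "CCC", "CDC", "ACA", "CDC", "ADA", "CCC"]
toy_scores = gaussian_contact_scores(toy_msa)
print(np.round(toy_scores, 3))
```

In the toy alignment the pair of columns (0, 2) covaries perfectly, so it receives the largest score, while pairs involving the independent column 1 score near zero. A real application would also need sequence reweighting and a much larger alignment, which this sketch omits.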