
Bayesian statistical approach for protein residue-residue contact prediction


Abstract

Despite continuous efforts to automate experimental structure determination and systematic target selection in structural genomics projects, the gap between the number of known amino acid sequences and the number of solved 3D protein structures is constantly widening. While DNA sequencing technologies are advancing at an extraordinary pace, increasing throughput while reducing costs, protein structure determination is still labour-intensive, time-consuming and expensive. This trend illustrates the importance of complementary computational approaches for bridging the so-called sequence-structure gap.

About half of all protein families lack structural annotation and are therefore not amenable to techniques that infer protein structure from homologs. These protein families can be addressed by de novo structure prediction approaches, which in practice are often limited by the immense computational cost of searching the conformational space for the lowest-energy conformation. Improved predictions of contacts between amino acid residues have been shown to constrain the overall protein fold sufficiently and thereby extend the applicability of de novo methods to larger proteins. Residue-residue contact prediction is based on the idea that selection pressure on protein structure and function can lead to compensatory mutations between spatially close residues. This leaves an echo of correlation signatures that can be traced in the evolutionary record. Despite the success of contact prediction methods, several challenges remain. The most evident limitation is the requirement for deep alignments, which excludes the majority of protein families without associated structural information, precisely the families that are the focus of contact-guided de novo structure prediction. The heuristics applied by current contact prediction methods pose another challenge, since they omit available coevolutionary information.

This work presents two different approaches for addressing the limitations of contact prediction methods. First, instead of inferring evolutionary couplings by maximizing the pseudo-likelihood, I maximize the full likelihood of the statistical model for protein sequence families. This approach achieved precision comparable to, or slightly better than, that of the pseudo-likelihood methods for protein families with few homologous sequences. Second, a Bayesian statistical approach has been developed that provides posterior probability estimates for residue-residue contacts and eliminates the use of heuristics. The full information of coevolutionary signatures is exploited by explicitly modelling the distribution of statistical couplings that reflects the nature of residue-residue interactions. Surprisingly, the posterior probabilities do not directly translate into more precise predictions than those obtained by pseudo-likelihood methods combined with prior knowledge. However, the Bayesian framework offers a statistically clean and theoretically solid treatment of the contact prediction problem. This flexible and transparent framework provides a convenient starting point for further developments, such as integrating more complex prior knowledge. The model can also easily be extended towards the derivation of probability estimates for residue-residue distances to enhance the precision of predicted structures.

Bibliographic details

  • Author

    Vorberg, Susann

  • Year 2017
  • Original format PDF
