Using structure to explore the sequence alignment space of remote homologs.


Abstract

The success of protein structure modeling by homology requires an accurate sequence alignment between the query sequence and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that would produce the best structural model is generally not optimal, in the sense of having the highest DP score. Suboptimal alignment methods can be used to generate alternative alignments, but encounter difficulties given the enormous number of alignments that need to be considered. We present here a new suboptimal alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements (SSEs) and combining high-scoring fragments that pass basic tests for 'modelability', we can generate accurate alignments within a set of limited size.

Chapter 1 introduces the field of protein structure prediction in general and the technique of homology modeling in particular. One subproblem of homology modeling, the sequence-to-structure alignment of proteins, is discussed in Chapter 2. Particular attention is given to descriptions of the size, density and redundancy of alignment space as well as an explanation of the dynamic programming technique and its strengths and weaknesses. The rationale for developing alternative alignment techniques and the unique difficulties of these methods are also discussed.

Chapter 3 explains the methodologies of S4, the alternative alignment program we developed that is the main focus of this thesis. The process of finding alternative alignments with S4 involves several steps, but can be roughly divided into two main parts. First, the program looks for combinations of high-similarity fragments that pass basic rules for modelability. These 'fragment alignments' define regions of alignment space that can be searched more thoroughly with a statistical potential for a single representative for that region.

The ensemble of alignments that is thus created needs to be evaluated for accuracy against the correct alignment. Current methods for doing so, as well as adjustments to those methods to better suit the realm of remote homology alignments, are discussed in Chapter 4. A novel measure for determining similarity between alignments, termed the inter-alignment distance (IAD), is also developed. This measure can be used to assess quality, but is also well-suited to finding redundant alignments within an ensemble.

In Chapter 5, the results of testing S4 on a large set of targets from previous CASP experiments are analyzed. Comparisons to the optimal alignment as well as two standard alternative alignment methods, all of which use the same similarity score as S4, demonstrate that S4's improvement in accuracy is due to better sampling and filtering rather than more sophisticated scoring. Models made from S4 alignments are also shown to significantly improve upon those made from optimal alignments, especially for remote homologs. Finally, an example of a sequence-to-structure alignment offers an in-depth explanation of how S4 finds correct alignments where the other methods do not.

Chapter 6 describes a set of three experiments that paired S4 with the model evaluation tool ProsaII in a homology modeling pipeline. There were two primary objectives in this project. First, we wanted to test different methods for finding remote homologs that could serve as input to S4. Second, we evaluated the use of ProsaII as a method for discriminating between good and bad models, and thus also between homologous and non-homologous templates. The first two experiments are essentially blind searches for homologous sequences and structures. The third experiment takes remote templates returned by PSI-BLAST and uses S4 and ProsaII to find alignments and determine whether the template is a structural homolog. While S4 was able to find homologs in the blind searches, the alignment/model quality and level of discrimination were found to be higher when the input to the pipeline came from a set of structures produced by a template selection method.

Finally, Chapter 7 discusses the consequences of this research and suggests future directions for its application.
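
For readers unfamiliar with the dynamic programming (DP) alignment that the abstract takes as its baseline, the following is a minimal sketch of a textbook global alignment (Needleman-Wunsch) that returns the single highest-scoring alignment. It is not S4 and is not taken from the thesis. The function name `needleman_wunsch` and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions; practical methods typically use substitution matrices and affine gap penalties instead.

```python
def needleman_wunsch(query, template, match=1, mismatch=-1, gap=-2):
    """Textbook global alignment; returns (score, aligned_query, aligned_template).

    The scoring parameters are illustrative assumptions, not values
    from the thesis.
    """
    n, m = len(query), len(template)

    # score[i][j] holds the best score for aligning query[:i] with template[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the template
                score[i][j - 1] + gap,      # gap in the query
            )

    # Trace back one optimal path from the bottom-right corner.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if query[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_q.append(query[i - 1])
                aligned_t.append(template[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


if __name__ == "__main__":
    s, q, t = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(q)
    print(t)
```

The traceback reports only one path through the score matrix. The abstract's point is that, for remote homologs, the alignment that yields the best structural model is generally not this highest-scoring path, which is the motivation for sampling alternative (suboptimal) alignments.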

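Bibliographic record

  • Author

    Kuziemko, Andrew Stephen

  • Author affiliation

    Columbia University

  • Degree-granting institution: Columbia University
  • Subject: Biophysics, General
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 144 p.
  • Total pages: 144
  • Original format: PDF
  • Language of text: English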
