
Bioinformatics methods for the analysis and interpretation of DNA and protein structure.



Abstract

This bioinformatics dissertation focuses on DNA and protein sequence analysis. We develop new sequence-based computational methods to investigate the structural or compositional properties of biological macromolecules.

We first develop a general framework for sequence analysis based on additive scales, structural or other. The framework addresses the following issues: (1) construction of sequences with extremal properties; (2) quantitative evaluation of sequences with respect to a given genomic background; (3) automatic extraction of extremal sequences and profiles from genomic databases; (4) distribution and asymptotic behavior as the length N of the sequences increases; and (5) complete analysis of correlations between scales. The framework is applied to the analysis of DNA tandem repeats, using existing di- and tri-nucleotide scales that capture various aspects of DNA structure, including base stacking energy, propeller twist angle, protein deformability, bendability, and position preference. We derive exact expressions for counting the number of repeat-unit classes at all lengths. Tandem repeats are likely to result from a variety of different mechanisms, a fraction of which is likely to depend on profiles characterized by extreme structural features.

We then show that the genetic code generally allows any DNA structural signal to be superimposed onto any protein-coding sequence through amino acid substitution. Structural scales might thus usefully complement pure-sequence analysis in motif detection. Only localized, loosely positioned signals can be freely superimposed on conserved amino acid sequences.

Using Markov models and genome-wide computations, we next measure and characterize the compositional symmetry observed between complementary DNA strands at orders 1–9. We establish the universality and variability range of strand symmetry. We show that symmetry emerges from the combined effects of a wide spectrum of mechanisms operating at multiple orders and length scales.

Lastly, we develop methods to identify and characterize an under-recognized form of interaction between protein chains, which is mediated by β-sheet formation and is central to both healthy biological function and diseases ranging from AIDS and cancer to Alzheimer's and Huntington's diseases. We describe a database of such interchain β-sheet interactions within entries in the Protein Data Bank and corresponding likely macromolecules. An index quantifies the strength of the interactions.
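
The strand-symmetry measurement described in the abstract can be illustrated with a small sketch. The abstract does not give the dissertation's actual statistic, so the code below is only one plausible reading: it assumes that "order k" refers to words of length k (k-mers) and compares the count of each word with the count of its reverse complement on the same strand. The function names and the index itself (1 for perfect symmetry, 0 for none) are hypothetical choices made for this illustration, not the author's method, and no Markov model is fitted here.

```python
from collections import Counter

# Translation table for complementing the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word, e.g. AACG -> CGTT."""
    return word.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def symmetry_index(seq, k):
    """Illustrative order-k symmetry index (an assumption, not the thesis's).

    S = 1 - sum_w |n(w) - n(rc(w))| / sum_w (n(w) + n(rc(w))),
    where n(w) is the count of word w and rc(w) its reverse complement.
    S is 1.0 when every word is exactly as frequent as its reverse
    complement, and 0.0 when no word shares counts with its partner.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid k-mers of this length")
    # Include reverse complements that never occur, so absent partners count.
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    return 1.0 - diff / (2.0 * total)


if __name__ == "__main__":
    toy = "AACGTTAACGTT"  # toy input, not real genomic data
    for k in range(1, 4):
        print(k, symmetry_index(toy, k))
```

On a real genome one would run `symmetry_index` for k = 1 through 9 on a single strand and compare the values across orders and across organisms; how the dissertation normalizes or models these counts is not stated in the abstract.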
