
Analysis of the frequencies of short DNA subsequences in bacterial genomes.


Abstract

This dissertation is a comprehensive study of the statistical properties of short nucleic acid subsequences found in bacterial genomes. The work reveals a correlation between the frequency of a short DNA subsequence in a bacterial genome and the total number of its one-mismatch (substitution-only) combinations. Moreover, the correlations are independent of G+C content across the full range observed in the bacterial genomic sequences studied. This has profound implications for the evolutionary dynamics of bacteria, implying similar rates of replication and mutation within these genomes. The pattern of consistent correlations is not reproduced by sequences simulated from a zero-order Markov model.

A group of mostly intracellular organisms showed multiple distinct clusters in their scatter plots of subsequence frequency versus one-mismatch neighbors; this was not observed in the genomes of other bacteria of similar length and G+C content. The clustering was found to result from lower total variance in the high-order transition matrix describing the sequence. This observation implies that the genomes of intracellular bacteria are more constrained than those of free-living bacteria.

Two frameworks for generating simulated genomes are presented, both based on third-order Markov models. Markov models parameterized by a particular genome recreate the general shape of the subsequence frequency versus one-mismatch neighbor plots at the third order. A method was also developed for generating de novo Markov models capable of synthesizing a sequence with statistical properties similar to those of a bacterial genome. The model is parameterized only by the desired sequence length, G+C content, and variance. A relatively small amount of variance added to the third-order transition matrix describing a zero-order Markov process was observed to generate a sequence with statistical properties similar to those of a bacterial genomic sequence of the same length and G+C content. Differences between the simulated and genomic sequences, such as outliers that appear in the subsequence frequency versus one-mismatch neighbor plots for bacterial genomes but are not reproduced by simulated sequences, highlight several subsequences of well-known biological interest. This novel sequence model can serve as a better null hypothesis, and deviations from it can be considered possible features of biological significance.
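The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's code, and several choices are assumptions made here for illustration: the "one-mismatch" value of a subsequence is read as the summed count of the 3k subsequences that differ from it by one substitution, the zero-order model draws bases independently with P(G) = P(C) = gc/2, and the "variance" of the de novo third-order model is approximated by Gaussian noise of standard deviation `sd` added to each row of the transition table before renormalizing. All function and parameter names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def one_mismatch_neighbors(kmer):
    """Yield the 3*k k-mers that differ from kmer by one substitution."""
    for i, base in enumerate(kmer):
        for alt in BASES:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]

def neighbor_totals(counts, k):
    """For each possible k-mer, sum the counts of its one-mismatch neighbors."""
    return {
        kmer: sum(counts[n] for n in one_mismatch_neighbors(kmer))
        for kmer in map("".join, product(BASES, repeat=k))
    }

def base_weights(gc):
    """Zero-order base probabilities in the order A, C, G, T (assumed symmetric)."""
    return [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]

def zero_order_sequence(length, gc, rng):
    """Draw independent bases at the requested G+C content."""
    return "".join(rng.choices(BASES, weights=base_weights(gc), k=length))

def noisy_third_order_sequence(length, gc, sd, rng):
    """Third-order chain whose rows are zero-order weights plus Gaussian noise.

    The noise model is an assumption of this sketch, not the source's method.
    """
    table = {}
    for ctx in map("".join, product(BASES, repeat=3)):
        row = [max(p + rng.gauss(0, sd), 1e-6) for p in base_weights(gc)]
        total = sum(row)
        table[ctx] = [w / total for w in row]
    seq = list(rng.choices(BASES, weights=base_weights(gc), k=3))
    while len(seq) < length:
        ctx = "".join(seq[-3:])
        seq.append(rng.choices(BASES, weights=table[ctx])[0])
    return "".join(seq[:length])

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

if __name__ == "__main__":
    rng = random.Random(0)
    k = 4
    for name, seq in [
        ("zero-order", zero_order_sequence(50_000, 0.4, rng)),
        ("noisy third-order", noisy_third_order_sequence(50_000, 0.4, 0.05, rng)),
    ]:
        counts = kmer_counts(seq, k)
        totals = neighbor_totals(counts, k)
        kmers = sorted(totals)
        r = pearson([counts[m] for m in kmers], [totals[m] for m in kmers])
        print(name, "frequency vs. one-mismatch total, r =", round(r, 3))
```

To compare against a real genome, the simulated string would be replaced by the genome sequence read from a FASTA file, and the (count, neighbor total) pairs plotted as a scatter plot.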

Bibliographic record

  • Author

    Skewes, Aaron D.

  • Author affiliation

    University of Houston

  • Degree-granting institution: University of Houston
  • Subject: Engineering, Electronics and Electrical; Computer Science; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 91 p.
  • Total pages: 91
  • Original format: PDF
  • Text language: English
