Analysis of the frequencies of short DNA subsequences in bacterial genomes.

Abstract

This dissertation is a comprehensive study of the statistical properties of short nucleic acid subsequences found in bacterial genomes. This work revealed a correlation between the frequency of a short DNA subsequence in a bacterial genome and the total number of occurrences of its one-mismatch (substitution-only) neighbors. Moreover, the correlations hold consistently, independent of G+C content, across the full range of G+C content observed in the bacterial genomic sequences studied. This has profound implications for the evolutionary dynamics of bacteria, implying similar rates of replication and mutation within these genomes. The pattern of consistent correlations is not reproduced by sequences simulated from a zero-order Markov model.

A group of mostly intracellular organisms was found to exhibit multiple distinct clusters in their scatter plots of subsequence frequency versus one-mismatch neighbors; this was not observed in the genomes of other bacteria of similar length and G+C content. The clustering was found to be an effect of lower total variance in the high-order transition matrix describing the sequence. This observation implies that the genomes of intracellular bacteria are more constrained than those of free-living bacteria.

Two frameworks for generating simulated genomes are presented, both based on third-order Markov models. Markov models parameterized by a particular genome recreate, at third order, the general shape of the subsequence frequency versus one-mismatch neighbor plots. A method was also developed for generating de novo Markov models capable of synthesizing a sequence whose statistical properties are similar to those of a bacterial genome. This model is parameterized only by the desired sequence length, G+C content, and variance.
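The scatter plots described above pair each subsequence's frequency with the combined frequency of its one-mismatch neighbors. The following is a minimal Python sketch of that statistic, assuming overlapping k-mer counts on a single strand and interpreting "the total number of occurrences of its one-mismatch neighbors" as the summed counts of all 3k substitution variants of a k-mer. The function names are hypothetical, and the dissertation may count reverse complements or normalize differently.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    return counts


def one_mismatch_neighbors(word: str):
    """Yield the 3*k words that differ from `word` by exactly one substitution."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def frequency_vs_neighbors(seq: str, k: int):
    """Return (k-mer, its count, summed count of its one-mismatch neighbors) for all 4**k k-mers."""
    counts = kmer_counts(seq, k)
    rows = []
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        neighbor_total = sum(counts[n] for n in one_mismatch_neighbors(word))
        rows.append((word, counts[word], neighbor_total))
    return rows


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Example usage on a toy sequence (a real analysis would load a whole genome instead).
rows = frequency_vs_neighbors("ATGCGCGATATCGCGTTAGCCGATA", 3)
r = pearson([c for _, c, _ in rows], [t for _, _, t in rows])
```

Each point of a frequency versus one-mismatch neighbor plot corresponds to one row of `rows`; plotting the second field against the third for a full genome, and computing `pearson` over them, gives the kind of correlation the abstract reports.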
It was observed that adding a relatively small amount of variance to the third-order transition matrix describing a zero-order Markov process can generate a sequence with statistical properties similar to those of a bacterial genomic sequence of the same length and G+C content. Differences between the simulated and genomic sequences, such as outliers that appear in the subsequence frequency versus one-mismatch neighbor plots for bacterial genomes but are not reproduced by the simulated sequences, highlight several subsequences of well-known biological interest. This novel sequence model can therefore serve as a better null hypothesis, and deviations from it can be considered possible features of biological significance.
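The two third-order Markov frameworks summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the genome-parameterized variant uses additive (pseudocount) smoothing, and the de novo variant starts every one of the 64 context rows at the zero-order composition implied by the G+C content, adds independent Gaussian noise of the given variance to each entry, clips to a small positive floor, and renormalizes. The noise model, the way the first three bases are drawn, and all function names are hypothetical.

```python
import random
from itertools import product

ALPHABET = "ACGT"
CONTEXTS = ["".join(c) for c in product(ALPHABET, repeat=3)]  # the 64 third-order contexts


def zero_order_probs(gc: float) -> dict:
    """Base composition implied by a G+C fraction, with G = C and A = T."""
    at = (1.0 - gc) / 2.0
    return {"A": at, "C": gc / 2.0, "G": gc / 2.0, "T": at}


def fit_third_order(seq: str, pseudocount: float = 1.0) -> dict:
    """Genome-parameterized model: P(next base | previous three) from counts with additive smoothing."""
    counts = {ctx: {b: pseudocount for b in ALPHABET} for ctx in CONTEXTS}
    seq = seq.upper()
    for i in range(len(seq) - 3):
        ctx, nxt = seq[i:i + 3], seq[i + 3]
        if ctx in counts and nxt in ALPHABET:
            counts[ctx][nxt] += 1
    matrix = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        matrix[ctx] = {b: n / total for b, n in row.items()}
    return matrix


def de_novo_third_order(gc: float, variance: float, rng: random.Random, floor: float = 1e-6) -> dict:
    """De novo model: each context row starts as the zero-order distribution; every entry gets
    Gaussian noise with the given variance (an assumed noise model), is clipped to a small
    positive floor, and the row is renormalized to sum to 1."""
    base = zero_order_probs(gc)
    sd = variance ** 0.5
    matrix = {}
    for ctx in CONTEXTS:
        row = {b: max(base[b] + rng.gauss(0.0, sd), floor) for b in ALPHABET}
        total = sum(row.values())
        matrix[ctx] = {b: p / total for b, p in row.items()}
    return matrix


def generate(matrix: dict, length: int, gc: float, rng: random.Random) -> str:
    """Sample a sequence: the first three bases follow the zero-order composition,
    and every later base is drawn from the row selected by the preceding three bases."""
    base = zero_order_probs(gc)
    seq = rng.choices(ALPHABET, weights=[base[b] for b in ALPHABET], k=min(3, length))
    while len(seq) < length:
        row = matrix["".join(seq[-3:])]
        seq.append(rng.choices(ALPHABET, weights=[row[b] for b in ALPHABET])[0])
    return "".join(seq)


# Example: a 10 kb sequence at 38% G+C from a lightly perturbed zero-order process.
rng = random.Random(42)
model = de_novo_third_order(gc=0.38, variance=0.001, rng=rng)
simulated = generate(model, length=10_000, gc=0.38, rng=rng)
```

With `variance=0` every row equals the zero-order composition, so `generate` degenerates to an i.i.d. sequence with the requested G+C content, which is the zero-order baseline the abstract says fails to reproduce the observed correlations; larger variances make individual contexts more distinct. Comparing k-mer statistics of such simulated sequences with those of a real genome of the same length and G+C content is one way to surface the outliers described above.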

Bibliographic record

Author: Skewes, Aaron D.
Author affiliation: University of Houston

Degree-granting institution: University of Houston
Subjects: Engineering, Electronics and Electrical; Computer Science; Biology, Bioinformatics
Degree: Ph.D.
Year: 2009
Pages: 91
Original format: PDF
Language: English

Similar documents

1. Distribution and evolution of short tandem repeats in closely related bacterial genomes. [J]. Kassai Jager E, Ortutay C, Toth G. Gene: An International Journal Focusing on Gene Cloning and Gene Structure and Function. 2008, issue 1.

2. Short read fragment assembly of bacterial genomes. [J]. Chaisson MJ, Pevzner PA. Genome Research. 2008, issue 2.

3. Frequency of formation of chimeric molecules as a consequence of PCR coamplification of 16S rRNA genes from mixed bacterial genomes. [J]. G C Wang, Y Wang. Applied and Environmental Microbiology. 1997, issue 12.

4. Machine Learning Applications to DNA Subsequence and Restriction Site Analysis. [C]. E. Moyer, A. Das. IEEE Signal Processing in Medicine and Biology Symposium. 2020.

5. Scaling short read de novo DNA sequence assembly to gigabase genomes. [D]. Cook, Jeffrey J. 2011.

6. Frequency of formation of chimeric molecules as a consequence of PCR coamplification of 16S rRNA genes from mixed bacterial genomes. [O]. G C Wang, Y Wang. 1997.

7. Short, interspersed repetitive DNA sequences in prokaryotic genomes. [O]. Lupski, J R, Weinstock, G M. 1992.
