Decoding the multifactorial nature of mutation rate variation in the human genome using computational and statistical approaches.

Abstract

Whole-genome sequencing and resequencing projects have provided a rich source of data for studying mutations. There is now substantial evidence of regional variation and co-variation in the rates of nucleotide substitutions, insertions, deletions, and microsatellite repeat number alterations in the human genome. This knowledge has advanced our understanding of mutagenesis and has proved to be a useful resource for mining the functional genomics landscape. Despite this, a thorough characterization (structure, causes, geography, and implications) of mutation rate variation and co-variation is lacking. Moreover, although the rapid rise of next-generation sequencing (NGS) data has enriched our understanding of substitutions, insertions, and deletions, microsatellites have largely been sidelined because of the technical difficulties of identifying and genotyping them from NGS short-read data. This hinders our understanding of the mutational properties of microsatellites, which are among the most variable genomic sequences and are implicated in numerous diseases.

In this dissertation, I develop and use reproducible bioinformatics and statistical tools to study various facets of rate variation and co-variation of mutations along the human genome using whole-genome primate alignments (Chapters 1 and 2), and to identify and model the mutational behavior of microsatellites in human populations using the 1000 Genomes Project data (Chapters 3 and 4). I ask and provide answers to the following detailed questions:

1. How do rates of different mutation types co-vary in the human genome, and what determines their co-variation? Rates of substitutions, short insertions, and short deletions show strong linear co-variation, and genomic landscape features influence the structure and strength of this co-variation. Microsatellite mutability varies orthogonally to these rates.

2. Can we define and characterize typical states of mutation rate variation? I identified six states with various combinations of elevated or depressed mutation rates. These states differ in their prevalence, lengths, genomic locations, and associations with genomic landscape features, and they influence the localization of genes and functional marks.

3. When does a short tandem repeat (TR) turn into a highly mutable microsatellite, and what factors influence this switch? Not only the absolute level of polymorphism but also the rate of exponential growth of polymorphism incidence with repeat number influences the propensity of a TR to turn into a microsatellite. The change points occur at repeat numbers 9, 5, 4, and 4 for mono-, di-, tri-, and tetranucleotide TRs, respectively.

4. What are the main sources of error in identifying and genotyping TRs from short-read data, and are these errors influenced by a TR's intrinsic properties? PCR slippage errors might constitute a large part of TR-associated errors. Error rates are strongly influenced by intrinsic properties of TRs, including (i) motif size, (ii) motif composition, and (iii) repeat number.

Results from this dissertation contribute to understanding the mechanisms and rate variation of multiple mutation types, and have important implications for the identification and screening of functional elements and disease-causing mutations. Importantly, this dissertation contributes tools to the scientific community via Galaxy, an open-source genomics portal, and thus facilitates future large-scale genomics studies.

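Bibliographic record

  • Author

    Ananda, Guruprasad

  • Author affiliation

    The Pennsylvania State University

  • Degree-granting institution: The Pennsylvania State University
  • Discipline: Bioinformatics
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 631 p.
  • Total pages: 631
  • Original format: PDF
  • Language of text: English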
