BMC Bioinformatics

Origins and characterization of variants shared between databases of somatic and germline human mutations



Abstract

Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which impact the resulting distribution of substitutions. Nonetheless, many of the same single nucleotide variants (SNVs) are shared between germline and somatic mutation databases, such as between the gnomAD database of 120,000 germline exomes and the TCGA database of 10,000 somatic exomes. Here, we sought to explain this overlap. After strict filtering to exclude common germline polymorphisms and sites with poor coverage or mappability, we found 336,987 variants shared between the somatic and germline databases. A uniform statistical model explains 34% of these shared variants; a model that incorporates the varying mutation rates of the basic mutation types explains another 50%; and a model that includes extended nucleotide contexts (e.g., the 3 bases on either side) explains an additional 4%. Analysis of read depth finds mixed evidence that up to 4% of the shared variants may represent germline variants leaked into somatic call sets. Nine percent of the shared variants are not explained by any model, and neither sequencing errors nor convergent evolution accounted for them. We surveyed other factors as well: cancers driven by endogenous mutational processes share a greater fraction of variants with the germline, and recently derived germline variants were more likely to be somatically shared than were ancient germline ones. Overall, we find that shared variants largely represent bona fide biological occurrences of the same variant in the germline and somatic settings, and arise primarily because DNA has some of the same basic chemical vulnerabilities in either setting. Moreover, we find mixed evidence that somatic call sets leak appreciable numbers of germline variants, which is relevant to genomic privacy regulations.
In future studies, the similar chemical vulnerability of DNA between the somatic and germline settings might be used to help identify disease-related genes by guiding the development of background-mutation models that are informed by both somatic and germline patterns of variation.
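The abstract does not spell out how its statistical models are built, so the following is only a minimal illustrative sketch, assuming one plausible reading: germline and somatic variant sets are drawn independently, either uniformly over all possible SNVs (the "uniform" model) or uniformly only within each mutation class (a stand-in for the model with varying mutation rates). Under independent uniform draws, the expected number of shared variants in a pool of M possible SNVs with g germline and s somatic variants is g·s/M, summed per class in the stratified case. The `MutationClass` structure, the class names, and all counts below are hypothetical toy values, not data from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationClass:
    """One hypothetical stratum of possible SNVs (e.g. a substitution type)."""
    name: str
    possible: int  # distinct SNVs of this class that could occur in the region
    germline: int  # distinct germline variants observed in this class
    somatic: int   # distinct somatic variants observed in this class


def expected_overlap_uniform(classes):
    """Expected shared variants if every possible SNV is equally likely in both sets."""
    total = sum(c.possible for c in classes)
    g = sum(c.germline for c in classes)
    s = sum(c.somatic for c in classes)
    return g * s / total


def expected_overlap_stratified(classes):
    """Expected shared variants if placement is uniform only within each class."""
    return sum(c.germline * c.somatic / c.possible for c in classes)


def simulate_overlap(classes, trials=200, seed=0):
    """Monte Carlo check of the stratified expectation with independent draws."""
    rng = random.Random(seed)
    shared = 0
    for _ in range(trials):
        for c in classes:
            germ = set(rng.sample(range(c.possible), c.germline))
            soma = rng.sample(range(c.possible), c.somatic)
            shared += sum(1 for v in soma if v in germ)
    return shared / trials


# Toy numbers only: one small, highly mutable class and one large background class.
classes = [
    MutationClass("CpG C>T (toy)", possible=1_000, germline=500, somatic=400),
    MutationClass("other (toy)", possible=99_000, germline=1_500, somatic=1_600),
]

print(expected_overlap_uniform(classes))               # 40.0
print(round(expected_overlap_stratified(classes), 2))  # 224.24
```

With identical totals, concentrating both germline and somatic variants in a small, highly mutable class raises the expected overlap several-fold, which is the intuition behind a rate-aware model explaining more shared variants than a uniform one. Extending the strata to longer nucleotide contexts would follow the same per-class sum.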