Journal: Genes (indexed in the NIH literature database)

De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data

Abstract

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.
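The re-analysis above reports SNV calls that are gained (putative novel SNVs) and lost (removed false positives) per individual once the novel sequences are added to the reference. The following is a minimal Python sketch of how such a per-individual tally could be computed. It assumes each call is reduced to a (contig, position, ref, alt) key and that the NS are appended as extra contigs, so existing GRCh38 coordinates stay unchanged. The names `Variant`, `compare_call_sets`, and `NS_contig_1` are hypothetical and for illustration only. This is not the authors' pipeline.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal SNV key: contig name, 1-based position, ref and alt base."""
    contig: str
    pos: int
    ref: str
    alt: str


def compare_call_sets(
    baseline: Iterable[Variant], augmented: Iterable[Variant]
) -> Tuple[Set[Variant], Set[Variant]]:
    """Return (gained, lost) calls when switching from the baseline reference
    (GRCh38 alone) to the augmented one (GRCh38 plus appended NS contigs).

    gained: calls made only against the augmented reference (candidate novel SNVs)
    lost:   calls made only against the baseline reference (candidate false positives,
            e.g. reads that were mis-placed because their true origin was missing)
    Assumes GRCh38 coordinates are unchanged, so identical keys mean the same call.
    """
    base = set(baseline)
    aug = set(augmented)
    return aug - base, base - aug


if __name__ == "__main__":
    # Toy, made-up calls for one individual (not data from the study).
    baseline_calls = [
        Variant("chr14", 1000, "A", "G"),
        Variant("chr17", 2000, "C", "T"),
    ]
    augmented_calls = [
        Variant("chr17", 2000, "C", "T"),
        Variant("NS_contig_1", 50, "G", "A"),
    ]
    gained, lost = compare_call_sets(baseline_calls, augmented_calls)
    print("gained:", sorted(gained))
    print("lost:", sorted(lost))
```

Keying calls on exact (contig, pos, ref, alt) tuples keeps the comparison a plain set difference. A real comparison would also need to normalize variant representation and handle multi-allelic sites, which this sketch ignores.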