Journal: Genome Research

Discovery and genotyping of structural variation from long-read haploid genome sequence data


Abstract

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF >1%). We estimate that this theoretical human diploid differs by as much as ~16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ~59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
