Journal: Bioinformatics

Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold


Abstract

Motivation: Given the current costs of next-generation sequencing, large studies carry out low-coverage sequencing followed by application of methods that leverage linkage disequilibrium to infer genotypes. We propose a novel method that assumes study samples are sequenced at low coverage and genotyped on a genome-wide microarray, as in the 1000 Genomes Project (1KGP). We assume polymorphic sites have been detected from the sequencing data and that genotype likelihoods are available at these sites. We also assume that the microarray genotypes have been phased to construct a haplotype scaffold. We then phase each polymorphic site using an MCMC algorithm that iteratively updates the unobserved alleles based on the genotype likelihoods at that site and local haplotype information. We use a multivariate normal model to capture both allele frequency and linkage disequilibrium information around each site. When sequencing data are available from trios, Mendelian transmission constraints are easily accommodated into the updates. The method is highly parallelizable, as it analyses one position at a time.

Results: We illustrate the performance of the method compared with other methods using data from Phase 1 of the 1KGP in terms of genotype accuracy, phasing accuracy and downstream imputation performance. We show that the haplotype panel we infer in African samples, which was based on a trio-phased scaffold, increases downstream imputation accuracy for rare variants (R² increases by > 0.05 for minor allele frequency < 1%), and this will translate into a boost in power to detect associations. These results highlight the value of incorporating microarray genotypes when calling variants from next-generation sequence data.
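To make the per-site update described in the abstract more concrete, the following is a minimal Python sketch of one sampling step at a single polymorphic site. It is not the authors' implementation: the abstract does not give the exact conditional distribution, so the sketch assumes that the probability of the alternate allele on a haplotype can be approximated by the conditional mean of a multivariate normal fitted to the current allele estimates at the target site and at nearby scaffold sites, clipped to the interval (0, 1). The function names (`conditional_alt_prob`, `phased_posterior`, `update_site`) and the parameters `ridge` and `eps` are hypothetical and chosen for illustration only. Trio constraints are not modelled here.

```python
import numpy as np

def conditional_alt_prob(target, scaffold, query, ridge=1e-3, eps=1e-3):
    """Approximate P(alt allele at the target site | scaffold alleles).

    Assumption (not from the paper): use the conditional mean of a
    multivariate normal fitted to 0/1 alleles, clipped to (eps, 1 - eps).

    target:   (n,) current 0/1 alleles at the target site on n haplotypes
    scaffold: (n, m) 0/1 alleles of the same haplotypes at m scaffold sites
    query:    (m,) scaffold alleles of the haplotype being updated
    """
    target = np.asarray(target, dtype=float)
    scaffold = np.asarray(scaffold, dtype=float)
    query = np.asarray(query, dtype=float)
    n, m = scaffold.shape
    mu_t = target.mean()                      # allele frequency at the target site
    mu_s = scaffold.mean(axis=0)              # allele frequencies at scaffold sites
    centered = scaffold - mu_s
    # Scaffold covariance (linkage disequilibrium), with a small ridge for stability.
    cov_ss = centered.T @ centered / n + ridge * np.eye(m)
    # Covariance between the target site and each scaffold site.
    cov_ts = (target - mu_t) @ centered / n
    mean = mu_t + cov_ts @ np.linalg.solve(cov_ss, query - mu_s)
    return float(np.clip(mean, eps, 1.0 - eps))

def phased_posterior(gl, p1, p2):
    """Posterior over ordered allele pairs (a1, a2).

    gl:     genotype likelihoods for 0, 1 and 2 copies of the alt allele
    p1, p2: prior alt-allele probabilities for the two haplotypes
    """
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    weights = np.array([
        gl[a1 + a2] * (p1 if a1 else 1.0 - p1) * (p2 if a2 else 1.0 - p2)
        for a1, a2 in pairs
    ])
    return pairs, weights / weights.sum()

def update_site(gl, p1, p2, rng):
    """Draw one phased genotype for a sample from its posterior."""
    pairs, post = phased_posterior(gl, p1, p2)
    return pairs[rng.choice(len(pairs), p=post)]
```

In a full sampler, a step like `update_site` would be repeated over all samples for many iterations, with `target` refreshed from the current draws. Because each site is handled independently given the scaffold, separate sites can be processed in parallel, which matches the parallelism noted in the abstract.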
