Journal: European Journal of Human Genetics (EJHG)

Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs

Abstract

The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluating several performance issues: the relative utility of study-specific versus general/multipopulation reference panels; genotyping with various array scaffolds; the effects of different ethnic backgrounds; and performance across a range of allele frequencies. Here we compared the effectiveness of study-specific reference panels with that of the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry, including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating approximately 200,000 variants) provided highly informative content across the entire genome. A gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both the study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of the MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter out poorly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies.
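The last finding concerns filtering imputed rare variants by a quality metric. The following minimal sketch illustrates the quantities involved. It assumes per-sample alternative-allele dosages (expected allele counts in [0, 2]) are already available. It computes three things:

1. An Rsq-like quality estimate: the observed dosage variance divided by the 2p(1-p) variance expected under Hardy-Weinberg equilibrium. This is the idea behind MACH's Rsq, but it is not MACH's code.
2. True accuracy: the squared correlation between dosages and sequence-derived genotypes.
3. A MAF-dependent filter.

The names `ImputedVariant`, `min_rsq_for` and `passes_filter` are hypothetical. The cutoff values are placeholders, not the thresholds recommended by the study.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class ImputedVariant:
    variant_id: str
    dosages: list[float]  # expected alternative-allele count per sample, in [0, 2]


def alt_allele_freq(dosages: list[float]) -> float:
    """Alternative-allele frequency estimated from the mean dosage."""
    return mean(dosages) / 2.0


def rsq_estimate(dosages: list[float]) -> float:
    """Observed dosage variance divided by the variance expected under
    Hardy-Weinberg equilibrium, 2p(1-p). Values near 0 mean dosages were
    shrunk toward the mean (little information); the ratio can slightly
    exceed 1 in small samples that deviate from HWE. Mirrors the idea
    behind MACH's Rsq, not its exact implementation."""
    p = alt_allele_freq(dosages)
    expected = 2.0 * p * (1.0 - p)
    if expected == 0.0:
        return 0.0  # monomorphic in the imputed data: nothing to score
    return pvariance(dosages) / expected


def dosage_r2(dosages: list[float], truth: list[int]) -> float:
    """Squared Pearson correlation between imputed dosages and
    sequence-derived genotypes (0/1/2); NaN if either input is constant."""
    md, mt = mean(dosages), mean(truth)
    cov = sum((d - md) * (t - mt) for d, t in zip(dosages, truth))
    var_d = sum((d - md) ** 2 for d in dosages)
    var_t = sum((t - mt) ** 2 for t in truth)
    if var_d == 0 or var_t == 0:
        return float("nan")
    return cov * cov / (var_d * var_t)


def min_rsq_for(maf: float) -> float:
    """HYPOTHETICAL MAF-dependent cutoffs: placeholders to be replaced by
    values calibrated against sequenced or masked samples."""
    if maf < 0.01:
        return 0.8
    if maf < 0.05:
        return 0.5
    return 0.3


def passes_filter(variant: ImputedVariant) -> bool:
    """Keep a variant if its Rsq-like estimate meets the cutoff for its MAF."""
    p = alt_allele_freq(variant.dosages)
    maf = min(p, 1.0 - p)
    return rsq_estimate(variant.dosages) >= min_rsq_for(maf)


if __name__ == "__main__":
    confident = ImputedVariant("var_A", [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0])
    shrunk = ImputedVariant("var_B", [0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.2])
    for v in (confident, shrunk):
        print(v.variant_id, round(rsq_estimate(v.dosages), 3), passes_filter(v))
```

The two calibration functions are kept separate on purpose. With a sequenced subset, `dosage_r2` gives the true accuracy per MAF bin. The cutoffs in `min_rsq_for` can then be chosen so that retained variants reach the desired accuracy. This is the kind of panel-specific calibration the abstract argues for, rather than reusing a single standard threshold.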