
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment


Abstract

Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available.

Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ² association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses.

Availability and implementation: Publicly available software package available at .

Contact: or

Supplementary information: are available at Bioinformatics online.
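
The abstract does not spell out the imputation formula, so the following is only a minimal sketch of the general idea behind Gaussian imputation of summary statistics: if z-scores at nearby SNPs are modelled as multivariate normal with covariance given by the reference-panel LD (correlation) matrix, the z-score at an untyped SNP can be estimated by its conditional mean given the typed SNPs. The function name `impute_z`, its arguments, and the ridge term `ridge` (used here as a simple stand-in for an adjustment for finite reference-panel size) are assumptions made for illustration, not the authors' implementation or default settings.

```python
import numpy as np


def impute_z(z_typed, ld_tt, ld_it, ridge=0.1):
    """Conditional-mean imputation of z-scores (illustrative sketch).

    z_typed : z-scores at the T typed SNPs, shape (T,)
    ld_tt   : reference-panel LD among typed SNPs, shape (T, T)
    ld_it   : reference-panel LD between untyped and typed SNPs, shape (U, T)
    ridge   : hypothetical regularisation added to the diagonal of ld_tt to
              stabilise an LD matrix estimated from a small panel

    Returns the imputed z-scores (U,) and their variance under the model (U,).
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_it = np.atleast_2d(np.asarray(ld_it, dtype=float))

    # Regularised LD matrix among typed SNPs.
    reg = ld_tt + ridge * np.eye(ld_tt.shape[0])

    # Weights W = ld_it @ inv(reg), computed with a linear solve
    # (reg is symmetric, so solving against ld_it.T and transposing is valid).
    weights = np.linalg.solve(reg, ld_it.T).T

    # Conditional mean of the untyped z-scores given the typed ones.
    z_imputed = weights @ z_typed

    # Model variance of each imputed z-score: diag(W @ reg @ W.T),
    # which simplifies to the row-wise sum of W * ld_it.
    variance = np.sum(weights * ld_it, axis=1)
    return z_imputed, variance


if __name__ == "__main__":
    # Toy example: two typed SNPs and one untyped SNP (all values made up).
    z_hat, var = impute_z(
        z_typed=[3.0, 2.5],
        ld_tt=[[1.0, 0.6], [0.6, 1.0]],
        ld_it=[[0.8, 0.7]],
    )
    print(z_hat, var)
```

In this sketch the returned variance is below 1 whenever the untyped SNP is imperfectly tagged, so the raw imputed z-score is shrunk toward zero. One plausible use of the variance is as an imputation-quality measure for filtering or rescaling imputed statistics.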
