American Journal of Human Genetics (NIH literature collection)

A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank

Abstract

With increasing biobanking efforts connecting electronic health records and national registries to germline genetics, time-to-event data analysis has attracted increasing attention in genetic studies of human diseases. In time-to-event data analysis, the Cox proportional hazards (PH) regression model is one of the most widely used approaches. However, existing methods and tools are not scalable when analyzing a large biobank with hundreds of thousands of samples and endpoints, and they are not accurate when testing low-frequency and rare variants. Here, we propose a scalable and accurate method, SPACox (a saddlepoint approximation implementation based on the Cox PH regression model), that is applicable to genome-wide time-to-event data analysis. SPACox requires fitting a Cox PH regression model only once across the genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76–252 times faster than other existing alternatives, such as gwasurvivr, 185–511 times faster than the standard Wald test, and more than 6,000 times faster than the Firth correction, and that it can control type I error rates at the genome-wide significance level regardless of minor allele frequencies. Through the analysis of UK Biobank inpatient data of 282,871 white British European ancestry samples, we show that SPACox can efficiently analyze large sample sizes and accurately control type I error rates. We identified 611 loci associated with time-to-event phenotypes of 12 common diseases, of which 38 loci would be missed within a logistic regression framework with a binary phenotype defined as event occurrence status during the follow-up period.
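
The calibration step mentioned in the abstract can be illustrated with a generic saddlepoint tail approximation. What follows is a minimal Python sketch under stated assumptions, not the SPACox implementation: the function name `saddlepoint_tail`, the choice of the Lugannani-Rice formula as the SPA variant, and the toy statistic (a sum of ten Exp(1) variables) are all illustrative choices. The toy statistic was picked because its exact tail is known in closed form and its distribution is skewed, which is the kind of setting where a normal approximation can be badly miscalibrated.

```python
import math


def normal_sf(z):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def saddlepoint_tail(K, K1, K2, s, lo, hi, tol=1e-12):
    """Approximate P(S > s) with the Lugannani-Rice formula.

    K, K1, K2: cumulant generating function of S and its first two
    derivatives. [lo, hi] must bracket the root of K1(t) = s, and s must
    differ from the mean K1(0), where the formula is singular.
    """
    # Solve the saddlepoint equation K'(t) = s by bisection
    # (K' is increasing because K is convex).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < s:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    t = 0.5 * (lo + hi)
    w = math.copysign(math.sqrt(2.0 * (t * s - K(t))), t)
    u = t * math.sqrt(K2(t))
    return normal_sf(w) + normal_pdf(w) * (1.0 / u - 1.0 / w)


# Toy statistic (an assumption for illustration): S = sum of n Exp(1) draws.
n, s = 10, 30.0
K = lambda t: -n * math.log(1.0 - t)      # CGF, valid for t < 1
K1 = lambda t: n / (1.0 - t)              # first derivative
K2 = lambda t: n / (1.0 - t) ** 2         # second derivative

p_spa = saddlepoint_tail(K, K1, K2, s, lo=0.0, hi=1.0 - 1e-9)

# Exact tail of Gamma(n, 1) for integer n.
p_exact = math.exp(-s) * sum(s ** k / math.factorial(k) for k in range(n))

# Normal approximation using mean n and variance n.
p_normal = normal_sf((s - n) / math.sqrt(n))

print(f"saddlepoint: {p_spa:.3e}")
print(f"exact:       {p_exact:.3e}")
print(f"normal:      {p_normal:.3e}")
```

In this toy setting the saddlepoint value tracks the exact tail closely, while the normal approximation is off by several orders of magnitude. The abstract attributes the accuracy of SPACox for low-frequency and rare variants to the same kind of calibration, applied to the test statistics of a Cox PH model fitted once.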
