The Annals of Applied Statistics

A FAST ALGORITHM FOR DETECTING GENE-GENE INTERACTIONS IN GENOME-WIDE ASSOCIATION STUDIES


Abstract

With the recent advent of high-throughput genotyping techniques, genetic data for genome-wide association studies (GWAS) have become increasingly available, which calls for efficient and effective statistical approaches. Although many such approaches have been developed and used to identify single-nucleotide polymorphisms (SNPs) associated with complex traits or diseases, few can detect gene-gene interactions among different SNPs. Genetic interactions, also known as epistasis, are recognized as playing a pivotal role in the genetic variation of phenotypic traits. However, because the number of SNP-SNP combinations in GWAS is extremely large, the model dimensionality quickly becomes so overwhelming that no prevailing variable selection method can handle the problem. In this paper, we present a statistical framework for characterizing main genetic effects and epistatic interactions in a GWAS. Specifically, we first propose a two-stage sure independence screening (TS-SIS) procedure that generates a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. We also propose a rates adjusted thresholding estimation (RATE) approach to determine the size of the reduced model selected by independence screening. Regularized regression methods, such as LASSO or SCAD, are then applied to further identify important genetic effects. Simulation studies show that the TS-SIS procedure is computationally efficient and has outstanding finite-sample performance in selecting potential SNPs as well as gene-gene interactions. We apply the proposed framework to an ultrahigh-dimensional GWAS data set from the Framingham Heart Study and select 23 active SNPs and 24 active epistatic interactions for body mass index variation, which demonstrates the capability of our procedure to resolve the complexity of genetic control.
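
The abstract does not spell out the TS-SIS steps, so the following is only a minimal sketch of the general idea of screening in two stages, not the authors' procedure. It assumes that stage one ranks SNPs by absolute marginal correlation with the phenotype and that stage two ranks pairwise products of the retained SNPs the same way. The function names (`marginal_scores`, `two_stage_screen`, `simulate`), the cutoffs `d_main` and `d_inter`, and the simulated data are all hypothetical; in particular, the cutoffs are fixed by hand here, whereas the paper chooses the reduced model size with RATE.

```python
import numpy as np
from itertools import combinations


def marginal_scores(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, np.inf, denom)  # constant columns score 0
    return np.abs(Xc.T @ yc) / denom


def two_stage_screen(X, y, d_main, d_inter):
    """Hypothetical two-stage screening (an assumption, not the paper's TS-SIS).

    Stage 1 keeps the d_main SNPs with the largest marginal scores.
    Stage 2 builds pairwise products of the retained SNPs and keeps the
    d_inter products with the largest marginal scores.
    """
    main_scores = marginal_scores(X, y)
    main_idx = np.argsort(main_scores)[::-1][:d_main]
    kept = sorted(main_idx.tolist())

    # Only pairs among retained SNPs are formed, which avoids scanning
    # all p * (p - 1) / 2 SNP-SNP combinations.
    pairs = list(combinations(kept, 2))
    Z = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    inter_scores = marginal_scores(Z, y)
    top = np.argsort(inter_scores)[::-1][:d_inter]
    return kept, [pairs[i] for i in top]


def simulate(n=400, p=200, seed=0):
    """Toy data: genotypes coded 0/1/2, two causal SNPs and their product."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, p)).astype(float)
    y = (1.5 * X[:, 3] + 1.5 * X[:, 7]
         + 2.0 * X[:, 3] * X[:, 7] + rng.normal(size=n))
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    main, inter = two_stage_screen(X, y, d_main=10, d_inter=5)
    print("retained SNPs:", main)
    print("retained interactions:", inter)
```

In a fuller pipeline, the retained main effects and product terms would then be passed to a penalized regression such as LASSO or SCAD, as the abstract describes. Note that screening pairs only among stage-one survivors can miss interactions between SNPs with weak marginal effects; whether and how the paper handles that case is not stated in the abstract.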
