Venue: ACM KDD International Conference on Knowledge Discovery and Data Mining; KDD 2008

FastANOVA: an Efficient Algorithm for Genome-Wide Association Study

Abstract

Studying the association between a quantitative phenotype (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand the underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. The ANOVA (analysis of variance) test is routinely used in association studies, and important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can reach the millions, so evaluating joint effects is challenging even for SNP-pairs. Moreover, because large numbers of SNPs are correlated, a permutation procedure is preferred over simple Bonferroni correction for properly controlling the family-wise error rate and retaining mapping power, which dramatically increases the computational cost of an association study.

In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in batch mode, which also supports large permutation tests. We derive an upper bound of the SNP-pair ANOVA test that can be expressed as the sum of two terms. The first term is based on the single-SNP ANOVA test. The second term is based on the SNPs alone and is independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs, without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than the brute-force implementation of ANOVA tests on all SNP-pairs.
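For orientation, the brute-force baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration and not the FastANOVA algorithm itself: it omits the upper bound and the pruning entirely. It assumes each SNP is a list of small integer genotype codes, computes a one-way ANOVA F-statistic over the joint genotype cells of every SNP-pair, and estimates a family-wise threshold from the maximum statistic under phenotype permutations. All function names and parameters (`f_statistic`, `pair_f`, `max_pair_f`, `permutation_threshold`, `n_perm`, `alpha`, `seed`) are hypothetical and chosen only for this example.

```python
import random
from itertools import combinations


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups of phenotype values."""
    groups = [g for g in groups if g]  # ignore empty genotype cells
    n = sum(len(g) for g in groups)
    k = len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough groups or degrees of freedom
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pair_f(snp_a, snp_b, phenotype):
    """F-statistic of the phenotype grouped by the joint genotype of two SNPs."""
    cells = {}
    for a, b, y in zip(snp_a, snp_b, phenotype):
        cells.setdefault((a, b), []).append(y)
    return f_statistic(list(cells.values()))


def max_pair_f(snps, phenotype):
    """Largest SNP-pair F-statistic, tested exhaustively over all pairs."""
    return max(pair_f(snps[i], snps[j], phenotype)
               for i, j in combinations(range(len(snps)), 2))


def permutation_threshold(snps, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Approximate (1 - alpha) quantile of the max statistic under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any SNP-phenotype association
        maxima.append(max_pair_f(snps, shuffled))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]
```

Every permutation in this sketch re-tests all pairs, so the cost grows with the number of pairs times the number of permutations. That is the cost the abstract's bound is meant to avoid: its second term does not depend on the phenotype permutation.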
