Source: The American Journal of Human Genetics

Insights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data.

Abstract

Genome-wide association studies (GWAS) have successfully identified susceptibility loci from marginal association analysis of SNPs. Valuable insight into genetic variation underlying complex diseases will likely be gained by considering functionally related sets of genes simultaneously. One approach is to further develop gene set enrichment analysis methods, which originated in gene expression studies, to account for the distinctive features of GWAS data. These features include the large number of SNPs per gene, the modest and sparse SNP associations, and the additional information provided by linkage disequilibrium (LD) patterns within genes. We propose a "gene set ridge regression in association studies (GRASS)" algorithm. GRASS summarizes the genetic structure for each gene as eigenSNPs and uses a novel form of regularized regression, termed group ridge regression, to select representative eigenSNPs for each gene and assess their joint association with disease risk. Compared with existing methods, the proposed algorithm greatly reduces the high dimensionality of GWAS data while still accounting for multiple hits and/or LD in the same gene. We show by simulation that this algorithm performs well in situations in which there are a large number of predictors compared to sample size. We applied the GRASS algorithm to a genome-wide association study of colon cancer and identified nicotinate and nicotinamide metabolism and transforming growth factor beta signaling as the top two significantly enriched pathways. Elucidating the role of variation in these pathways may enhance our understanding of colon cancer etiology.
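As a rough illustration of the two ideas named in the abstract, the sketch below summarizes one gene's SNPs as principal-component scores (one common reading of "eigenSNPs") and then fits an ordinary ridge regression on those scores. It is not the GRASS algorithm: the abstract does not specify the group ridge penalty or the selection step, so a plain ridge fit is swapped in here. The function names `eigen_snps` and `ridge_fit`, the 95% variance threshold, and the penalty value are all assumptions made for illustration.

```python
import numpy as np


def eigen_snps(genotypes, var_explained=0.95):
    """Summarize one gene's SNPs as principal-component scores.

    genotypes: (n_samples, n_snps) array of minor-allele counts (0/1/2).
    Keeps the fewest components whose cumulative variance share reaches
    var_explained (an illustrative threshold, not taken from the abstract).
    """
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    total = variance.sum()
    if total == 0:
        # No variation in this gene: nothing to summarize.
        return np.zeros((genotypes.shape[0], 0))
    share = np.cumsum(variance) / total
    k = min(int(np.searchsorted(share, var_explained)) + 1, len(s))
    return u[:, :k] * s[:k]


def ridge_fit(X, y, lam=1.0):
    """Plain ridge regression (closed form) on centered data.

    A stand-in for the group ridge regression described in the abstract,
    whose exact penalty is not given there.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


# Simulated example: 200 individuals, one gene with 10 SNPs in strong LD.
rng = np.random.default_rng(42)
founder = rng.integers(0, 3, size=(200, 2)).astype(float)
noise = rng.integers(0, 2, size=(200, 10)) * (rng.random((200, 10)) < 0.05)
gene = np.clip(founder[:, rng.integers(0, 2, size=10)] + noise, 0, 2)
phenotype = rng.integers(0, 2, size=200).astype(float)  # 0 = control, 1 = case

scores = eigen_snps(gene)
coefficients = ridge_fit(scores, phenotype, lam=10.0)
print(scores.shape, coefficients.shape)
```

Because the simulated SNPs are strongly correlated, far fewer components than SNPs are typically retained, which is the dimensionality reduction the abstract refers to. Case-control data would more naturally call for a penalized logistic model; the linear fit is used here only to keep the sketch short.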