Journal: Genetic Epidemiology

Reprioritizing Genetic Associations in Hit Regions Using LASSO-Based Resample Model Averaging

Abstract

Significance testing one SNP at a time has proven useful for identifying genomic regions that harbor variants affecting human disease. But after an initial genome scan has identified a "hit region" of association, single-locus approaches can falter. Local linkage disequilibrium (LD) can make both the number of underlying true signals and their identities ambiguous. Simultaneous modeling of multiple loci should help. However, it is typically applied ad hoc: conditioning on the top SNPs, with limited exploration of the model space and no assessment of how sensitive model choice was to sampling variability. Formal alternatives exist but are seldom used. Bayesian variable selection is coherent but requires specifying a full joint model, including priors on parameters and the model space. Penalized regression methods (e.g., LASSO) appear promising but require calibration, and, once calibrated, lead to a choice of SNPs that can be misleadingly decisive. We present a general method for characterizing uncertainty in model choice that is tailored to reprioritizing SNPs within a hit region under strong LD. Our method, LASSO local automatic regularization resample model averaging (LLARRMA), combines LASSO shrinkage with resample model averaging and multiple imputation, estimating for each SNP the probability that it would be included in a multi-SNP model in alternative realizations of the data. We apply LLARRMA to simulations based on case-control genome-wide association study data, and find that when there are several causal loci and strong LD, LLARRMA identifies a set of candidates that is enriched for true signals relative to single-locus analysis and to the recently proposed method of Stability Selection.
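The core idea of combining LASSO shrinkage with resample model averaging can be illustrated with a small sketch. This is not the LLARRMA implementation: it assumes a quantitative trait with a linear model instead of case-control data, uses a fixed penalty `lam` in place of the local automatic regularization, draws half-size subsamples without replacement, and omits multiple imputation. All names (`lasso_cd`, `inclusion_probabilities`, the simulated data) are hypothetical and chosen only for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def center_columns(X):
    """Subtract the column mean from every column of a list-of-rows matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def lasso_cd(X, y, lam, n_iter=50, tol=1e-6):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (X, y centered)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:  # monomorphic SNP in this subsample
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def inclusion_probabilities(X, y, lam, n_resamples=40, frac=0.5, seed=0):
    """Fraction of subsamples in which each SNP gets a nonzero LASSO coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = int(frac * n)
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        Xs = center_columns([X[i] for i in idx])
        y_mean = sum(y[i] for i in idx) / m
        ys = [y[i] - y_mean for i in idx]
        beta = lasso_cd(Xs, ys, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]

# Toy data (assumed for illustration): SNP 0 is causal, SNP 1 is in strong LD
# with SNP 0, and SNPs 2-5 are independent null markers.
rng = random.Random(1)
n, p, maf = 200, 6, 0.5
X = []
for _ in range(n):
    row = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(p)]
    if rng.random() < 0.8:
        row[1] = row[0]  # copy the causal genotype to induce LD
    X.append(row)
y = [1.0 * row[0] + rng.gauss(0.0, 1.0) for row in X]

probs = inclusion_probabilities(X, y, lam=0.2)
for j, pr in enumerate(probs):
    print(f"SNP {j}: resample inclusion frequency = {pr:.2f}")
```

Each SNP receives an inclusion frequency between 0 and 1 rather than a single in-or-out decision. This is the kind of graded ranking the abstract contrasts with the "misleadingly decisive" output of one calibrated LASSO fit.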
