Journal of Biomedical Research

Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares

Abstract

With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still unclear, especially in GWAS. We conducted simulations and a real-data application to compare the type I error and power of PC-LR, PLS-LR, and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and that both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with the genotyped SNPs and had a small effect size in the simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
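
The Bonferroni-corrected single-locus baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the SNP set is declared associated when the smallest single-locus p-value falls below alpha divided by the number of SNPs tested. The function names and the p-values are hypothetical and are not taken from the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-SNP threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def snp_set_is_significant(p_values, alpha: float = 0.05) -> bool:
    """Flag the SNP set if any single-locus p-value passes the corrected threshold."""
    p_values = list(p_values)
    threshold = bonferroni_threshold(alpha, len(p_values))
    return min(p_values) < threshold


# Hypothetical single-locus p-values for a 5-SNP region (made-up numbers).
example_p_values = [0.030, 0.012, 0.200, 0.450, 0.081]
print(bonferroni_threshold(0.05, len(example_p_values)))
print(snp_set_is_significant(example_p_values))
```

When SNPs in a region are in linkage disequilibrium, their tests are correlated, so the effective number of independent tests is smaller than the number of SNPs. Dividing alpha by the full SNP count is then stricter than necessary, which is consistent with the conservative behavior the abstract reports for LR.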
