BMC Proceedings

Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17

Abstract

The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, based on real sequence data for 3,205 genes annotated by the 1000 Genomes Project and on simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare: 87% of the SNPs have a minor allele frequency less than 0.05. To detect rare SNPs, we performed least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and applied with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches on the mini-exome data and to compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than the linear regression and collapsing methods. We also note the difficulty of determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power is much improved.
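The two gene-level ideas named in the abstract, collapsing rare SNPs into a single score and fitting a LASSO regression over the SNPs of a gene, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy genotypes, the effect size, the thresholds 0.01 and 0.05, the penalty `lam`, and all function names are assumptions made for illustration, and the F tests and generalized degrees of freedom described in the abstract are not reproduced here.

```python
import numpy as np

def minor_allele_freq(genotypes):
    # Assumes each column counts copies (0, 1, 2) of the minor allele.
    return genotypes.mean(axis=0) / 2.0

def collapse_rare(genotypes, maf_threshold):
    """Collapsing score: per-individual sum of minor alleles over rare SNPs."""
    rare = minor_allele_freq(genotypes) < maf_threshold
    return genotypes[:, rare].sum(axis=1)

def slope_t_statistic(x, y):
    """t statistic for the slope of the simple linear regression y ~ x."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    sxx = x_c @ x_c
    if sxx == 0:  # no variation in x (e.g. no rare alleles observed)
        return 0.0
    beta = (x_c @ y_c) / sxx
    resid = y_c - beta * x_c
    sigma2 = (resid @ resid) / (len(y) - 2)
    return beta / np.sqrt(sigma2 / sxx)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 on centered data."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    resid = y - y.mean()              # residual for the starting point b = 0
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:        # monomorphic SNP carries no information
                continue
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= Xc[:, j] * (new - beta[j])
            beta[j] = new
    return beta

# Toy gene with 10 SNPs in 697 individuals; all values below are hypothetical.
rng = np.random.default_rng(0)
n_individuals, n_snps = 697, 10
true_maf = np.array([0.005, 0.008, 0.02, 0.03, 0.04, 0.1, 0.2, 0.3, 0.004, 0.25])
genotypes = rng.binomial(2, true_maf, size=(n_individuals, n_snps)).astype(float)
trait = 1.0 * genotypes[:, 0] + rng.normal(size=n_individuals)  # SNP 0 is causal

# Collapsing method under two assumed minor allele frequency thresholds.
for threshold in (0.01, 0.05):
    burden = collapse_rare(genotypes, threshold)
    print(threshold, slope_t_statistic(burden, trait))

# LASSO over all SNPs in the gene; nonzero coefficients mark selected SNPs.
beta_hat = lasso_coordinate_descent(genotypes, trait, lam=0.02)
print(np.flatnonzero(beta_hat))
```

Changing the threshold changes which SNPs enter the collapsed score, which is one way to see why the choice of threshold matters for the collapsing method.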
