Source: US National Institutes of Health literature, Genomics Informatics

Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data

Abstract

In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required to identify disease- or trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on individual tests suffer from multiple testing issues, such as controlling the family-wise error rate and handling dependent tests. Moreover, the main interest in genetic association studies is to detect only a few genes associated with a phenotype outcome among tens of thousands of genes. For this reason, regularization procedures, in which the phenotype outcome is regressed on all genomic markers and the regression coefficients are estimated from a penalized likelihood, have been considered a good alternative approach to the analysis of high-dimensional genomic data. However, the selection performance of regularization procedures has rarely been compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies in which commonly used group testing procedures, such as principal component analysis, Hotelling's T² test, and the permutation test, are compared with group lasso (least absolute shrinkage and selection operator) in terms of true positive selection. We also applied all methods considered in the simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from the Illumina Infinium HumanMethylation27K Beadchip. We found a large discrepancy in the selected genes between the multiple group testing procedures and group lasso.
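
As a rough illustration of what a group-level permutation test can look like, the sketch below tests one gene (a group of sites) at a time by shuffling case/control labels. It is not the authors' procedure: the abstract does not state which test statistic was used, so the sum of squared mean differences across sites is an assumption, and the function names `group_statistic` and `group_permutation_test` are hypothetical.

```python
import random


def group_statistic(cases, controls):
    """Assumed statistic: sum over sites of the squared difference in group means.

    `cases` and `controls` are lists of samples; each sample is a list of
    measurements, one per site in the gene being tested.
    """
    n_sites = len(cases[0])
    total = 0.0
    for j in range(n_sites):
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        total += (mean_case - mean_ctrl) ** 2
    return total


def group_permutation_test(cases, controls, n_perm=2000, seed=0):
    """Return a permutation p-value for one group of sites.

    Case/control labels are shuffled across whole samples, so the
    correlation between sites within the gene is preserved.
    """
    rng = random.Random(seed)
    observed = group_statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = group_statistic(pooled[:n_cases], pooled[n_cases:])
        if permuted >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (made up for illustration): 6 cases and 6 controls, 3 sites.
    toy_rng = random.Random(1)
    toy_cases = [[toy_rng.gauss(0.5, 1.0) for _ in range(3)] for _ in range(6)]
    toy_controls = [[toy_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
    print(group_permutation_test(toy_cases, toy_controls, n_perm=500))
```

Applied gene by gene, the resulting p-values would still need a multiple-testing correction across genes, which is the issue the abstract raises for procedures built on individual tests.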
