Journal: European Journal of Human Genetics (EJHG)

A screening methodology based on Random Forests to improve the detection of gene-gene interactions.


Abstract

The search for susceptibility loci involved in gene-gene interactions poses a methodological and computational challenge for statisticians because of the high dimensionality inherent in modelling gene-gene interactions, or epistasis. Now that genome-wide scans have become relatively common, powerful new methods are required to handle the huge number of possible gene-gene interactions and to weed out false positives and false negatives from the results. One solution to the dimensionality problem is to reduce the data by preliminary screening of markers, selecting the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction (MDR) method is a nonparametric, model-free data reduction technique that identifies sets of markers with optimal predictive properties for disease. In this study, we examine the power of MDR in larger data sets and compare it with other approaches that are able to identify gene-gene interactions. Under various interaction models (purely and not purely epistatic), we apply a Random Forest (RF)-based prescreening method before executing MDR to improve its performance. We find that the power of MDR increases when noisy SNPs are first removed by using RFs to create a collection of candidate markers. We validate our technique through extensive simulation studies and by application to asthma data from the European Community Respiratory Health Survey II.
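The two-stage idea in the abstract (RF prescreening, then MDR on the retained markers) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names `rf_screen`, `mdr_high_risk_cells`, `balanced_accuracy` and `mdr_best_pair` are hypothetical, genotypes are assumed to be coded 0/1/2 with case status coded 1/0, scikit-learn's impurity-based importance stands in for whatever importance measure the study used, and the cross-validation that full MDR uses for model selection is omitted, so pairs are scored on the training data only.

```python
from collections import Counter
from itertools import combinations


def rf_screen(X, y, keep, n_trees=500, seed=0):
    """Return the indices of the `keep` SNPs ranked highest by a Random Forest.

    Assumption: impurity-based importance is used as the ranking score.
    """
    # Imported here so the MDR helpers below run without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:keep])


def mdr_high_risk_cells(X, y, snps):
    """Pool multilocus genotype cells: a cell is high-risk when its
    case:control ratio exceeds the overall case:control ratio."""
    cases, controls = Counter(), Counter()
    for row, label in zip(X, y):
        cell = tuple(row[j] for j in snps)
        (cases if label == 1 else controls)[cell] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    # Cross-multiplied comparison avoids dividing by empty control cells.
    return {cell for cell in set(cases) | set(controls)
            if cases[cell] * n_controls > controls[cell] * n_cases}


def balanced_accuracy(X, y, snps, high_risk):
    """Mean of sensitivity and specificity of the high/low-risk rule."""
    true_pos = true_neg = n_pos = n_neg = 0
    for row, label in zip(X, y):
        predicted_case = tuple(row[j] for j in snps) in high_risk
        if label == 1:
            n_pos += 1
            true_pos += predicted_case
        else:
            n_neg += 1
            true_neg += not predicted_case
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def mdr_best_pair(X, y, candidates):
    """Exhaustively score every SNP pair among `candidates` and return
    (best_pair, balanced_accuracy). Training-set score only, no CV."""
    best_pair, best_score = None, -1.0
    for pair in combinations(candidates, 2):
        cells = mdr_high_risk_cells(X, y, pair)
        score = balanced_accuracy(X, y, pair, cells)
        if score > best_score:
            best_pair, best_score = pair, score
    return best_pair, best_score
```

A typical call would be `candidates = rf_screen(X, y, keep=20)` followed by `mdr_best_pair(X, y, candidates)`, where `keep=20` is an arbitrary illustrative value. Running both stages on the same samples, as this sketch does, does not give the statistical independence between screening and testing that the abstract describes as ideal.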