Journal: Bioinformatics (indexed under NIH literature)

Maximal conditional chi-square importance in random forests

Abstract

Motivation: High-dimensional data are frequently generated in genome-wide association studies (GWAS) and other studies. It is important to identify features that are associated with a disease, such as single nucleotide polymorphisms (SNPs) in GWAS. Random forests are a very useful approach for this purpose because they provide a variable importance score. However, this importance score has several shortcomings. We propose an alternative importance measure to overcome them.

Results: Using our proposed importance measure in random forests, we characterized the effect of multiple SNPs under various models. The measure uses the maximal conditional chi-square (MCC) to quantify the association between a SNP and the trait conditional on other SNPs. Based on this importance measure, we employed a permutation test to estimate empirical P-values of SNPs. Our method was compared with a univariate test and with permutation tests based on the Gini and permutation importance measures. In simulations, the proposed method consistently outperformed the other methods in identifying risk SNPs. In a GWAS of age-related macular degeneration, the proposed method confirmed two significant SNPs (at the genome-wide adjusted level of 0.05). Further analysis showed that these two SNPs conformed to a heterogeneity model. Compared with the existing importance measures, the MCC importance measure is more sensitive to complex effects of risk SNPs because it uses conditional information on different SNPs. The permutation test with the MCC importance measure provides an efficient way to identify candidate SNPs in GWAS and helps clarify the etiological relationship between genetic variants and complex diseases.

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
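The abstract names the ingredients (a conditional chi-square association score and a permutation test for empirical P-values) but not the exact definitions. The following is therefore a minimal Python sketch under stated assumptions, not the authors' implementation. It replaces random-forest tree nodes with a simpler stand-in: strata defined by the genotype of one other SNP. The hypothetical `mcc_score` takes the largest Pearson chi-square between a SNP and a binary trait, computed either over all samples or within any stratum `SNP_j == g`. The hypothetical `adjusted_pvalues` permutes the trait and compares each observed score with the per-permutation maximum over all SNPs, a single-step max-statistic adjustment that is one common way to obtain genome-wide adjusted empirical P-values. The paper's exact MCC definition and permutation scheme may differ.

```python
import random


def chi_square(xs, ys):
    """Pearson chi-square statistic for the contingency table of xs against ys."""
    n = len(xs)
    if n == 0:
        return 0.0
    counts = {}
    for x, y in zip(xs, ys):
        counts[(x, y)] = counts.get((x, y), 0) + 1
    x_levels, y_levels = sorted(set(xs)), sorted(set(ys))
    row = {x: sum(counts.get((x, y), 0) for y in y_levels) for x in x_levels}
    col = {y: sum(counts.get((x, y), 0) for x in x_levels) for y in y_levels}
    stat = 0.0
    for x in x_levels:
        for y in y_levels:
            expected = row[x] * col[y] / n  # > 0 because both levels were observed
            stat += (counts.get((x, y), 0) - expected) ** 2 / expected
    return stat


def mcc_score(genotypes, trait, snp):
    """Hypothetical MCC score: the largest chi-square between `snp` and the trait,
    over the whole sample and over every stratum SNP_j == g of another SNP j.
    (Simplification: these strata stand in for random-forest tree nodes.)"""
    column = [row[snp] for row in genotypes]
    best = chi_square(column, trait)  # unconditional case, like a tree's root node
    for j in range(len(genotypes[0])):
        if j == snp:
            continue
        for g in set(row[j] for row in genotypes):
            idx = [i for i, row in enumerate(genotypes) if row[j] == g]
            stat = chi_square([column[i] for i in idx], [trait[i] for i in idx])
            best = max(best, stat)
    return best


def adjusted_pvalues(genotypes, trait, n_perm=99, seed=0):
    """Empirical P-values adjusted with the maximum statistic over all SNPs."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    observed = [mcc_score(genotypes, trait, s) for s in range(n_snps)]
    exceed = [0] * n_snps
    permuted = list(trait)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break SNP-trait links, keep SNP-SNP correlation
        null_max = max(mcc_score(genotypes, permuted, s) for s in range(n_snps))
        for s in range(n_snps):
            if null_max >= observed[s]:
                exceed[s] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


if __name__ == "__main__":
    # Toy data: 200 samples, 5 SNPs coded 0/1/2; SNP 0 fully determines the trait.
    rng = random.Random(1)
    genotypes = [[rng.randint(0, 2) for _ in range(5)] for _ in range(200)]
    trait = [1 if row[0] > 0 else 0 for row in genotypes]
    # SNP 0 should receive the smallest attainable p-value, 1 / (99 + 1) = 0.01.
    print(adjusted_pvalues(genotypes, trait))
```

Permuting the trait rather than the genotypes keeps the correlation among SNPs intact, so the null distribution reflects the real dependence structure. Taking the maximum over all SNPs in each permutation is what makes the resulting P-values genome-wide adjusted rather than per-SNP.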
