Journal of Bioinformatics and Computational Biology

Measuring consistency among gene set analysis methods: A systematic study

Abstract

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
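The comparison of the top 20 and top 100 gene sets across methods can be illustrated with a small overlap calculation. The abstract does not state which agreement measure the study used, so the sketch below assumes the Jaccard index as one plausible choice. The function names, method labels, and gene set identifiers are all hypothetical placeholders, not data or results from the study.

```python
from itertools import combinations


def top_k(ranked_gene_sets, k):
    """Return the k most significant gene sets as a set.

    Assumes the input list is already sorted from most to least significant.
    """
    return set(ranked_gene_sets[:k])


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(results, k):
    """Compute the top-k Jaccard agreement for every pair of methods.

    `results` maps a method name to its ranked list of gene set identifiers.
    """
    tops = {method: top_k(ranked, k) for method, ranked in results.items()}
    return {
        (m1, m2): jaccard(tops[m1], tops[m2])
        for m1, m2 in combinations(sorted(tops), 2)
    }


# Toy placeholder rankings (hypothetical, for illustration only).
toy_results = {
    "method_A": ["set1", "set2", "set3", "set4"],
    "method_B": ["set2", "set3", "set5", "set6"],
    "method_C": ["set7", "set8", "set1", "set2"],
}

agreement = pairwise_agreement(toy_results, k=3)
for pair, score in agreement.items():
    print(pair, round(score, 2))
```

With real output, `k` would be set to 20 or 100 to mirror the cut-offs named in the abstract. The same function could compare one method across two datasets of the same phenotype by passing the dataset names as keys instead of method names.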
