Journal: BMC Bioinformatics

Comparative evaluation of gene set analysis approaches for RNA-Seq data

Abstract

Background: Over the last few years, transcriptome sequencing (RNA-Seq) has almost completely replaced microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes that are differentially expressed between two or more conditions. Despite the importance of Gene Set Analysis (GSA) in interpreting the results of RNA-Seq experiments, the limitations of GSA methods developed for microarrays are not well understood in the context of RNA-Seq data.

Results: We provide a thorough evaluation of popular multivariate and gene-level self-contained GSA approaches on simulated and real RNA-Seq data. The multivariate approach employs multivariate non-parametric tests combined with popular normalizations for RNA-Seq data. The gene-level approach uses univariate tests designed for RNA-Seq data to obtain gene-specific P-values and combines them into a pathway P-value using classical statistical techniques. Our results demonstrate that the Type I error rate and the power of multivariate tests depend only on the test statistics and are insensitive to the different normalizations. In general, the pathways detected by standard multivariate GSA tests show no bias in terms of pathway size, percentage of differentially expressed genes, or average gene length in a pathway. In contrast, the Type I error rate and the power of gene-level GSA tests are heavily affected by the method used to combine P-values, and all of the aforementioned biases are present in the detected pathways.

Conclusions: Our results emphasize the importance of using self-contained non-parametric multivariate tests to detect differentially expressed pathways in RNA-Seq data and warn against applying gene-level GSA tests, especially because of their high Type I error rates on both simulated and real data.
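
To make the gene-level approach concrete, the sketch below combines gene-specific P-values into a single pathway P-value. The abstract does not say which "classical statistical techniques" were used, so Fisher's method is an assumption chosen only for illustration, and the function name fisher_combine and the example P-values are hypothetical. Fisher's method treats the P-values as independent, which genes in the same pathway generally are not.

```python
import math


def fisher_combine(p_values):
    """Combine P-values with Fisher's method (assumes independence).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    For even degrees of freedom the survival function has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # Accumulate sum_{i=0}^{k-1} (X/2)^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term

    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Hypothetical gene-level P-values for one pathway (made up for the demo).
    gene_p_values = [0.04, 0.20, 0.01, 0.65]
    print(fisher_combine(gene_p_values))
```

The closed-form chi-square tail keeps the sketch free of third-party dependencies. Other combination rules would give different pathway P-values from the same gene-level input, which is in line with the abstract's observation that gene-level results depend heavily on how P-values are combined.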