Journal: BMC Bioinformatics

Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq


Abstract

Background: RNA-Seq has supplanted microarrays as the preferred method of transcriptome-wide identification of differentially expressed genes. However, RNA-Seq analysis is still rapidly evolving, with a large number of tools available for each of the three major processing steps: read alignment, expression modeling, and identification of differentially expressed genes. Although some studies have benchmarked these tools against gold standard gene expression sets, few have evaluated their performance in concert with one another. Additionally, there is a general lack of testing of such tools on real-world, physiologically relevant datasets, which often possess qualities not reflected in tightly controlled reference RNA samples or synthetic datasets.

Results: Here, we evaluate 219 combinatorial implementations of the most commonly used analysis tools for their impact on differential gene expression analysis by RNA-Seq. A test dataset was generated using highly purified human classical and nonclassical monocyte subsets from a clinical cohort, allowing us to evaluate the performance of 495 unique workflows, when accounting for differences in expression units and gene- versus transcript-level estimation. We find that the choice of methodologies leads to wide variation in the number of genes called significant, as well as in performance as gauged by precision and recall, calculated by comparing our RNA-Seq results to those from four previously published microarray and BeadChip analyses of the same cell populations. The method of differential gene expression identification exhibited the strongest impact on performance, with smaller impacts from the choice of read aligner and expression modeler. Many workflows were found to exhibit similar overall performance, but with differences in their calibration, with some biased toward higher precision and others toward higher recall.

Conclusions: There is significant heterogeneity in the performance of RNA-Seq workflows to identify differentially expressed genes. Among the higher performing workflows, different workflows exhibit a precision/recall tradeoff, and the ultimate choice of workflow should take into consideration how the results will be used in subsequent applications. Our analyses highlight the performance characteristics of these workflows, and the data generated in this study could also serve as a useful resource for future development of software for RNA-Seq analysis.
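
The precision and recall measures mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a workflow's output and the reference are each reduced to a set of gene identifiers; the abstract does not describe the authors' actual scoring procedure, and the function name and gene identifiers below are hypothetical placeholders, not data from the study.

```python
def precision_recall(called, reference):
    """Return (precision, recall) of a called gene set against a reference set.

    Precision: fraction of called genes that appear in the reference.
    Recall: fraction of reference genes that were called.
    """
    called, reference = set(called), set(reference)
    true_positives = len(called & reference)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only (not results from the paper).
reference_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
workflow_calls = {"GENE_A", "GENE_B", "GENE_X"}

precision, recall = precision_recall(workflow_calls, reference_genes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the workflow calls few genes and most of them are in the reference, so it leans toward precision over recall, which is the kind of calibration difference the abstract describes between workflows.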
