BMC Genomics

Inferential considerations for low-count RNA-seq transcripts: a case study on the dominant prairie grass Andropogon gerardii



Abstract

Background: Differential expression (DE) analysis of RNA-seq data still poses inferential challenges, such as the handling of transcripts characterized by low expression levels. In this study, we use a plasmode-based approach to assess the relative performance of alternative inferential strategies on RNA-seq transcripts, with special emphasis on transcripts characterized by a small number of read counts, so-called low-count transcripts, as motivated by an ecological application in prairie grasses. Big bluestem (Andropogon gerardii) is a wide-ranging dominant prairie grass of ecological and agricultural importance to the US Midwest, while the edaphic subspecies sand bluestem (A. gerardii ssp. hallii) grows exclusively on sand dunes. Relative to big bluestem, sand bluestem exhibits qualitative phenotypic divergence consistent with enhanced drought tolerance, plausibly associated with transcripts of low expression levels. Our dataset consists of RNA-seq read counts for 25,582 transcripts (60% of which are classified as low-count) collected from leaf tissue of individual plants of big bluestem (n = 4) and sand bluestem (n = 4). Focusing on low-count transcripts, we compare alternative ad-hoc data filtering techniques commonly used in RNA-seq pipelines and assess the inferential performance of recently developed statistical methods for DE analysis, namely DESeq2 and edgeR robust. These methods attempt to overcome the inherently noisy behavior of low-count transcripts by shrinkage or by differential weighting of observations, respectively.

Results: Both DE methods seemed to properly control family-wise type 1 error on low-count transcripts, whereas edgeR robust showed greater power and DESeq2 showed greater precision and accuracy. However, specification of the degree of freedom parameter under edgeR robust had a non-trivial impact on inference and should be handled carefully. When properly specified, both DE methods showed overall promising inferential performance on low-count transcripts, suggesting that ad-hoc data filtering steps at arbitrary expression thresholds may be unnecessary. A note of caution is in order regarding the approximate nature of DE tests under both methods.

Conclusions: Practical recommendations for DE inference are provided when low-count RNA-seq transcripts are of interest, as is the case in the comparison of subspecies of bluestem grasses. Insights from this study may also be relevant to other applications focused on transcripts of low expression levels.
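To make concrete what an ad-hoc filtering step at an arbitrary expression threshold can look like, here is a minimal Python sketch of one common rule: keep a transcript only if its counts per million (CPM) reach a cutoff in a minimum number of samples. This is an illustration, not the filtering procedure evaluated in the study. The function names, the cutoff values, and the toy count table are all hypothetical, and the cutoff is set unrealistically high only because the toy library sizes are tiny.

```python
from typing import Dict, List


def library_sizes(counts: Dict[str, List[int]]) -> List[int]:
    """Return the total read count per sample (column sums)."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def filter_low_count(
    counts: Dict[str, List[int]],
    min_cpm: float,
    min_samples: int,
) -> Dict[str, List[int]]:
    """Keep transcripts whose CPM is >= min_cpm in at least min_samples samples.

    Both thresholds are arbitrary choices made by the analyst (hypothetical here).
    """
    sizes = library_sizes(counts)
    kept = {}
    for transcript, row in counts.items():
        # Counts per million: scale each count by its sample's library size.
        cpm = [1e6 * c / s if s > 0 else 0.0 for c, s in zip(row, sizes)]
        if sum(value >= min_cpm for value in cpm) >= min_samples:
            kept[transcript] = row
    return kept


# Hypothetical toy table: 3 transcripts x 8 samples (4 plants per subspecies).
toy_counts = {
    "tx_high": [500, 620, 480, 510, 700, 650, 590, 610],
    "tx_mid": [40, 0, 35, 0, 50, 0, 45, 0],
    "tx_low": [1, 0, 0, 2, 0, 0, 1, 0],
}

kept = filter_low_count(toy_counts, min_cpm=5000.0, min_samples=4)
print(sorted(kept))
```

With these toy values the sparsely expressed transcript is discarded before any test is run, which is exactly the kind of decision that depends on where the threshold is placed.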
