Journal: GigaScience

Benchmark of long non-coding RNA quantification for RNA sequencing of cancer samples

Abstract

Background: Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. While many studies have exploited public resources such as RNA sequencing (RNA-Seq) data in The Cancer Genome Atlas to study lncRNAs in cancer, it is crucial to choose the optimal method for accurate expression quantification.

Results: In this study, we compared the performance of pseudoalignment methods Kallisto and Salmon, alignment-based transcript quantification method RSEM, and alignment-based gene quantification methods HTSeq and featureCounts, in combination with read aligners STAR, Subread, and HISAT2, in lncRNA quantification, by applying them to both un-stranded and stranded RNA-Seq datasets. Full transcriptome annotation, including protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment methods and RSEM outperform HTSeq and featureCounts for lncRNA quantification at both sample- and gene-level comparison, regardless of RNA-Seq protocol type, choice of aligners, and transcriptome annotation. Pseudoalignment methods and RSEM detect more lncRNAs and correlate highly with simulated ground truth. On the contrary, HTSeq and featureCounts often underestimate lncRNA expression. Antisense lncRNAs are poorly quantified by alignment-based gene quantification methods, which can be improved using stranded protocols and pseudoalignment methods.

Conclusions: Considering the consistency with ground truth and computational resources, pseudoalignment methods Kallisto or Salmon in combination with full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.
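The abstract reports that some quantifiers correlate highly with simulated ground truth while others underestimate lncRNA expression. As a rough illustration of what such a sample-level comparison could look like, the sketch below computes a Spearman rank correlation and the share of underestimated genes for one sample. Everything in it is an assumption made for illustration: the function names, the 0.5 underestimation tolerance, and the toy expression values are hypothetical and are not taken from the study, whose exact metrics are not given in the abstract.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fraction_underestimated(truth, estimate, tolerance=0.5):
    """Share of truly expressed genes whose estimate is below tolerance * truth.

    The tolerance of 0.5 is an arbitrary illustrative cutoff.
    """
    expressed = [(t, e) for t, e in zip(truth, estimate) if t > 0]
    return sum(e < tolerance * t for t, e in expressed) / len(expressed)


# Hypothetical expression values for five lncRNA genes in one simulated sample.
truth = [10.0, 0.0, 5.0, 20.0, 1.0]   # simulated ground truth
est_a = [9.0, 0.0, 6.0, 18.0, 1.5]    # a quantifier that tracks the truth
est_b = [4.0, 0.0, 1.0, 19.0, 0.0]    # a quantifier that underestimates

for name, est in (("est_a", est_a), ("est_b", est_b)):
    print(name, spearman(truth, est), fraction_underestimated(truth, est))
```

In a real benchmark the two lists would come from the simulator's known expression values and from a quantifier's output table, matched by gene identifier before comparison.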
