Journal: BMC Genomics

A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data



Abstract

Background: The recent advance of high-throughput sequencing makes it feasible to study entire transcriptomes by applying de novo sequence assembly algorithms. A popular strategy is to first construct an intermediate de Bruijn graph to represent the transcriptome, but an additional step is then needed to construct predicted transcripts from that graph.

Results: Since the de Bruijn graph contains all branching possibilities, we develop a memory-efficient algorithm that recovers alternative splicing information and library-specific expression information directly from the graph, without prior genomic knowledge. We implement the algorithm as a postprocessing module of the Velvet assembler. We validate it in two ways: by simulating the transcriptome assembly of Drosophila from its known genome, and by assembling the Drosophila transcriptome from publicly available RNA-Seq libraries. Under a range of conditions, our algorithm recovers sequences and alternative splicing junctions with higher specificity than Oases or Trans-ABySS.

Conclusions: Because our postprocessing algorithm consumes less memory than Velvet itself and is less memory-intensive than Oases, it allows biologists to assemble large libraries with limited computational resources. The algorithm has been applied to the transcriptome assembly of the non-model blow fly Lucilia sericata, reported in a previous article. That work shows that the assembly is of high quality and that it facilitates comparison of the Lucilia sericata transcriptome with those of Drosophila and two mosquitoes, prediction and experimental validation of alternative splicing, investigation of differential expression across developmental stages, and identification of transposable elements.
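The abstract's central idea is that a de Bruijn graph already encodes every branching possibility, so alternative splicing shows up as branches and expression can be read off k-mer coverage. The Python sketch below illustrates only that general idea; it is not the authors' Velvet postprocessing algorithm. It assumes error-free, single-strand reads and a fixed k. It also uses mean k-mer count as a stand-in for expression. The function names (`kmer_counts`, `build_graph`, `splicing_graph`) are hypothetical. The sketch builds the k-mer graph, then compresses non-branching paths into nodes. The result is a small splicing-graph-like structure in which each node carries a coverage estimate.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer in the reads (forward strand only; a simplifying assumption)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(counts):
    """De Bruijn graph: edge u -> v when u's (k-1)-suffix equals v's (k-1)-prefix."""
    by_prefix = defaultdict(list)
    for kmer in counts:
        by_prefix[kmer[:-1]].append(kmer)
    succ = {kmer: sorted(by_prefix[kmer[1:]]) for kmer in counts}
    pred = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    return succ, pred


def splicing_graph(reads, k):
    """Compress non-branching k-mer paths into nodes (hypothetical helper).

    Returns (nodes, edges): nodes maps each node sequence to its mean k-mer
    count (a crude expression proxy); edges lists branch connections.
    Isolated cycles with no branch point are ignored in this sketch.
    """
    counts = kmer_counts(reads, k)
    succ, pred = build_graph(counts)

    def is_start(kmer):
        # A node begins wherever the graph branches or a path has no predecessor.
        p = pred[kmer]
        return len(p) != 1 or len(succ[p[0]]) != 1

    nodes, first_kmer_to_node, last_kmer_of = {}, {}, {}
    for start in sorted(counts):
        if not is_start(start):
            continue
        path, cur = [start], start
        # Extend while the path is linear (one successor that has one predecessor).
        while len(succ[cur]) == 1 and len(pred[succ[cur][0]]) == 1:
            cur = succ[cur][0]
            path.append(cur)
        seq = path[0] + "".join(km[-1] for km in path[1:])
        nodes[seq] = sum(counts[km] for km in path) / len(path)
        first_kmer_to_node[start] = seq
        last_kmer_of[seq] = cur

    edges = sorted(
        (u, first_kmer_to_node[v]) for u, last in last_kmer_of.items() for v in succ[last]
    )
    return nodes, edges


if __name__ == "__main__":
    # Two hypothetical isoforms: exons A+B+C (3 reads) and A+C (1 read, skips B).
    reads = ["AACGTTTGCCAG"] * 3 + ["AACGCCAG"]
    nodes, edges = splicing_graph(reads, k=4)
    print(nodes)  # {'AACG': 4.0, 'ACGCC': 1.0, 'ACGTTTGCC': 3.0, 'GCCAG': 4.0}
    print(edges)
```

In this toy run, the exon-skipping junction and the included exon form the two arms of a bubble between the shared nodes. Their coverages (1.0 vs 3.0) reflect the assumed 1:3 isoform abundance. Real data would additionally need reverse complements, error correction and coverage cutoffs, which this sketch deliberately omits.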