A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data

Sing-Hoi Sze; Aaron M Tarone


Abstract

Background: The recent advance of high-throughput sequencing makes it feasible to study entire transcriptomes through the application of de novo sequence assembly algorithms. While a popular strategy is to first construct an intermediate de Bruijn graph structure to represent the transcriptome, an additional step is needed to construct predicted transcripts from the graph.

Results: Since the de Bruijn graph contains all branching possibilities, we develop a memory-efficient algorithm to recover alternative splicing information and library-specific expression information directly from the graph without prior genomic knowledge. We implement the algorithm as a postprocessing module of the Velvet assembler. We validate our algorithm by simulating the transcriptome assembly of Drosophila using its known genome, and by performing Drosophila transcriptome assembly using publicly available RNA-Seq libraries. Under a range of conditions, our algorithm recovers sequences and alternative splicing junctions with higher specificity than Oases or Trans-ABySS.

Conclusions: Since our postprocessing algorithm does not consume as much memory as Velvet and is less memory-intensive than Oases, it allows biologists to assemble large libraries with limited computational resources. Our algorithm has been applied to perform transcriptome assembly of the non-model blow fly Lucilia sericata that was reported in a previous article, which shows that the assembly is of high quality and that it facilitates comparison of the Lucilia sericata transcriptome to Drosophila and two mosquitoes, prediction and experimental validation of alternative splicing, investigation of differential expression among various developmental stages, and identification of transposable elements.
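To illustrate why a de Bruijn graph "contains all branching possibilities", the following is a minimal sketch, not the authors' algorithm and not part of Velvet. It builds a toy de Bruijn graph from two made-up transcript sequences, one of which skips a short middle segment, and lists the nodes with more than one successor. The function names (build_de_bruijn, branching_nodes), the sequences, and the choice k = 4 are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix -> its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor, sorted for stable output."""
    return sorted(node for node, succs in graph.items() if len(succs) > 1)


# Hypothetical toy transcripts: isoform_b lacks the middle segment "GGG".
isoform_a = "ATCGTACGGGTTCAGA"
isoform_b = "ATCGTACTTCAGA"

graph = build_de_bruijn([isoform_a, isoform_b], k=4)
print(branching_nodes(graph))  # ['TAC']
print(sorted(graph["TAC"]))    # ['ACG', 'ACT']
```

In this toy case the single branching node marks the point where the two isoforms diverge, which is the kind of signal a splicing-graph reconstruction can start from. Real RNA-Seq data additionally involve sequencing errors, reverse complements, and repeats, none of which this sketch handles.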


Bibliographic details

Source: BMC Genomics, 2014, Issue 5 (Supplement 5)
Authors: Sing-Hoi Sze; Aaron M Tarone
Original format: PDF
Chinese Library Classification: Medical genetics

