Source: LIPIcs: Leibniz International Proceedings in Informatics

Rainbowfish: A Succinct Colored de Bruijn Graph Representation

Abstract

The colored de Bruijn graph, a variant of the de Bruijn graph that associates each edge (i.e., k-mer) with some set of colors, is an increasingly important combinatorial structure in computational biology. Iqbal et al. demonstrated the utility of this structure for representing and assembling a collection (population) of genomes, and showed how it can be used to accurately detect genetic variants. Muggli et al. introduced VARI, a representation of the colored de Bruijn graph that adopts the BOSS representation for the de Bruijn graph topology and achieves considerable savings in space over Cortex (the implementation of Iqbal et al.), albeit with some sacrifice in speed. The memory-efficient representation of VARI allows the colored de Bruijn graph to be constructed and analyzed for large datasets, beyond what is possible with Cortex. In this paper, we introduce Rainbowfish, a succinct representation of the color information of the colored de Bruijn graph that reduces the space usage even further. Our representation also uses BOSS to represent the de Bruijn graph, but decomposes the color sets based on an equivalence relation and exploits the inherent skewness in the distribution of these color sets. The Rainbowfish representation is compressed based on the 0th-order entropy of the color sets, which can lead to a significant reduction in the space required to store the relevant information for each edge. In practice, Rainbowfish achieves up to a 20x improvement in space over VARI. Rainbowfish is written in C++11 and is available at https://github.com/COMBINE-lab/rainbowfish.
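
The idea of grouping edges by color set and exploiting a skewed distribution can be illustrated with a small sketch. This is not the Rainbowfish implementation (which is C++11 and built on BOSS); the function names `color_classes` and `zeroth_order_entropy` and the toy data are assumptions made for illustration only. The sketch treats two edges as equivalent when they carry the same color set, gives the most frequent classes the smallest labels, and computes the 0th-order entropy of the resulting label sequence, which is a lower bound on the average bits per edge for any code that encodes each label independently.

```python
from collections import Counter
from math import log2


def color_classes(edge_colors):
    """Group edges by identical color set (hypothetical helper).

    edge_colors: list of frozensets, one per edge (k-mer).
    Returns (class_table, labels), where class_table lists the distinct
    color sets from most to least frequent and labels[e] is the index
    of edge e's color set in class_table.
    """
    counts = Counter(edge_colors)
    # Frequent classes get small IDs; ties are broken by the sorted
    # color list so that the result is deterministic.
    class_table = sorted(counts, key=lambda s: (-counts[s], sorted(s)))
    class_id = {s: i for i, s in enumerate(class_table)}
    labels = [class_id[s] for s in edge_colors]
    return class_table, labels


def zeroth_order_entropy(labels):
    """0th-order empirical entropy of a label sequence, in bits per label."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


# Toy input (assumed): 8 edges over 3 colors with a skewed distribution.
edge_colors = [frozenset(s) for s in
               [{0, 1}, {0, 1}, {0, 1}, {0, 1}, {2}, {2}, {0, 1, 2}, {1}]]
class_table, labels = color_classes(edge_colors)
bits_per_edge = zeroth_order_entropy(labels)
```

On this toy input the four classes occur with frequencies 4, 2, 1, and 1 out of 8, so the entropy is 1.75 bits per edge, compared with 3 bits per edge for a plain one-bit-per-color row. The class table itself is stored once, so its cost is amortized over all edges that share a class.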