Fast lossless compression via cascading Bloom filters

Roye Rozov; Ron Shamir; Eran Halperin

Abstract

Background: Data from large Next Generation Sequencing (NGS) experiments present challenges both in terms of costs associated with storage and in time required for file transfer. It is sometimes possible to store only a summary relevant to particular applications, but generally it is desirable to keep all information needed to revisit experimental results in the future. Thus, the need for efficient lossless compression methods for NGS reads arises. It has been shown that NGS-specific compression schemes can improve results over generic compression methods, such as the Lempel-Ziv algorithm, Burrows-Wheeler transform, or Arithmetic Coding. When a reference genome is available, effective compression can be achieved by first aligning the reads to the reference genome, and then encoding each read using the alignment position combined with the differences in the read relative to the reference. These reference-based methods have been shown to compress better than reference-free schemes, but the alignment step they require demands several hours of CPU time on a typical dataset, whereas reference-free methods can usually compress in minutes.

Results: We present a new approach that achieves highly efficient compression by using a reference genome, but completely circumvents the need for alignment, affording a great reduction in the time needed to compress. In contrast to reference-based methods that first align reads to the genome, we hash all reads into Bloom filters to encode, and decode by querying the same Bloom filters using read-length subsequences of the reference genome. Further compression is achieved by using a cascade of such filters.

Conclusions: Our method, called BARCODE, runs an order of magnitude faster than reference-based methods, while compressing an order of magnitude better than reference-free methods, over a broad range of sequencing coverage. In high coverage (50-100 fold), compared to the best tested compressors, BARCODE saves 80-90% of the running time while only increasing space slightly.
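The encode-by-hashing, decode-by-querying idea in the Results can be illustrated with a minimal Python sketch. This is not the BARCODE implementation: the abstract does not specify the cascade layout, so the two-level scheme here (a second filter holding the first filter's false positives, plus explicit lists for the remaining exceptions) is an assumption, as are all names (`BloomFilter`, `encode`, `decode`, `windows`) and the filter sizes. For brevity the sketch treats reads as a set of fixed-length strings, so duplicate reads are not preserved.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def windows(reference, read_len):
    """All read-length subsequences of the reference."""
    return {reference[i:i + read_len] for i in range(len(reference) - read_len + 1)}


def encode(reads, reference, read_len, num_bits=1024, num_hashes=3):
    read_set = set(reads)
    ref_windows = windows(reference, read_len)

    # Level 1: hash every read into the first filter.
    b1 = BloomFilter(num_bits, num_hashes)
    for read in read_set:
        b1.add(read)

    # Level 2 (assumed cascade step): store reference windows that the
    # first filter accepts even though they are not reads.
    b2 = BloomFilter(num_bits, num_hashes)
    for w in ref_windows:
        if w in b1 and w not in read_set:
            b2.add(w)

    # Exceptions kept explicitly: true reads that the second filter wrongly
    # flags, and reads that do not occur in the reference at all.
    rescued = sorted(r for r in read_set & ref_windows if r in b2)
    unmapped = sorted(read_set - ref_windows)
    return b1, b2, rescued, unmapped


def decode(b1, b2, rescued, unmapped, reference, read_len):
    # Query the filters with every read-length window of the reference.
    recovered = {w for w in windows(reference, read_len) if w in b1 and w not in b2}
    return recovered | set(rescued) | set(unmapped)


reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
reads = ["ACGTACGT", "TAGCCGAT", "GGGGGGGG"]  # last read is absent from the reference
encoded = encode(reads, reference, read_len=8)
print(decode(*encoded, reference, 8) == set(reads))
```

Because a Bloom filter never yields false negatives, every read that occurs in the reference passes the first filter; the second filter and the explicit lists exist only to cancel false positives, which is what keeps the round trip lossless in this sketch.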

Bibliographic record

Source: BMC Bioinformatics, 2014, Issue 9
Authors: Roye Rozov; Ron Shamir; Eran Halperin
Original format: PDF
Classification (Chinese Library Classification): Biological sciences

