A Parallel Algorithm for Compression of Big Next-Generation Sequencing Datasets


Abstract

The volume of data produced by high-throughput Next-Generation Sequencing (NGS) techniques poses challenges for the storage, analysis and transmission of massive datasets. One solution for storage and transmission is compression using specialized algorithms. Existing specialized algorithms scale poorly as dataset size increases, and the best available solutions can take hours to compress gigabytes of data. Compressing and decompressing peta-scale datasets with these techniques is prohibitively expensive in terms of time and energy. In this paper we introduce paraDSRC, a parallel implementation of the DNA Sequence Reads Compression (DSRC) application that uses a message passing model and reduces the compression time complexity by a factor of O(1/p), where p is the number of processing units. Our experimental results show that paraDSRC achieves compression times that are 43% to 99% faster than DSRC and compression throughputs of up to 8.4GB/s on a moderate-size cluster. For many of the datasets used in our experiments we observed super-linear speedups, which indicates that the implementation is strongly scalable. We also show that paraDSRC is more than 25.6x faster than comparable parallel compression algorithms. The code is available free for academic use at https://github.com/PCDS/paraDSRC.
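The idea of dividing compression work across p processing units can be illustrated with a minimal sketch. It is not the paraDSRC implementation: zlib stands in for the DSRC codec, a thread pool stands in for message-passing processes, the input is assumed to be FASTQ with four lines per record, and all function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

LINES_PER_RECORD = 4  # assumed FASTQ layout: header, bases, '+', qualities


def split_records(lines, p):
    """Split FASTQ lines into at most p chunks, each holding whole records."""
    if len(lines) % LINES_PER_RECORD != 0:
        raise ValueError("input does not contain whole FASTQ records")
    n_records = len(lines) // LINES_PER_RECORD
    if n_records == 0:
        return []
    per_chunk = -(-n_records // p)  # ceiling division
    step = per_chunk * LINES_PER_RECORD
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def compress_chunk(chunk):
    """Compress one chunk independently (zlib is a stand-in, not DSRC)."""
    return zlib.compress("\n".join(chunk).encode("ascii"))


def parallel_compress(lines, p):
    """Compress the chunks concurrently; each worker handles about 1/p of the records."""
    chunks = split_records(lines, p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        return list(pool.map(compress_chunk, chunks))


def decompress(blocks):
    """Restore the original lines by decompressing the blocks in order."""
    lines = []
    for block in blocks:
        lines.extend(zlib.decompress(block).decode("ascii").split("\n"))
    return lines
```

Because each chunk is compressed independently, every worker processes roughly 1/p of the records, which is where the O(1/p) factor comes from in this simplified setting. A real message-passing version would also have to account for I/O and communication costs, which this sketch ignores.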
