IEEE International Conference on Bioinformatics and Biomedicine

Scalable data structure to compress next-generation sequencing files and its application to compressive genomics

Abstract

It is now possible to compress and decompress large-scale Next-Generation Sequencing (NGS) files by taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure which lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state using fine-grained decompression, a technique referred to as in compresso data processing. Using our data structure, compression and decompression throughputs of up to 8.71 GB/s and 10.12 GB/s, respectively, were observed. Our proposed structure and methodology bring us one step closer to compressive genomics and sublinear analysis of big NGS datasets. The code for this implementation is available at https://github.com/pcdslab/PHYNGSD
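The idea of fine-grained decompression can be illustrated with a toy sketch. This is not the phyNGSC data structure, whose layout the abstract does not describe. It only assumes that sequences are compressed in independent fixed-size blocks with an offset index, so that one record can be read by decompressing a single block. The names `compress_blocks`, `fetch_record`, and `block_size`, and the sample reads, are hypothetical.

```python
import zlib


def compress_blocks(records, block_size=4):
    """Compress records in independent blocks and keep a per-block index.

    Illustrative assumption: each record is a single-line DNA string.
    """
    payload = bytearray()
    index = []  # (byte offset, compressed length) of each block
    for start in range(0, len(records), block_size):
        raw = "\n".join(records[start:start + block_size]).encode("ascii")
        comp = zlib.compress(raw)
        index.append((len(payload), len(comp)))
        payload.extend(comp)
    return bytes(payload), index


def fetch_record(payload, index, i, block_size=4):
    """Return record i by decompressing only the block that contains it."""
    offset, length = index[i // block_size]
    block = zlib.decompress(payload[offset:offset + length]).decode("ascii")
    return block.split("\n")[i % block_size]


# Made-up sample reads, for illustration only.
reads = [
    "ACGTACGTAC", "TTGACCAGTA", "GGCATCGATC", "CCATGGTTAA",
    "ATATCGCGTA", "GATTACAGAT", "TGCATGCATG", "CAGTCAGTCA",
    "AACCGGTTAA", "TTTTGGGGCC",
]
payload, index = compress_blocks(reads, block_size=4)
record = fetch_record(payload, index, 5, block_size=4)
```

Here `fetch_record` touches only the second of three blocks, so the cost of a lookup depends on the block size rather than on the size of the whole dataset.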
