Conference: IEEE International Conference on Bioinformatics and Biomedicine

Scalable data structure to compress next-generation sequencing files and its application to compressive genomics

Abstract

It is now possible to compress and decompress large-scale Next-Generation Sequencing (NGS) files by taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure that lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state, using fine-grained decompression in a technique referred to as in compresso data processing. Using our data structure, compression and decompression throughputs of up to 8.71 GB/s and 10.12 GB/s, respectively, were observed. Our proposed structure and methodology bring us one step closer to compressive genomics and sublinear analysis of big NGS datasets. The code for this implementation is available at https://github.com/pcdslab/PHYNGSD
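The idea of fine-grained decompression mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' data structure or the phyNGSC algorithm: it is a single-process toy that assumes reads are plain DNA strings, uses Python's standard `zlib` module in place of the paper's compressor, and omits MPI and OpenMP entirely. The function names `build_blocks` and `fetch_record` and the `block_size` parameter are hypothetical, chosen only for illustration.

```python
import zlib


def build_blocks(records, block_size=2):
    """Compress records in independent blocks (illustrative assumption,
    not the paper's layout) so each block can be inflated on its own."""
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = "\n".join(records[start:start + block_size]).encode("ascii")
        blocks.append(zlib.compress(chunk))
    return blocks


def fetch_record(blocks, i, block_size=2):
    """Return record i by decompressing only the block that contains it."""
    block = zlib.decompress(blocks[i // block_size]).decode("ascii")
    return block.split("\n")[i % block_size]


reads = ["ACGTACGT", "TTGACCA", "GGGCATTA", "CCATG", "ATATAT"]
blocks = build_blocks(reads, block_size=2)
print(fetch_record(blocks, 3, block_size=2))
```

Because each block is compressed independently, looking up one read touches a single block rather than the whole file, which is the general property that fine-grained decompression relies on. Smaller blocks reduce the work per lookup but usually compress less well.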
