Bioinformatics

Data-dependent bucketing improves reference-free compression of sequencing reads


Abstract

Motivation: The storage and transmission of high-throughput sequencing data consume significant resources. As our capacity to produce such data continues to increase, this burden will only grow. One approach to reducing storage and transmission requirements is to compress the sequencing data.

Results: We present a novel technique to boost the compression of sequencing reads, based on bucketing similar reads so that they appear near one another in the file. We demonstrate that, by adopting a data-dependent bucketing scheme and employing a number of encoding ideas, we can achieve substantially better compression ratios than existing de novo sequence compression tools, including other bucketing and reordering schemes. Our method, Mince, achieves up to a 45% reduction in file size (28% on average) compared with existing state-of-the-art de novo compression schemes.

Availability and implementation: Mince is written in C++11, is open source and has been made available under the GPLv3 license. It is available at .

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
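The Results paragraph describes grouping similar reads into buckets so that they sit next to one another in the file. The Python sketch below illustrates what "data-dependent" bucketing can mean. It assumes the following:

* Each bucket is keyed by a k-mer. Here K = 4 for readability; this is a hypothetical choice.
* A read joins the existing bucket, among those keyed by one of its own k-mers, that already holds the most reads.
* If no such bucket exists, the read opens a new bucket under its lexicographically smallest k-mer.

Because the chosen bucket depends on which reads arrived earlier, the assignment adapts to the data. This is an illustrative simplification, not Mince's actual algorithm. The names `kmers`, `bucket_reads`, `reorder` and `K` are hypothetical.

```python
from collections import defaultdict

K = 4  # hypothetical k-mer length; real tools use much longer keys


def kmers(read, k=K):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def bucket_reads(reads, k=K):
    """Greedy data-dependent bucketing (illustrative sketch, not Mince's exact rule).

    Each read joins the existing bucket, keyed by one of its own k-mers,
    that already holds the most reads. If no such bucket exists, a new
    bucket is opened under the read's lexicographically smallest k-mer.
    """
    buckets = defaultdict(list)
    for read in reads:
        ks = kmers(read, k) or {read}  # reads shorter than k act as their own key
        candidates = [km for km in ks if km in buckets]
        if candidates:
            # Prefer the most populated bucket; break ties by key for determinism.
            key = max(candidates, key=lambda km: (len(buckets[km]), km))
        else:
            key = min(ks)
        buckets[key].append(read)
    return dict(buckets)


def reorder(reads, k=K):
    """Concatenate buckets (in key order) so that similar reads end up adjacent."""
    buckets = bucket_reads(reads, k)
    return [read for key in sorted(buckets) for read in buckets[key]]


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "TTTTGGGGCC", "ACGTACGTCC", "TTTGGGGCCA"]
    print(reorder(reads))
```

For the four example reads, the result is `['ACGTACGTAA', 'ACGTACGTCC', 'TTTTGGGGCC', 'TTTGGGGCCA']`. The last read contains the k-mer `GGCC`, so it joins the bucket opened by the second read. A fixed rule that always keys a read by its smallest k-mer would instead place it alone under `GCCA`. A general-purpose compressor applied to the reordered stream can then exploit the redundancy between neighbouring reads.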
