
ReCoil - an algorithm for compression of extremely large datasets of DNA data


Abstract

The growing volume of generated DNA sequencing data makes the problem of its long-term storage increasingly important. In this work we present ReCoil, an I/O-efficient external-memory algorithm designed for compression of very large collections of short-read DNA data. Typically each position of a DNA sequence is covered by multiple reads of a short-read dataset, and our algorithm makes use of the resulting redundancy to achieve a high compression rate. While compression based on encoding mismatches between the dataset and a similar reference can yield a high compression rate, a good-quality reference sequence may be unavailable. Instead, ReCoil's compression is based on encoding the differences between similar or overlapping reads. As such reads may appear at large distances from each other in the dataset, and since random access memory is a limited resource, ReCoil is designed to work efficiently in external memory, leveraging the high bandwidth of modern hard disk drives.
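The idea of encoding a read as its differences from a similar read can be illustrated with a minimal Python sketch. This is not ReCoil's actual encoding, which the abstract does not specify. The function names `encode_against` and `decode_against` are hypothetical, and the sketch assumes equal-length reads that differ only by substitutions, ignoring overlaps, indels, and external-memory concerns.

```python
from typing import List, Tuple

# A diff is a list of (position, substituted base) pairs.
Diff = List[Tuple[int, str]]


def encode_against(base_read: str, read: str) -> Diff:
    """Record only the positions where `read` differs from `base_read`.

    Assumption for this sketch: both reads have the same length and
    differ only by substitutions.
    """
    if len(base_read) != len(read):
        raise ValueError("this sketch assumes equal-length reads")
    return [(i, b) for i, (a, b) in enumerate(zip(base_read, read)) if a != b]


def decode_against(base_read: str, diff: Diff) -> str:
    """Rebuild the original read from `base_read` and its diff."""
    bases = list(base_read)
    for pos, base in diff:
        bases[pos] = base
    return "".join(bases)


if __name__ == "__main__":
    first = "ACGTACGTAC"
    second = "ACGTTCGTAA"
    diff = encode_against(first, second)
    print(diff)
    print(decode_against(first, diff) == second)
```

When two reads are highly similar, the diff is much shorter than the read itself, which is the redundancy the abstract refers to.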
