Source: Algorithms for Molecular Biology : AMB (US National Institutes of Health literature)

ReCoil - an algorithm for compression of extremely large datasets of DNA data

Abstract

The growing volume of generated DNA sequencing data makes the problem of its long-term storage increasingly important. In this work we present ReCoil, an I/O-efficient external-memory algorithm designed for compression of very large collections of short-read DNA data. Typically, each position of a DNA sequence is covered by multiple reads of a short-read dataset, and our algorithm exploits the resulting redundancy to achieve a high compression rate. While compression based on encoding mismatches between the dataset and a similar reference can yield a high compression rate, a good-quality reference sequence may be unavailable. Instead, ReCoil's compression is based on encoding the differences between similar or overlapping reads. Because such reads may appear at large distances from each other in the dataset, and because random access memory is a limited resource, ReCoil is designed to work efficiently in external memory, leveraging the high bandwidth of modern hard disk drives.
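To make the idea of "encoding the differences between similar or overlapping reads" concrete, here is a minimal Python sketch. It is an illustration only, not ReCoil's actual encoding, which the abstract does not specify. The function names `encode_against` and `decode_against`, the `shift` parameter, and the tuple layout are all hypothetical, and the sketch assumes the overlap offset between the two reads is already known and that only substitutions (no insertions or deletions) occur inside the overlap.

```python
from typing import List, Tuple

# Hypothetical encoding: (shift, overlap length, mismatches, tail)
Encoded = Tuple[int, int, List[Tuple[int, str]], str]


def encode_against(anchor: str, read: str, shift: int) -> Encoded:
    """Encode `read` relative to `anchor`, assuming read[0] aligns with anchor[shift]."""
    if not 0 <= shift <= len(anchor):
        raise ValueError("shift must lie within the anchor read")
    # Number of bases of `read` that lie on top of `anchor`.
    overlap = min(len(read), len(anchor) - shift)
    # Store only the positions where the two reads disagree.
    diffs = [(i, read[i]) for i in range(overlap) if read[i] != anchor[shift + i]]
    # Bases of `read` that extend past the end of `anchor` are stored verbatim.
    return shift, overlap, diffs, read[overlap:]


def decode_against(anchor: str, encoded: Encoded) -> str:
    """Rebuild the original read from the anchor read and the stored differences."""
    shift, overlap, diffs, tail = encoded
    bases = list(anchor[shift:shift + overlap])
    for pos, base in diffs:
        bases[pos] = base
    return "".join(bases) + tail


anchor = "ACGTACGTAC"
read = "ACGTTCGG"  # overlaps the last 6 bases of `anchor`, with one mismatch
encoded = encode_against(anchor, read, shift=4)
restored = decode_against(anchor, encoded)
```

When two reads overlap heavily, the stored tuple is much smaller than the read itself, which is the redundancy the abstract refers to. Finding which reads are similar, and doing so without holding the whole dataset in RAM, is the part the paper addresses and is not modelled here.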
