Journal of Computational Biology

Toward a Better Compression for DNA Sequences Using Huffman Encoding

Abstract

Due to the significant amount of DNA data being generated by next-generation sequencing machines for genomes ranging in length from megabases to gigabases, there is an increasing need to compress such data for smaller storage and faster transmission. Different implementations of Huffman encoding that incorporate the characteristics of DNA sequences are shown to compress DNA data better. These implementations center on two concepts: selecting frequent repeats so as to force a skewed Huffman tree, and constructing multiple Huffman trees during encoding. Compared with the standard Huffman tree algorithm, the implementations improve compression ratios for five genomes with lengths ranging from 5 to 50 Mbp. The research hence suggests an improvement on all DNA sequence compression algorithms that use conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, ).
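The abstract describes both ideas only at a high level. The following Python sketch is an illustration under stated assumptions, not the authors' algorithm or their released software. It measures the Huffman payload size of a sequence in three ways: one standard tree over single nucleotides; one tree after the n most frequent k-mers have been promoted to symbols of their own, so that these high-weight repeats receive short codes and skew the tree; and a separate tree per fixed-size block, as one simple reading of "multiple Huffman trees". The function names, the greedy non-overlapping tokenization, and the choice of k, n and block size are hypothetical, and the cost of storing each code table is ignored, so the bit counts flatter both variants.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) <= 1:
        # Convention used here: a lone symbol costs one bit per occurrence;
        # an empty input yields an empty code.
        return {sym: 1 for sym in freqs}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def payload_bits(tokens):
    """Bits to Huffman-encode tokens with one tree (code table not counted)."""
    freqs = Counter(tokens)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[sym] * lengths[sym] for sym in freqs)


def tokenize_with_repeats(seq, k, n_repeats):
    """Greedily replace occurrences of the n most frequent k-mers with one token each."""
    kmers = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    repeats = {kmer for kmer, _ in kmers.most_common(n_repeats)}
    tokens, i = [], 0
    while i < len(seq):
        if seq[i:i + k] in repeats:
            tokens.append(seq[i:i + k])  # a whole repeat becomes one symbol
            i += k
        else:
            tokens.append(seq[i])  # otherwise fall back to a single base
            i += 1
    return tokens


def multi_tree_bits(seq, block_size):
    """Build an independent Huffman tree for each fixed-size block."""
    return sum(payload_bits(seq[i:i + block_size])
               for i in range(0, len(seq), block_size))
```

For example, `payload_bits("ACGT" * 8)` is 64 bits, while promoting the single most frequent 4-mer with `tokenize_with_repeats("ACGT" * 8, 4, 1)` leaves eight identical tokens costing 8 bits. For `"AAAAAAACGGGGGGGT"`, one tree needs 27 bits, whereas two trees over 8-base blocks need 16. In both cases the saving must be weighed against the extra code tables a real encoder would have to store.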