A novel compression technique for DNA sequence compaction

Abstract

Modern biotechnology produces large amounts of genomic data. This explosion of DNA data poses challenges for understanding genomic structure, for disk storage, and for computation, so efficient compression techniques are essential for storing genomic data. Data compression stores data in less memory, and the properties of DNA sequences, which are written over a four-letter alphabet of bases (A, C, G, T), make it possible to build DNA-specific compression algorithms. In this paper, a novel compression technique is proposed for genomic data. In the first stage, each base of the DNA sequence is converted into binary form using a 2-bit encoding, and a modified run-length encoding is applied to the resulting binary string. In the second stage, this output is compressed again using Huffman coding. Finally, the encoded sequence is converted into ASCII characters. The technique is simple and effective.
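A minimal sketch of the first stage, assuming the mapping A=00, C=01, G=10, T=11 (the paper's actual bit assignment is not given here). Because the details of the authors' modified run-length encoding are not described in the abstract, this sketch substitutes plain run-length encoding of the bit string. All function names are hypothetical.

```python
from itertools import groupby

# Hypothetical base-to-bits mapping; the paper's exact assignment is not stated.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_2bit(seq: str) -> str:
    """Replace each base with its 2-bit code, giving a string of '0'/'1'.

    Non-ACGT symbols (e.g. N) raise KeyError in this sketch.
    """
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def decode_2bit(bits: str) -> str:
    """Invert encode_2bit by reading the bit string two characters at a time."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def run_length_encode(bits: str) -> list[tuple[str, int]]:
    """Plain RLE (not the authors' modified variant): (bit, run length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def run_length_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (bit, run length) pairs back into the bit string."""
    return "".join(bit * length for bit, length in runs)


if __name__ == "__main__":
    bits = encode_2bit("AACGT")
    runs = run_length_encode(bits)
    print(bits, runs)
    # Round trip: runs -> bits -> bases recovers the input sequence.
    assert decode_2bit(run_length_decode(runs)) == "AACGT"
```

For example, `encode_2bit("AACGT")` yields `"0000011011"`, whose runs are `[("0", 5), ("1", 2), ("0", 1), ("1", 2)]`. Because runs in a binary string always alternate, a variant could store only the first bit and the list of lengths; whether the authors' modification does something like this is not stated.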
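A sketch of the second stage and the final ASCII conversion, under stated assumptions: Huffman coding is applied to a list of symbols such as run lengths (the abstract does not say exactly which symbols are coded), and the resulting bits are grouped 7 at a time so that every output character is a valid ASCII code point (0 to 127). The group width, the zero-padding scheme, and all function names are illustrative assumptions, not the paper's method.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(symbols):
    """Return a prefix-free {symbol: bit string} code built from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a non-empty codeword.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(symbols, code):
    """Concatenate the codeword of each symbol."""
    return "".join(code[s] for s in symbols)


def huffman_decode(bits, code):
    """Scan bits left to right, emitting a symbol whenever a codeword completes."""
    reverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return out


def bits_to_ascii(bits, width=7):
    """Pack bits into characters of `width` bits; return (text, padding bits added)."""
    pad = (-len(bits)) % width
    padded = bits + "0" * pad
    text = "".join(chr(int(padded[i:i + width], 2)) for i in range(0, len(padded), width))
    return text, pad


def ascii_to_bits(text, pad, width=7):
    """Unpack characters to bits and drop the zero padding added by bits_to_ascii."""
    bits = "".join(format(ord(ch), f"0{width}b") for ch in text)
    return bits[:len(bits) - pad]


if __name__ == "__main__":
    runs = [5, 2, 1, 2, 2, 1, 5]  # illustrative run lengths, not data from the paper
    code = build_huffman_code(runs)
    bits = huffman_encode(runs, code)
    text, pad = bits_to_ascii(bits)
    assert huffman_decode(ascii_to_bits(text, pad), code) == runs
    print(code, bits, repr(text), pad)
```

In this sketch the decoder needs both the code table and the padding count stored alongside the ASCII text. Note that 7-bit groups can map to non-printable control characters (code points below 32), so a real file format would have to account for that.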