Journal: Acta Electronica Sinica (《电子学报》)

A High-Throughput DNA Sequence Data Compression Algorithm Based on Codebook Index Transformation

Abstract

A novel compression method for high-throughput DNA sequence data, based on codebook index transformation (CITD), is proposed. CITD first applies the codebook index transformation (CIT) model, which replaces the conventional representation of codebook index values with quaternary numbers written in the four standard base characters, and uses a concise encoding scheme to delimit replaced and non-replaced substrings. It then uses the value of the information entropy to decide whether to apply the Burrows-Wheeler transform (BWT). Finally, it applies the move-to-front (MTF) transform and Huffman entropy coding to compress the data. Experimental results on several sequencing data sets show that, in most cases, CITD achieves better compression performance than the high-throughput DNA-specific compression methods it is compared against in the paper.
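The stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the codebook construction, the delimiter scheme, the entropy threshold, or the direction of the entropy test, so the digit order `A=0, C=1, G=2, T=3`, the fixed index width, the `entropy_threshold` value, and the rule "apply BWT only when entropy is below the threshold" are all assumptions made for illustration. The BWT here is the naive sorted-rotations version, and the final Huffman stage is omitted.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumed digit order: A=0, C=1, G=2, T=3


def index_to_bases(index: int, width: int) -> str:
    """Write a codebook index as a fixed-width base-4 number with A/C/G/T digits."""
    if index < 0 or index >= 4 ** width:
        raise ValueError("index does not fit in the given width")
    digits = []
    for _ in range(width):
        index, remainder = divmod(index, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def bases_to_index(code: str) -> int:
    """Inverse of index_to_bases: read a base string as a base-4 number."""
    value = 0
    for ch in code:
        value = value * 4 + BASES.index(ch)
    return value


def shannon_entropy(text: str) -> float:
    """Zeroth-order Shannon entropy in bits per symbol."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT via sorted rotations; suitable for short illustrative inputs only."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def move_to_front(text: str, alphabet: str) -> list[int]:
    """Replace each symbol by its position in a list that moves used symbols to the front."""
    table = list(alphabet)
    out = []
    for ch in text:
        pos = table.index(ch)
        out.append(pos)
        table.insert(0, table.pop(pos))
    return out


def transform(text: str, entropy_threshold: float = 1.9) -> tuple[bool, list[int]]:
    """Entropy-gated BWT followed by MTF.

    Assumption: BWT is applied only when the entropy is below the threshold;
    both the threshold and the direction of the test are hypothetical.
    """
    use_bwt = shannon_entropy(text) < entropy_threshold
    stage = bwt(text) if use_bwt else text
    return use_bwt, move_to_front(stage, "$" + BASES)
```

For example, `index_to_bases(27, 4)` yields `"ACGT"`, since 27 is 0123 in base 4, and `bases_to_index("ACGT")` recovers 27. Writing indices in the base alphabet keeps the transformed stream within the same four-symbol alphabet as the raw sequence, which is the property the CIT model relies on according to the abstract.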
