International Conference on Audio, Language and Image Processing
K-means clustering based compression algorithm for the high-throughput DNA sequence

Abstract

This paper proposes DNAC-K, a compression algorithm for high-throughput DNA sequences based on K-means clustering. DNAC-K first groups the sequences into clusters with the K-means method, then iteratively updates the clusters according to the edit distances of subsequences, and finally applies Huffman coding to encode the clustering result. Experimental results on several sequencing data sets show that DNAC-K outperforms many current high-throughput DNA sequence compression algorithms.
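The abstract names three building blocks (clustering, edit distance, Huffman coding) without giving details, so the following Python sketch only illustrates those pieces and is not the DNAC-K implementation. It assumes, purely for illustration, that reads are assigned to the nearest of a set of given center sequences by Levenshtein edit distance and that the "clustering result" to be encoded is the resulting list of cluster labels. The function names `edit_distance`, `assign_to_clusters`, and `huffman_code`, as well as the sample reads and centers, are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def assign_to_clusters(reads, centers):
    """Return, for each read, the index of the nearest center (assumed step)."""
    return [
        min(range(len(centers)), key=lambda k: edit_distance(read, centers[k]))
        for read in reads
    ]


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so heap never compares the dicts
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical toy data: four short reads and two hand-picked centers.
reads = ["ACGTACGT", "ACGTACGA", "TTGGCCAA", "TTGGCCAT"]
centers = ["ACGTACGT", "TTGGCCAA"]

labels = assign_to_clusters(reads, centers)
codes = huffman_code(labels)
encoded = "".join(codes[label] for label in labels)
print(labels, codes, encoded)
```

A real K-means step would also recompute the centers after each assignment and repeat until the labels stop changing; that loop is omitted here because the abstract does not say how centers are represented or updated for sequence data.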
