Conference: Data Compression Conference

Complementary Contextual Models with FM-Index for DNA Compression

Abstract

Demand for efficient compression and storage of DNA sequences has been rising with the rapid growth of DNA sequencing technologies. Existing reference-based algorithms map all patterns to regions found in the reference sequence, which leaves redundancy where the similarity is incomplete. This paper proposes an efficient reference-based method for DNA sequence compression that integrates an FM-index and complementary context models to improve compression performance. The proposed method uses the FM-index to represent full-text matching of exact repeats between the target and reference sequences. For unmatched symbols, complementary context models are used to make weighted estimates conditioned on variable-order contexts. A reversed reference index is used to guarantee the longest match of variable-length substrings. Experimental results show that the proposed method achieves a 213-fold compression ratio when tested on the first Korean personal genome sequence data set.
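
The abstract gives no implementation details, so the following is only a minimal sketch of one common way a reversed reference index can yield longest matches. It assumes a naive suffix-array construction (far too slow for a real genome), and the names `build_fm_index` and `longest_prefix_match` are hypothetical helpers, not the paper's code. FM-index backward search extends a pattern one symbol to the left per step; if the index is built over the reversed reference, each step instead extends the match in the target one symbol to the right, so the search can stop at the first symbol that no longer matches.

```python
def build_fm_index(text):
    """Build a naive FM-index (C table and Occ table) for text + '$'."""
    text = text + "$"  # '$' sorts before A, C, G, T in ASCII
    # Naive suffix array: acceptable for a sketch only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(bwt)


def longest_prefix_match(target, start, fm):
    """Length of the longest prefix of target[start:] that occurs in the
    reference, given an FM-index built over the REVERSED reference."""
    C, occ, n = fm
    lo, hi = 0, n  # half-open suffix-array range; starts as the whole index
    length = 0
    for ch in target[start:]:
        if ch not in C or ch == "$":
            break
        # One backward-search step: prepend ch to the reversed pattern,
        # which appends ch to the pattern as read in the target.
        new_lo = C[ch] + occ[ch][lo]
        new_hi = C[ch] + occ[ch][hi]
        if new_lo >= new_hi:  # empty range: no occurrence of the longer pattern
            break
        lo, hi = new_lo, new_hi
        length += 1
    return length


reference = "ACGTACGTTGCA"
fm = build_fm_index(reference[::-1])  # index the reversed reference
print(longest_prefix_match("TACGTTAA", 0, fm))  # prints 6 ("TACGTT")
```

In a compressor built along these lines, symbols covered by such a match would be encoded as a reference to the matched region, and the remaining symbols would be passed to the context models; how the paper actually combines the two is not stated in the abstract.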
