An Adaptive Difference Distribution-based Coding with Hierarchical Tree Structure for DNA Sequence Compression

Abstract

Previous reference-based compression methods for DNA sequences do not fully exploit the intrinsic statistics of the data, because they consider only approximate matches. In this paper, an adaptive difference distribution-based coding framework is proposed that operates on fragments of nucleotides organized in a hierarchical tree structure. To keep the distribution of the difference sequence between the reference and target sequences concentrated, the sub-fragment size and the matching offset used for prediction are chosen flexibly within a stepped-size structure. Matching against approximate repeats in the reference uses a Hamming-like weighted distance measure within a local region close to the current fragment, so that matching accuracy can be balanced against the overhead of describing the matching offset. A dedicated coding scheme compactly encodes both the difference sequence and the additional parameters, e.g., the sub-fragment size and the matching offset. Experimental results show that the proposed scheme achieves a 150% compression improvement compared with GReEn, the best existing reference-based compressor.
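
The abstract names the ingredients but not their exact form, so the following is only a rough illustration under stated assumptions, not the authors' method. It assumes A/C/G/T are mapped to 0..3, that the difference is taken modulo 4, that position weights are supplied by the caller, and that offset overhead is approximated by a linear penalty `lam * |offset - center|`. The helpers `difference_sequence`, `weighted_hamming`, `best_offset`, and `empirical_entropy` are hypothetical names introduced here. The sketch shows how a local search with a Hamming-like weighted distance can pick a matching offset, and how a well-matched fragment produces a concentrated (low-entropy) difference sequence.

```python
import math
from collections import Counter

# Assumption: A, C, G, T map to 0..3; the abstract does not specify a mapping.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def difference_sequence(target: str, reference: str) -> list[int]:
    """Per-position difference (target - reference) mod 4; 0 marks an exact match."""
    return [(BASE_TO_INT[t] - BASE_TO_INT[r]) % 4 for t, r in zip(target, reference)]


def weighted_hamming(a: str, b: str, weights: list[float]) -> float:
    """Hamming-like distance: each mismatched position i adds weights[i]."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def best_offset(fragment: str, reference: str, center: int, radius: int,
                weights: list[float], lam: float = 0.0) -> tuple[int, float]:
    """Search reference windows starting in [center - radius, center + radius].

    Cost = weighted distance + lam * |offset - center|, a hypothetical stand-in
    for the bits needed to signal the offset. Returns (offset, distance).
    Ties keep the first (lowest) offset found.
    """
    n = len(fragment)
    lo = max(0, center - radius)
    hi = min(len(reference) - n, center + radius)
    if lo > hi:
        raise ValueError("no reference window fits inside the search region")
    best_cost, best = math.inf, None
    for off in range(lo, hi + 1):
        d = weighted_hamming(fragment, reference[off:off + n], weights)
        cost = d + lam * abs(off - center)
        if cost < best_cost:
            best_cost, best = cost, (off, d)
    return best


def empirical_entropy(symbols: list[int]) -> float:
    """Zeroth-order entropy in bits/symbol; lower means a more concentrated distribution."""
    total = len(symbols)
    counts = Counter(symbols)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    fragment = "CGTTTGAC"
    weights = [1.0] * len(fragment)  # assumption: uniform weights
    off, dist = best_offset(fragment, reference, center=4, radius=3,
                            weights=weights, lam=0.1)
    diff = difference_sequence(fragment, reference[off:off + len(fragment)])
    print(off, dist, diff, empirical_entropy(diff))
```

Raising `lam` favors offsets near `center`, which are cheaper to signal, at the risk of accepting a worse match. This is one simple way to express the accuracy-versus-overhead trade-off the abstract describes; the paper's actual cost function and entropy coder are not given in the abstract.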