【24h】

A Simpler Analysis of Burrows-Wheeler Based Compression

机译:基于Burrows-Wheeler的压缩的简单分析

获取原文
获取原文并翻译 | 示例

摘要

In this paper we present a new technique for worst-case analysis of compression algorithms which are based on the Burrows-Wheeler Transform. We deal mainly with the algorithm purposed by Burrows and Wheeler in their first paper on the subject, called BWO. This algorithm consists of the following three steps: 1) Compute the Burrows-Wheeler transform of the text, 2) Convert the transform into a sequence of integers using the move-to-front algorithm, 3) Encode the integers using Arithmetic code or any order-0 encoding (possibly with run-length encoding). We prove a strong upper bound on the worst-case compression ratio of this algorithm. This bound is significantly better than bounds known to date and is obtained via simple analytical techniques. Specifically, we show that for any input string s, and μ > 1, the length of the compressed string is bounded by μ • |s|H_k(s) + log(ζ(μ)) • |s| + g_k where H_k is the k-th order empirical entropy, g_k is a constant depending only on k and on the size of the alphabet, and ζ(μ) = 1/(1~μ) + 1/(2~μ) + ... is the standard zeta function. As part of the analysis we prove a result on the compressibility of integer sequences, which is of independent interest. Finally, we apply our techniques to prove a worst-case bound on the compression ratio of a compression algorithm based on the Burrows-Wheeler transform followed by distance coding, for which worst-case guarantees have never been given. We prove that the length of the compressed string is bounded by 1.7286 • |s|H_k(s) + g_k. This bound is better than the bound we give for BW0.
机译:在本文中,我们提出了一种基于Burrows-Wheeler变换的压缩算法最坏情况分析的新技术。我们主要处理Burrows和Wheeler在其有关该主题的第一篇论文中提出的算法,称为BWO。此算法包括以下三个步骤:1)计算文本的Burrows-Wheeler变换,2)使用前移算法将变换转换为整数序列,3)使用算术代码或任何其他方法对整数进行编码0级编码(可能使用游程长度编码)。我们证明了该算法在最坏情况下的压缩率的上限。该界限明显优于迄今已知的界限,可以通过简单的分析技术获得。具体来说,我们表明,对于任何输入字符串s,且μ> 1,压缩字符串的长度受μ•| s | H_k(s)+ log(ζ(μ))•| s |限制。 + g_k其中H_k是第k阶经验熵,g_k是仅取决于k和字母大小的常数,并且ζ(μ)= 1 /(1〜μ)+ 1 /(2〜μ) + ...是标准的zeta函数。作为分析的一部分,我们证明了整数序列的可压缩性的结果,该结果具有独立的意义。最后,我们运用我们的技术来证明基于Burrows-Wheeler变换并随后进行距离编码的压缩算法的压缩率的最坏情况界限,但从未给出最坏情况的保证。我们证明压缩字符串的长度以1.7286•| s | H_k(s)+ g_k为界。这个界限比我们给BW0的界限更好。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号