【24h】

Most Burrows-Wheeler Based Compressors Are Not Optimal

机译:大多数基于Burrows-Wheeler的压缩机都不是最佳选择

获取原文
获取原文并翻译 | 示例

摘要

We present a technique for proving lower bounds on the compression ratio of algorithms which are based on the Burrows-Wheeler Transform (BWT). We study three well known BWT-based compressors: the original algorithm suggested by Burrows and Wheeler; BWT with distance coding; and BWT with run-length encoding. For each compressor, we show a Markov source such that for asymptotically-large text generated by the source, the compression ratio divided by the entropy of the source is a constant greater than 1. This constant is 2 - ∈, 1.26, and 1.29, for each of the three compressors respectively. Our technique is robust, and can be used to prove similar claims for most BWT-based compressors (with a few notable exceptions). This stands in contrast to statistical compressors and Lempel-Ziv-style dictionary compressors, which are long known to be optimal, in the sense that for any Markov source, the compression ratio divided by the entropy of the source asymptotically tends to 1. We experimentally corroborate our theoretical bounds. Furthermore, we compare BWT-based compressors to other compressors and show that for "realistic" Markov sources they indeed perform bad and often worse than other compressors. This is in contrast with the well known fact that on English text, BWT-based compressors are superior to many other types of compressors.
机译:我们提出了一种用于证明基于Burrows-Wheeler变换(BWT)的算法的压缩率下限的技术。我们研究了三种基于BWT的著名压缩器:Burrows和Wheeler建议的原始算法;带距离编码的BWT;和带有游程长度编码的BWT。对于每个压缩器,我们都显示一个马尔可夫源,以便对于源生成的渐近大文本,压缩率除以源熵是一个大于1的常数。该常数为2-∈,1.26和1.29,分别用于三个压缩机。我们的技术是可靠的,可用于证明大多数基于BWT的压缩机的类似主张(少数值得注意的例外)。这与统计压缩器和Lempel-Ziv式字典压缩器相反,后者长期以来一直是最佳的,在某种意义上,对于任何马尔可夫源,压缩率除以源的熵渐近趋于1。证实了我们的理论界限。此外,我们将基于BWT的压缩机与其他压缩机进行了比较,结果表明,对于“现实的”马尔可夫源,它们的性能确实比其他压缩机差,而且往往比其他压缩机差。这与众所周知的事实相反,即在英文文本中,基于BWT的压缩机优于许多其他类型的压缩机。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号