首页> 外文会议>Data compression conference >Grammatical Ziv-Lempel Compression: Achieving PPM-Class Text Compression Ratios with LZ-Class Decompression Speed
【24h】

Grammatical Ziv-Lempel Compression: Achieving PPM-Class Text Compression Ratios with LZ-Class Decompression Speed

机译:语法Ziv-Lempel压缩:以LZ类解压缩速度实现PPM类文本压缩率

获取原文

摘要

GLZA is a free, open-source, enhanced grammar-based compressor that constructs a low entropy grammar amenable to entropy coding, using a greedy hill-climbing search guided by estimates of encoded string lengths; the estimates are efficiently computed incrementally during (parallelized) suffix tree construction in a batched iterative repeat replacement cycle. The grammar-coded symbol stream is further compressed by order-1 Markov modeling of trailing/leading subsymbols and selective recency modeling, MTF-coding only symbols that tend to recur soon. This combination results in excellent compression ratios—similar to PPMC's for small files, averaging within about five percent of PPMd's for large text files (1 MB – 10 MB)—with fast decompression on one core or two. Compression time and memory use are not dramatically higher than for similarly high-performance asymmetrical compressors of other kinds. GLZA is on the Pareto frontier for text compression ratio and decompression speed on a variety of benchmarks (LTCB, Calgary, Canterbury, Large Canterbury, Silesia, Maximum Compression, World Compression Challenge), compressing better and/or decompressing faster than its competitors (PPM, LZ77-Markov, BWT, etc.), with better compression ratios than previous grammar-based compressors such as RePair, Sequitur, Offline 3 (Greedy), Sequential/grzip, and IRR-S.
机译:GLZA是一种免费的,开放源代码的,基于增强语法的压缩器,它使用贪婪的爬山式搜索(以估计的字符串长度为指导)来构造适合熵编码的低熵语法;在(并行)后缀树构建过程中,以批处理的迭代重复替换周期有效地递增估算值。语法编码的符号流通过尾随/前导子符号的1阶马尔可夫建模和选择性新近建模进一步压缩,其中MTF编码仅倾向于很快复现的符号。这种组合可实现出色的压缩率(类似于小型文件的PPMC,平均约为大型文本文件(1 MB – 10 MB)的PPMd的百分之五),并在一两个内核上实现快速解压缩。压缩时间和内存使用没有比其他类似的高性能非对称压缩器高很多。 GLZA在各种基准(LTCB,卡尔加里,坎特伯雷,大坎特伯雷,西里西亚,最大压缩,世界压缩挑战)上,在文本压缩率和解压缩速度方面均处于帕累托前沿,比其竞争对手(PPM)压缩更好和/或解压缩得更快。 ,LZ77-Markov,BWT等),比以前的基于语法的压缩器(如RePair,Sequitur,Offline 3(Greedy),Sequential / grzip和IRR-S)具有更好的压缩率。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号