首页> 外文OA文献 >動的な接尾辞木を用いた反辞書データ圧縮
【2h】

動的な接尾辞木を用いた反辞書データ圧縮

机译:使用动态后缀树的抗字典数据压缩

摘要

Data compression is particularly useful in systems where resources arescarce, e.g., limited bandwidth in communication systems and the capacityof storage systems. A wide variety of data compression algorithms have beenproposed for inherently digital data such as text and digitized audio, imageand video.To compress an input string efficiently, data compression algorithms typicallyuse a dictionary to construct statistical models and replace substringswith indices in the dictionary. The dictionary is the set of all substrings ofan input string, and it is represented by a tree representation such as a suffixtree in many applications.In 2000, Crochemore et al. [CMRS00] proposed an off-line lossless datacompression algorithm using an antidictionary of an input binary string.The antidictionary is the set of all minimal strings that never appear inthe string. They showed that their method, called Data Compression usingAntidictionaries (DCA), achieves compression ratios that are as goodas the Lempel-Ziv (LZ) algorithms [ZL77, ZL78]. In 2005, to improve thecompression ratios, Ohkawa, Harada and Yamamoto [OHY05] applied anarithmetic coding to an off-line DCA method for binary strings. In otherwords, the authors used antidictionaries as statistical models for an arithmeticcoding. It was shown by simulation that their method, called Ohkawa-Harada-Yamamoto (OHY) method, achieves better compression ratios thanthe DCA method.Both a dictionary and an antidictionary are useful for data compression.The combination of the antidictionary and the dictionary are expected toprovide efficient statistical models for arithmetic coding.The main goal of this thesis is to achieve: Construction of an antidictionary using a suffix tree in linear time andspace. An on-line DCA method using dynamic suffix trees in linear time andspace. An on-line arithmetic coding based on antidictionaries using dynamicsuffix trees in linear time and space. One-pass lossless data compression for ElectroCardioGrams (ECGs)using antidictionaries.Traditional construction algorithm of an antidictionary using suffix treesrequires quadratic computational time with respect to a given string length.To reduce this computational time, we introduce a new kind of pointer calledan MF-link in the suffix tree. By using MF-links, the proposed constructionalgorithm operates in linear time and space. We prove that the proposed algorithmworks in linear time and space with respect to the string length, andexperimental results show that the proposed algorithm is fast and memoryefficient.On-line DCA methods can achieve better compression ratios than off-lineDCA methods. However, on-line traditional DCA methods need quadraticcomputational time with respect to the string length, since the method requiresupdating the antidictionary and its encoder when a new symbol isread. To reduce this computational time, we show useful conditions to implementthe on-line DCA using only suffix trees without constructing the antidictionary.By using these conditions, we propose an on-line DCA methodusing dynamic suffix trees without updating antidictionaries. Moreover, weapply arithmetic coding to the proposed algorithm. We prove that the proposedalgorithms work in linear time and space with respect to the stringlength. Experimental results show that the proposed method applied with anarithmetic coding achieves better compression ratios than traditional on-lineDCA for almost all files on Calgary Corpus [Cal], and this approach givesa 3% decrease in compressed file size relative the on-line DCA method anda 1% decrease in the size relative to the OHY by simulation results for theCalgary Corpus.Finally, we propose a one-pass ECG lossless compression method usingantidictionaries. The proposed algorithm constructs an encoder of the DCAfrom the substring of ECG, called the learning string, by means of the propertythat each ECG signal is an almost periodic waveform. We show thatthe length of learning string needed to construct the antidictionary whosesize is almost the same as that of the entire string of ECG using the resultsof coupon collector’s problem. Experimental results show that the proposedalgorithm gives 15% decrease in compressed file size relative to the LZ algorithmsfor ECG files of the MIT-BIH Arrhythmia Database [MIT]. Moreover,it is shown that the proposed algorithm can implement to compress the ECGfiles for real-time by simulation results.
机译:数据压缩在资源稀少的系统中特别有用,例如,通信系统中的带宽有限以及存储系统的容量。对于诸如文本和数字化音频,图像和视频之类的固有数字数据,已经提出了各种各样的数据压缩算法。为了有效地压缩输入字符串,数据压缩算法通常使用字典来构建统计模型并用字典中的索引替换子字符串。字典是输入字符串的所有子字符串的集合,并且在许多应用程序中用树形表示形式(例如后缀树)表示。2000年,Crochemore等人提出。 [CMRS00]提出了一种使用输入二进制字符串的反字典的离线无损数据压缩算法。反字典是从未出现在字符串中的所有最小字符串的集合。他们证明了他们的方法,即使用Antidictionaries(DCA)进行数据压缩,可实现与Lempel-Ziv(LZ)算法[ZL77,ZL78]相同的压缩率。 2005年,为提高压缩率,大川,原田和山本[OHY05]将模拟编码应用于二进制字符串的离线DCA方法。换句话说,作者使用反词典作为算术编码的统计模型。通过仿真表明,他们的称为Ohkawa-Harada-Yamamoto(OHY)的方法比DCA方法具有更好的压缩率,字典和反字典都可用于数据压缩,并且预期将字典和反字典结合使用本文的主要目的是:在线性时间和空间上使用后缀树构造一个字典。一种在线性时间和空间中使用动态后缀树的在线DCA方法。一种基于字典的在线算术编码,使用动态后缀树在线性时空中进行。使用抗字典对ElectroCardioGrams(ECG)进行一次无损数据压缩。使用后缀树的抗字典的传统构造算法需要针对给定的字符串长度进行二次计算,为了减少这种计算时间,我们引入了一种称为MF-的新型指针后缀树中的链接。通过使用MF链接,建议的构造算法在线性时间和空间中运行。我们证明了该算法相对于字符串长度在线性时间和空间上是可行的,并且实验结果表明该算法是快速且高效的。在线DCA方法比离线DCA方法可以获得更好的压缩率。但是,在线传统DCA方法相对于字符串长度需要二次计算时间,因为该方法要求在读取新符号时更新解字典及其编码器。为了减少此计算时间,我们展示了仅使用后缀树而不构建反字典的在线DCA的有用条件。通过使用这些条件,我们提出了使用动态后缀树而不更新字典的在线DCA方法。此外,我们将算术编码应用于所提出的算法。我们证明了所提出的算法相对于字符串长度在线性时间和空间上有效。实验结果表明,与卡尔加里语料库[Cal]上的几乎所有文件相比,拟议的编码方法与传统的在线DCA相比具有更好的压缩率,与在线DCA方法相比,该方法压缩文件的大小减少了3%。卡尔加里语料库的仿真结果表明,相对于OHY,尺寸减小了1%。最后,我们提出了一种使用字典的ECG无损压缩方法。该算法利用每个ECG信号几乎都是周期波形的特性,从ECG子串(称为学习串)构造DCA编码器。我们显示了使用优惠券收集者问题的结果来构造尺寸与整个ECG字符串几乎相同的字典的学习字符串的长度。实验结果表明,相对于MIT-BIH心律失常数据库[MIT]的ECG文件的LZ算法,该算法使压缩文件大小减少了15%。此外,通过仿真结果表明,该算法可以实现实时压缩ECG文件。

著录项

  • 作者

    太田 隆博; Takahiro Ota;

  • 作者单位
  • 年度 2016
  • 总页数
  • 原文格式 PDF
  • 正文语种 en
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号