Word-Based Fixed and Flexible List Compression

Abstract

We present a dictionary-based lossless text compression scheme in which frequent words are kept in separate lists (list_n contains words of length n). We pursued two alternatives for the lengths of the lists. In the "fixed" approach all lists hold the same number of words, whereas in the "flexible" approach no such constraint is imposed. Results clearly show that the "flexible" scheme is much better in all test cases, possibly because it can accommodate short, medium, or long word lists that reflect the word length distribution of a particular language. Our approach encodes a word as a prefix (the length of the word) followed by a body (the word's index in the corresponding list). For prefix encoding we employed both a static encoding and a dynamic encoding (Huffman) based on the word length statistics of the source language. Dynamic prefix encoding clearly outperformed its static counterpart in all cases. A language with a higher average word length can, theoretically, benefit more from a word-list-based compression approach than one with a lower average word length. We put this hypothesis to the test using Turkish and English, with average word lengths of 6.1 and 4.4, respectively. Our results strongly support the validity of this hypothesis.
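
The prefix-plus-index encoding described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation. The function names (`build_lists`, `static_code`, `huffman_code`, `encode`, `decode`), the fixed-width binary index, the whitespace tokenization, and the assumption that every input word is present in its list are all illustrative choices; the abstract does not say how out-of-list words or word separators are handled.

```python
import heapq
from collections import Counter, defaultdict


def build_lists(words, per_list_cap=None):
    """Group distinct words by length, most frequent first.

    per_list_cap=None lets each list grow freely (a "flexible"-style layout);
    an integer caps every list at the same size (a "fixed"-style layout).
    """
    by_len = defaultdict(list)
    for word, _count in Counter(words).most_common():
        bucket = by_len[len(word)]
        if per_list_cap is None or len(bucket) < per_list_cap:
            bucket.append(word)
    return dict(by_len)


def index_bits(n):
    """Bits needed for a fixed-width index into a list of n entries (min 1)."""
    return max(1, (n - 1).bit_length())


def static_code(lengths):
    """Assumed static prefix: the same fixed-width code for every word length."""
    ordered = sorted(lengths)
    width = index_bits(len(ordered))
    return {n: format(i, f"0{width}b") for i, n in enumerate(ordered)}


def huffman_code(freqs):
    """Huffman code {symbol: bitstring} built from a frequency table."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # The integer tie-breaker keeps the heap from ever comparing dicts.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(words, lists, prefix_code):
    """Emit, per word: prefix for its length, then its index in list_n."""
    out = []
    for w in words:
        lst = lists[len(w)]  # assumes w is in its list (no escape mechanism)
        width = index_bits(len(lst))
        out.append(prefix_code[len(w)] + format(lst.index(w), f"0{width}b"))
    return "".join(out)


def decode(bits, lists, prefix_code):
    """Invert encode(): read a prefix, then a fixed-width index."""
    by_prefix = {code: n for n, code in prefix_code.items()}
    words, pos = [], 0
    while pos < len(bits):
        end = pos + 1
        while bits[pos:end] not in by_prefix:
            if end >= len(bits):
                raise ValueError("truncated or invalid bitstream")
            end += 1
        n = by_prefix[bits[pos:end]]
        width = index_bits(len(lists[n]))
        words.append(lists[n][int(bits[end:end + width], 2)])
        pos = end + width
    return words


# Toy comparison of the two prefix encodings on the same word lists.
words = "the cat and the dog and the bird saw the cat".split()
lists = build_lists(words)
length_freqs = Counter(len(w) for w in words)
for name, code in [("static", static_code(lists)),
                   ("huffman", huffman_code(length_freqs))]:
    bits = encode(words, lists, code)
    assert decode(bits, lists, code) == words
    print(name, len(bits), "bits")
```

Because the index width is the same under both prefix schemes, any difference in output size in this sketch comes only from the prefix. A Huffman code built from the length frequencies never produces a longer prefix stream than the fixed-width one, which is in line with the abstract's report that dynamic prefix encoding outperformed the static variant.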
