Word-Based Fixed and Flexible List Compression

Abstract

We present a dictionary-based lossless text compression scheme in which frequent words are kept in separate lists (list_n contains words of length n). We pursued two alternatives with respect to the lengths of the lists. In the "fixed" approach all lists have an equal number of words, whereas in the "flexible" approach no such constraint is imposed. Results clearly show that the "flexible" scheme is much better in all test cases, possibly because it can accommodate short, medium, or long word lists that reflect the word-length distribution of a particular language. Our approach encodes a word as a prefix (the length of the word) followed by the body of the word (an index into the corresponding list). For prefix encoding we employed both a static encoding and a dynamic (Huffman) encoding based on the word-length statistics of the source language. Dynamic prefix encoding clearly outperformed its static counterpart in all cases. A language with a higher average word length can, theoretically, benefit more from a word-list-based compression approach than one with a lower average word length. We put this hypothesis to the test using Turkish and English, with average word lengths of 6.1 and 4.4, respectively. Our results strongly support the validity of this hypothesis.
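The prefix-plus-index encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_lists`, `static_code`, `huffman_code`, `encode`, `decode`), the fixed-width index layout, and the use of the sample text itself for word-length statistics are all assumptions made for illustration. Words missing from a capped ("fixed") list are not handled here, since the abstract does not say how the paper treats them.

```python
import heapq
from collections import Counter, defaultdict


def build_lists(words, max_per_list=None):
    """Group distinct words by length, most frequent first.

    max_per_list=None models the "flexible" variant (no size cap);
    an integer models the "fixed" variant (every list capped equally).
    """
    by_len = defaultdict(list)
    for word, _ in Counter(words).most_common():
        if max_per_list is None or len(by_len[len(word)]) < max_per_list:
            by_len[len(word)].append(word)
    return dict(by_len)


def index_width(lst):
    """Bits needed for a fixed-width index into lst (assumed layout)."""
    return max(1, (len(lst) - 1).bit_length())


def static_code(freqs):
    """Static prefix: the same number of bits for every word length."""
    symbols = sorted(freqs)
    width = max(1, (len(symbols) - 1).bit_length())
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}


def huffman_code(freqs):
    """Huffman prefix: shorter codes for more frequent word lengths."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tie = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(words, lists, prefix_code):
    """Emit, per word, the prefix for its length and then its list index."""
    parts = []
    for w in words:
        lst = lists[len(w)]
        parts.append(prefix_code[len(w)])
        parts.append(format(lst.index(w), f"0{index_width(lst)}b"))
    return "".join(parts)


def decode(bits, lists, prefix_code):
    """Invert encode(): read a prefix, then the fixed-width index."""
    inverse = {code: n for n, code in prefix_code.items()}
    words, pos = [], 0
    while pos < len(bits):
        end = pos + 1
        while bits[pos:end] not in inverse:
            end += 1
            if end > len(bits):
                raise ValueError("truncated or invalid bit string")
        n = inverse[bits[pos:end]]
        width = index_width(lists[n])
        words.append(lists[n][int(bits[end:end + width], 2)])
        pos = end + width
    return words


# Toy input; the statistics come from the text itself (an assumption).
text = "the cat and the dog saw the bird in a tree and the cat ran".split()
lists = build_lists(text)                      # flexible variant
lengths = Counter(len(w) for w in text)
huff, static = huffman_code(lengths), static_code(lengths)
print(len(encode(text, lists, huff)), len(encode(text, lists, static)))
```

In this sketch the index bits are identical under both prefix schemes, so any difference in output size comes from the prefix alone. That isolates the static versus Huffman comparison the abstract reports.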

Bibliographic record

Source: International Symposium on Computer and Information Sciences (ISCIS 2005); 26-28 October 2005; Istanbul (TR) | 2005 | pp. 780-790 | 11 pages
Conference location: Istanbul (TR)
Authors: Ebru Celikel; Mehmet E. Dalkilic; Gokhan Dalkilic
Affiliation: Ege University International Computer Institute, 35100 Bornova, Izmir, Turkey
Original format: PDF
Language: English
Chinese Library Classification: Computer applications
