Alphabet Permutation for Differentially Encoding Text

Abstract

One degree of freedom not usually exploited in developing high-performance text-processing algorithms is the encoding of the underlying atomic character set. Here we consider a text compression method where the specific character set collating-sequence employed in encoding the text has a big impact on performance. We demonstrate that permuting the standard character collating-sequences yields a small win on Asian-language texts over gzip. We also show improved compression with our method for English texts, although not by enough to beat standard methods. However, we also design a class of artificial languages on which our method clearly beats gzip, often by an order of magnitude.
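
The abstract does not spell out the encoding, so the following is only a minimal sketch under stated assumptions. It assumes that "differential encoding" means storing each character as the difference between its rank in the collating sequence and the rank of the previous character, so that a sequence placing frequently adjacent characters close together yields small differences. The function names and the greedy ordering heuristic are hypothetical illustrations, not the authors' algorithm.

```python
from collections import Counter


def differential_encode(text, order):
    """Return the first symbol's rank followed by successive rank differences."""
    if not text:
        return []
    rank = {ch: i for i, ch in enumerate(order)}
    codes = [rank[ch] for ch in text]
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


def differential_decode(deltas, order):
    """Invert differential_encode by accumulating the differences."""
    out, pos = [], 0
    for d in deltas:
        pos += d
        out.append(order[pos])
    return "".join(out)


def total_cost(text, order):
    """Sum of absolute rank differences between adjacent characters."""
    return sum(abs(d) for d in differential_encode(text, order)[1:])


def greedy_order(text):
    """Hypothetical heuristic: start from the most frequent symbol, then keep
    appending the unplaced symbol that most often sits next to the last one."""
    freq = Counter(text)
    pairs = Counter(
        frozenset((a, b)) for a, b in zip(text, text[1:]) if a != b
    )
    remaining = set(freq)
    if not remaining:
        return ""
    last = min(remaining, key=lambda c: (-freq[c], c))
    order = [last]
    remaining.remove(last)
    while remaining:
        prev = last
        last = min(
            remaining,
            key=lambda c: (-pairs[frozenset((prev, c))], -freq[c], c),
        )
        order.append(last)
        remaining.remove(last)
    return "".join(order)


text = "azazazbybyby"
standard = "".join(sorted(set(text)))   # "abyz", the usual collating order
permuted = greedy_order(text)           # "azby"
print(total_cost(text, standard))       # 22
print(total_cost(text, permuted))       # 11
```

On this toy string the permuted sequence halves the total magnitude of the differences, which is the kind of effect a downstream entropy coder could exploit. Whether that translates into a real compression gain depends on the text, as the abstract's mixed results against gzip indicate.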

Bibliographic record

Source: International Conference on String Processing and Information Retrieval, 2004, 2 pages
Authors: Gad M. Landau; Ofer Levi; Steven Skiena
Original format: PDF
Classification (Chinese Library Classification): Data backup and recovery

Similar literature

1. Editorial board members (in alphabetical order) [J]. 当代社会科学（英文） (Contemporary Social Sciences, English edition). 2021, No. 002
2. Editorial board members (in alphabetical order) [J]. 当代社会科学（英文） (Contemporary Social Sciences, English edition). 2021, No. 001
3. State complexity of permutation on finite languages over a binary alphabet [J]. Cho Da-Jung, Goc Daniel, Han Yo-Sub. Theoretical Computer Science. 2017
4. Permutation approach to finite-alphabet stationary stochastic processes based on the duality between values and orderings [J]. T. Haruna, K. Nakajima. The European Physical Journal Special Topics. 2013, No. 2
5. Permutation approach to finite-alphabet stationary stochastic processes based on the duality between values and orderings [J]. T. Haruna, K. Nakajima. The European Physical Journal: Special Topics. 2013, No. 2
6. Alphabet Permutation for Differentially Encoding Text [C]. Gad M. Landau, Ofer Levi, Steven Skiena. International Conference on String Processing and Information Retrieval (SPIRE 2004); October 5-8, 2004; Padova (IT). 2004
7. An 'Alphabet of Tales': The genre, background, date, and provenance of the text, with an annotated glossary. (Volumes I and II) [D]. Johnson, Elma L. 1993
8. Abstracts to be delivered at the 2014 Annual Conference of the Association of Medical Microbiology and Infectious Disease Canada, April 3 to 5, Victoria, British Columbia, alphabetized according to the surname of the first author. Full-text abstracts can be accessed at www.pulsus.com [O]. 2014
9. Alphabet Permutation for Differentially Encoding Text [O]. Gad M. Landau, Ofer Levi, Steven Skiena. 2004
