Conference: International Conference on String Processing and Information Retrieval

Alphabet Permutation for Differentially Encoding Text

Abstract

One degree of freedom not usually exploited in developing high-performance text-processing algorithms is the encoding of the underlying atomic character set. Here we consider a text compression method where the specific character set collating-sequence employed in encoding the text has a big impact on performance. We demonstrate that permuting the standard character collating-sequences yields a small win on Asian-language texts over gzip. We also show improved compression with our method for English texts, although not by enough to beat standard methods. However, we also design a class of artificial languages on which our method clearly beats gzip, often by an order of magnitude.
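To make the idea concrete, here is a toy sketch of how the collating sequence can change the output of a differential encoder. It is not the paper's algorithm: the function names (`delta_encode`, `delta_decode`, `cost`), the sum-of-absolute-differences cost, and the hand-picked permutation are all assumptions chosen for illustration. Each character is mapped to its rank in a chosen alphabet order, and the text is stored as differences between consecutive ranks.

```python
def delta_encode(text, order):
    """Encode text as differences between consecutive character ranks.

    `order` is a string listing the alphabet in the chosen collating
    sequence (hypothetical representation, for illustration only).
    """
    rank = {ch: i for i, ch in enumerate(order)}
    prev = 0
    deltas = []
    for ch in text:
        r = rank[ch]
        deltas.append(r - prev)  # small when neighbours are close in `order`
        prev = r
    return deltas


def delta_decode(deltas, order):
    """Invert delta_encode by accumulating the differences."""
    prev = 0
    chars = []
    for d in deltas:
        prev += d
        chars.append(order[prev])
    return "".join(chars)


def cost(deltas):
    """Crude proxy for encoded size: sum of absolute differences."""
    return sum(abs(d) for d in deltas)


text = "azazazaz"
standard = "abcdefghijklmnopqrstuvwxyz"
# Hand-picked permutation that places 'a' and 'z' next to each other.
permuted = "azbcdefghijklmnopqrstuvwxy"

cost_standard = cost(delta_encode(text, standard))
cost_permuted = cost(delta_encode(text, permuted))
```

In this contrived text the characters that alternate sit at opposite ends of the standard order, so the differences are large; moving them next to each other shrinks every difference to 1 in absolute value. A real system would still need to choose the permutation and entropy-code the differences, neither of which this sketch attempts.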
