Mapping Words into Codewords on PPM

Abstract

We describe a simple and efficient scheme that allows words to be managed in PPM modelling when a natural language text file is compressed. The main idea is to assign codes to words so that they are easier to manipulate. A general technique achieves this: a dictionary mapping on top of PPM modelling. To test the idea, we are implementing three prototypes: the first implements the basic dictionary mapping on PPM, the second combines the dictionary mapping with the separate alphabets model, and the third combines the dictionary with the spaceless words model. The technique can be applied directly or combined with a word compression model. For files of 1 MB and larger, the results are better than those achieved by the character-based PPM used as the baseline. The comparison between the prototypes shows that the best option is to use a word-based PPM in conjunction with the spaceless words model.
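The dictionary mapping combined with the spaceless words model can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how codes are assigned or how words are delimited, so the ASCII alphanumeric word definition, the first-seen integer codes, and the names `tokenize_spaceless`, `encode`, and `decode` are all assumptions made for illustration. The PPM stage itself is omitted; in a real system the integer sequence would be passed to the PPM model.

```python
import re

# Assumption: a "word" is a run of ASCII letters/digits; anything else is a separator.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize_spaceless(text):
    """Split text into alternating words and separators.

    Spaceless words idea: a single space between two words is dropped,
    because the decoder can re-insert it whenever two words are adjacent.
    """
    tokens = TOKEN_RE.findall(text)
    last = len(tokens) - 1
    # Tokens alternate word/separator, so an interior " " always sits
    # between two words; a leading or trailing " " is kept as-is.
    return [t for i, t in enumerate(tokens) if not (t == " " and 0 < i < last)]


def encode(text):
    """Map each token to an integer code, assigned in order of first appearance."""
    dictionary = {}
    codes = []
    for tok in tokenize_spaceless(text):
        codes.append(dictionary.setdefault(tok, len(dictionary)))
    vocab = list(dictionary)  # position in the list == code
    return codes, vocab


def decode(codes, vocab):
    """Invert encode(), restoring the implicit single spaces."""
    out = []
    prev_was_word = False
    for code in codes:
        tok = vocab[code]
        is_word = WORD_RE.fullmatch(tok) is not None
        if is_word and prev_was_word:
            out.append(" ")  # two adjacent words imply a dropped single space
        out.append(tok)
        prev_was_word = is_word
    return "".join(out)


if __name__ == "__main__":
    sample = "the cat, the dog  sat"
    codes, vocab = encode(sample)
    print(codes)
    print(vocab)
    assert decode(codes, vocab) == sample
```

In this sketch the code stream is shorter than the token stream because the most common separator, a single space, never receives a code. That is the intuition behind the spaceless words model, under the assumptions stated above.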