Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

Application of a Word-Based Text Compression Method to Japanese and Chinese Texts




Conventional text compression schemes that sample 8 bits at a time do not compress 16-bit Asian language codes well. We previously reported a word-based text compression method that uses 16-bit sampling to compress Japanese texts. This paper describes our further work applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method is intended to support a multilingual environment: for each language, we replace the word dictionary and the canonical Huffman code table with the appropriate ones. A computer simulation showed that the method is effective for both languages. The compression ratio obtained was slightly below 0.5 when the Markov context was not taken into account, and around 0.4 when the first-order Markov context was taken into account.
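As a rough illustration of the idea in the abstract, the Python sketch below segments text with a word dictionary and encodes the tokens with a canonical Huffman code. It is not the authors' implementation. The greedy longest-match segmentation, the `max_len` parameter, the function names, and the sample text and dictionary are all assumptions made for illustration. The code table is built from the same text it encodes, whereas a static table would be prepared in advance, and no Markov context is modeled.

```python
import heapq
from collections import Counter
from itertools import count

def tokenize(text, dictionary, max_len=4):
    """Greedy longest-match segmentation (an assumed strategy).
    Characters not covered by the dictionary become one-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in dictionary:
                tokens.append(piece)
                i += n
                break
    return tokens

def code_lengths(freqs):
    """Return the Huffman code length of each symbol."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every merge adds one bit to each member
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths

def canonical_codes(lengths):
    """Assign canonical codes: symbols sorted by (length, symbol)
    receive consecutive code values, shifted left when the length grows."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes

def encode(tokens, codes):
    return "".join(codes[t] for t in tokens)

def decode(bits, codes):
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:  # safe because the code is prefix-free
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

# Hypothetical sample data, not taken from the paper.
text = "東京都は日本の首都です。東京都は大きい。"
dictionary = {"東京都", "日本", "首都", "です", "大きい"}
tokens = tokenize(text, dictionary)
codes = canonical_codes(code_lengths(Counter(tokens)))
bits = encode(tokens, codes)
# Ratio relative to 16 bits per character; it ignores the cost of
# storing the dictionary and code table, so it is not comparable
# to the figures reported in the abstract.
print(len(bits) / (16 * len(text)))
```

Because canonical codes are fully determined by the code lengths and the symbol order, a decoder only needs the word dictionary and the length of each code to rebuild the table. That makes it cheap to swap tables per language, as the abstract describes.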



