Application of a Word-Based Text Compression Method to Japanese and Chinese Texts

Shigeru YOSHIDA; Takashi MORIHARA; Hironori YAHAGI; Noriko ITANI

Abstract

16-bit Asian language codes cannot be compressed well by conventional text compression schemes that use 8-bit sampling. Previously, we reported the application of a word-based text compression method that uses 16-bit sampling to the compression of Japanese texts. This paper describes our further efforts in applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method was proposed to support a multilingual environment: the word dictionary and the canonical Huffman code table are replaced with those appropriate for the respective language. A computer simulation showed that this method is effective for both languages. The obtained compression ratio was a little less than 0.5 without taking the Markov context into account, and around 0.4 when accounting for the first-order Markov context.
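
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of word-based coding with a canonical Huffman table, not the authors' method. It assumes a greedy longest-match tokenizer over a small hypothetical word list, and it builds the code table from the sample text itself, whereas a static encoder would prepare the table in advance for each language. All function names and the toy dictionary are illustrative assumptions.

```python
import heapq
from collections import Counter


def tokenize(text, dictionary, max_len=4):
    """Greedy longest-match split; unknown characters become 1-char tokens.

    This tokenizer is an assumption for illustration only.
    """
    tokens = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in dictionary:
                tokens.append(piece)
                i += n
                break
    return tokens


def code_lengths(freqs):
    """Return {symbol: Huffman code length} for a frequency table."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie-breaker, symbols in the subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to these symbols
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes: sort by (length, symbol), count upward."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


def encode(tokens, codes):
    """Concatenate the bit strings of all tokens."""
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    """Decode a bit string using the prefix-free property of the codes."""
    inverse = {v: k for k, v in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


sample = "東京都は東京です"
toy_dictionary = {"東京", "東京都", "です"}  # hypothetical word list
tokens = tokenize(sample, toy_dictionary)
codes = canonical_codes(code_lengths(Counter(tokens)))
bits = encode(tokens, codes)
```

Because canonical codes are fully determined by the code lengths, only the lengths need to be stored per language, which is one reason a canonical table is convenient to swap between languages.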

Bibliographic details

Source
IEICE Transactions on fundamentals of electronics, communications & computer sciences | 2002, No. 12 | pp. 2933-2938 | 6 pages
Authors
Shigeru YOSHIDA; Takashi MORIHARA; Hironori YAHAGI; Noriko ITANI
Author affiliation

Peripheral Systems Laboratories, Fujitsu Laboratories Ltd., Atsugi-shi, 243-0197 Japan;

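Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords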
lossless; text compression; language; word-based;
