Using Clustering to Improve WLZ77 Compression

Abstract

Many types of Information Retrieval Systems (IRS) have been created, and more and more documents are stored in them. A fundamental process in an IRS is building the textual database and compressing the documents stored in it. One option for compressing textual data is word-based compression. Several word-based compression algorithms based on Huffman encoding, LZW, or BWT have been proposed. In this paper, we describe a word-based compression method based on the LZ77 algorithm. An IRS can also perform cluster analysis of the textual database to improve the quality of answers to users' queries. The information obtained from clustering can be very helpful in compression. Word-based compression using information about the cluster hierarchy is presented in this paper. The experimental results provided at the end of the paper were obtained not only with the well-known word-based compression algorithms WBW and WLZW but also with the relatively new WLZ77 algorithm.
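To make the idea of word-based LZ77 concrete, the following is a minimal sketch of generic LZ77 applied to word tokens instead of characters. It is not the paper's WLZ77 algorithm, whose details the abstract does not give, and it does not use any cluster information. The function names (`tokenize`, `wlz77_encode`, `wlz77_decode`), the `(offset, length, next_token)` triple format, and the `window` and `max_len` parameters are all assumptions made for illustration.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: runs of word characters and runs of
    # non-word characters. Joining the tokens restores the text exactly.
    return re.findall(r"\w+|\W+", text)


def wlz77_encode(tokens, window=64, max_len=16):
    # Generic LZ77 over a token list. Each output triple is
    # (offset, length, next_token): copy `length` tokens starting
    # `offset` tokens back, then emit `next_token` literally.
    out = []
    i = 0
    while i < len(tokens):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Stop one token early so a literal next token always exists.
            while (k < max_len and i + k < len(tokens) - 1
                   and tokens[j + k] == tokens[i + k]):
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        out.append((best_off, best_len, tokens[i + best_len]))
        i += best_len + 1
    return out


def wlz77_decode(triples):
    # Rebuild the token list; copying one token at a time keeps
    # overlapping matches (length > offset) correct.
    tokens = []
    for off, length, nxt in triples:
        start = len(tokens) - off
        for k in range(length):
            tokens.append(tokens[start + k])
        tokens.append(nxt)
    return tokens


text = "to be or not to be, that is the question: to be or not to be"
tokens = tokenize(text)
triples = wlz77_encode(tokens)
restored = "".join(wlz77_decode(triples))
```

In this sketch the dictionary is simply the sliding window of recently seen tokens, so repeated phrases are replaced by back-references. A real compressor would additionally entropy-code the triples and the word vocabulary, which is omitted here.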