Source: ACM Conference on Information and Knowledge Management

Improved Index Compression Techniques for Versioned Document Collections


Abstract

Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are usually kept in compressed form, and many techniques for optimizing compressed size and query processing speed have been proposed. In this paper, we focus on versioned document collections, that is, collections where each document is modified over time, resulting in multiple versions of the document. Consecutive versions of the same document are often similar, and several researchers have explored ideas for exploiting this similarity to decrease index size. We propose new index compression techniques for versioned document collections that achieve reductions in index size over previous methods. In particular, we first propose several bitwise compression techniques that achieve a compact index structure but that are too slow for most applications. Based on the lessons learned, we then propose additional techniques that come close to the sizes of the bitwise technique while also improving on the speed of the best previous methods.
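To make the idea of a compressed inverted index concrete, here is a minimal sketch of a standard baseline: gap (delta) encoding of a sorted posting list followed by variable-byte coding. This is not the method proposed in the paper, whose details the abstract does not give. The function names and the sample posting list are hypothetical, and the example assumes that versions of the same document receive consecutive IDs, so a term present in every version produces gaps of 1.

```python
from typing import Iterable, List


def varbyte_encode(numbers: Iterable[int]) -> bytes:
    """Encode non-negative integers with 7 payload bits per byte.

    The high bit is set on the final byte of each number (assumed convention).
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("varbyte_encode expects non-negative integers")
        groups = [n & 0x7F]          # least-significant 7-bit group first
        n >>= 7
        while n:
            groups.append(n & 0x7F)
            n >>= 7
        groups[0] |= 0x80            # flag the group that will be written last
        out.extend(reversed(groups))  # most-significant group first
    return bytes(out)


def varbyte_decode(data: bytes) -> List[int]:
    """Inverse of varbyte_encode."""
    numbers: List[int] = []
    n = 0
    for b in data:
        if b & 0x80:                 # final byte of the current number
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Turn a strictly increasing ID list into differences between neighbours."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Inverse of to_gaps: running sum of the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def compress_postings(doc_ids: List[int]) -> bytes:
    return varbyte_encode(to_gaps(doc_ids))


def decompress_postings(data: bytes) -> List[int]:
    return from_gaps(varbyte_decode(data))


# Hypothetical posting list: IDs 3..6 are four versions of one document,
# IDs 300..301 are two versions of another.
postings = [3, 4, 5, 6, 300, 301]
blob = compress_postings(postings)
assert decompress_postings(blob) == postings
```

Under these assumptions, the runs of consecutive version IDs collapse into gaps of 1 that cost one byte each. That is the kind of redundancy between similar versions that more specialized techniques aim to exploit further.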
