Journal of the American Society for Information Science

In Situ Generation of Compressed Inverted Files

Abstract

An inverted index stores, for each term that appears in a collection of documents, a list of document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a nontrivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random access storage, and traditional disk-based methods require large amounts of temporary file space. This paper describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way mergesort. The new technique has been used to invert a two-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disk space, and less than 20 megabytes of main memory.
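To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of an inverted index whose document-number lists are compressed with a simple code for the positive integers. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not name the code it uses, so the Elias gamma code applied to differences between successive document numbers is an assumption here, and all function names are hypothetical. The in-place external multi-way mergesort is not shown.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the ascending list of document numbers (1-based) containing it."""
    index = defaultdict(list)
    for doc_no, text in enumerate(docs, start=1):
        # set() ensures each document number is recorded once per term.
        for term in set(text.lower().split()):
            index[term].append(doc_no)
    return dict(index)


def to_gaps(doc_numbers):
    """Replace each document number by its difference from the previous one."""
    gaps, prev = [], 0
    for d in doc_numbers:
        gaps.append(d - prev)
        prev = d
    return gaps


def gamma_encode(n):
    """Elias gamma code of a positive integer, as a string of '0'/'1' characters.

    Assumption: the abstract only says "simple compression codes for the
    positive integers"; gamma is used here purely as an example of such a code.
    """
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    binary = bin(n)[2:]  # e.g. 5 -> '101'
    # Prefix with (length - 1) zeros so the decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_postings(doc_numbers):
    """Gap-encode an ascending list of document numbers, then gamma-code each gap."""
    return "".join(gamma_encode(g) for g in to_gaps(doc_numbers))


def decompress_postings(bits):
    """Invert compress_postings: decode the gaps and take running sums."""
    doc_numbers, total = [], 0
    for gap in gamma_decode(bits):
        total += gap
        doc_numbers.append(total)
    return doc_numbers
```

For example, the list of document numbers [3, 8, 9] has gaps 3, 5, 1 and compresses to the 9-bit string 011001011. Storing gaps rather than absolute document numbers keeps the integers small, which is what lets a code that favors small values save space.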