In Situ Generation of Compressed Inverted Files

Alistair Moffat; Timothy A. H. Bell

Abstract

An inverted index stores, for each term that appears in a collection of documents, a list of document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a nontrivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random access storage, and traditional disk-based methods require large amounts of temporary file space. This paper describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way mergesort. The new technique has been used to invert a two-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disk space, and less than 20 megabytes of main memory.
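The abstract's definition of an inverted index, and its mention of simple compression codes for the positive integers, can be illustrated with a small sketch. The abstract does not name the code the paper uses, so this sketch assumes Elias gamma coding applied to the gaps between successive document numbers, a common choice for inverted lists. All function names are hypothetical, bit strings stand in for packed bits, and the in-place external mergesort is not shown.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the ascending list of 1-based document numbers containing it."""
    index = defaultdict(list)
    for doc_num, text in enumerate(docs, start=1):
        # set() records each term at most once per document.
        for term in set(text.lower().split()):
            index[term].append(doc_num)
    return dict(index)


def to_gaps(doc_nums):
    """Replace ascending document numbers with the differences between neighbours."""
    gaps, prev = [], 0
    for d in doc_nums:
        gaps.append(d - prev)
        prev = d
    return gaps


def gamma_encode(n):
    """Elias gamma code (assumed here, not confirmed by the abstract):
    len(binary) - 1 zeros followed by the binary form of n."""
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_list(doc_nums):
    """Gap-encode an inverted list, then gamma-code each gap."""
    return "".join(gamma_encode(g) for g in to_gaps(doc_nums))


def decompress_list(bits):
    """Invert compress_list by summing the decoded gaps."""
    doc_nums, total = [], 0
    for gap in gamma_decode_all(bits):
        total += gap
        doc_nums.append(total)
    return doc_nums
```

For the three documents "the cat sat", "the dog sat", and "a cat ran", the term "cat" has the inverted list [1, 3], whose gaps are [1, 2]. Small gaps receive short codes, which is why frequent terms with dense lists compress well under this kind of scheme.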

Bibliographic details

Source: Journal of the American Society for Information Science, 1995, issue 7, pp. 537-550 (14 pages)
Authors: Alistair Moffat; Timothy A. H. Bell
Affiliation: Department of Computer Science, The University of Melbourne, Parkville, Victoria 3052, Australia
Original format: PDF
Language: English
Chinese Library Classification: Science, scientific research
Date added to database: 2022-08-18 00:56:09
