Journal: Information Processing & Management

A statistics-based approach to incrementally update inverted files

Abstract

Many information retrieval systems use the inverted file as their indexing structure. The inverted file, however, requires costly reorganization when new documents are added to an existing collection. Most studies suggest dealing with this problem by reserving free space in the inverted file for incremental updates. In this paper, we propose a run-time, statistics-based approach to allocating this spare space. The approach estimates the space requirements of an inverted file using only a small amount of the most recent statistical data on space usage and the document update request rate. For the best indexing speed and space efficiency, the amount of spare space to allocate is determined by adaptively balancing the trade-off between reducing reorganization and maintaining space utilization. Experimental results show that the proposed space-sparing approach avoids a significant amount of reorganization when updating an inverted file while keeping unused free space well controlled, so that file access speed is not affected. (C) 2003 Elsevier Ltd. All rights reserved.
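
The abstract does not give the estimator itself, so the following is only a minimal sketch of the general idea of statistics-driven spare-space allocation, not the paper's method. It assumes an exponentially smoothed per-term growth rate as the "recent statistic" and a fixed look-ahead `horizon` as the knob that trades reorganization against space utilization. All names (`PostingList`, `SpareSpaceIndex`, `alpha`, `horizon`) are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PostingList:
    """One term's postings, stored in a block of `capacity` slots."""
    capacity: int = 4                 # allocated slots (used + spare)
    postings: list = field(default_factory=list)
    growth_rate: float = 1.0          # smoothed postings per update batch (assumed statistic)
    reorganizations: int = 0          # how often the block had to be relocated


class SpareSpaceIndex:
    """Toy inverted file that reserves spare slots from recent growth statistics."""

    def __init__(self, alpha=0.5, horizon=4):
        self.alpha = alpha            # weight given to the most recent batch
        self.horizon = horizon        # future batches to reserve space for
        self.lists = {}

    def add_batch(self, batch):
        """Apply one update batch: a dict mapping term -> list of new doc ids."""
        for term, doc_ids in batch.items():
            pl = self.lists.setdefault(term, PostingList())
            # Update the smoothed growth estimate using only the latest observation.
            pl.growth_rate = (self.alpha * len(doc_ids)
                              + (1 - self.alpha) * pl.growth_rate)
            needed = len(pl.postings) + len(doc_ids)
            if needed > pl.capacity:
                # Block overflow: "reorganize" by moving to a larger block whose
                # spare space covers the estimated growth over `horizon` batches.
                spare = math.ceil(pl.growth_rate * self.horizon)
                pl.capacity = needed + spare
                pl.reorganizations += 1
            pl.postings.extend(doc_ids)

    def utilization(self):
        """Fraction of allocated slots that actually hold postings."""
        used = sum(len(pl.postings) for pl in self.lists.values())
        cap = sum(pl.capacity for pl in self.lists.values())
        return used / cap if cap else 1.0


index = SpareSpaceIndex(alpha=0.5, horizon=2)
index.add_batch({"retrieval": [1, 2, 3, 4, 5, 6]})   # overflows the initial block
index.add_batch({"retrieval": [7, 8, 9]})            # fits in the reserved spare space
print(index.lists["retrieval"].reorganizations, round(index.utilization(), 2))
```

In this sketch, a larger `horizon` reserves more spare slots and so triggers fewer relocations at the cost of lower utilization; a smaller one does the opposite. An adaptive scheme of the kind the abstract describes would adjust that balance at run time rather than fix it in advance.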
