Efficient in-memory extensible inverted file

Abstract

The growing amount of on-line data demands efficient parallel and distributed indexing mechanisms to manage large resource requirements and unpredictable system failures. Parallel and distributed indices built using commodity hardware like personal computers (PCs) can substantially save cost because PCs are produced in bulk, achieving economies of scale. However, PCs have a limited amount of random access memory (RAM), so the effective utilization of RAM for in-memory inversion is crucial. This paper presents an analytical investigation and an empirical evaluation of storage-efficient in-memory extensible inverted files, which are represented by fixed- or variable-sized linked list nodes. The size of these linked list nodes is determined by minimizing the storage waste or maximizing the storage utilization under different conditions, which leads to different storage allocation schemes. Minimizing storage waste also reduces the number of address indirections (i.e., chainings). We evaluated our storage allocation schemes using a number of reference collections. We found that the arrival rate scheme is the best in terms of both storage utilization and the mean number of chainings per term. The final storage utilization can be over 90% in our evaluation if a sufficient number of documents is indexed. The mean number of chainings is not large (less than 2.6 for all the reference collections). We have also shown that our best storage allocation scheme can be used for our extensible compressed inverted file. The final storage utilization of the extensible compressed inverted file can also be over 90% in our evaluation, provided that a sufficient number of documents is indexed. The proposed storage allocation schemes can also be used by compressed extensible inverted files with word positions.
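
The two quantities the abstract reports, storage utilization and the mean number of chainings per term, can be illustrated with a minimal Python sketch of an extensible inverted file whose postings lists are chains of linked list nodes. Everything here is an assumption made for illustration: the names `Node`, `ExtensibleInvertedFile`, and `next_capacity` are hypothetical, utilization is counted in posting slots only (pointer and header overhead are ignored), and the fixed-size and doubling policies are simple stand-ins, not the arrival rate scheme evaluated in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One linked list node: a block with room for `capacity` postings."""
    capacity: int
    postings: List[int] = field(default_factory=list)
    next: Optional["Node"] = None


class ExtensibleInvertedFile:
    """Hypothetical in-memory inverted file; each term owns a chain of nodes."""

    def __init__(self, next_capacity: Callable[[int], int]):
        # next_capacity(k) returns the capacity of the k-th node (k = 0, 1, ...)
        # allocated for a term. This is the pluggable allocation policy.
        self.next_capacity = next_capacity
        self.heads: Dict[str, Node] = {}
        self.tails: Dict[str, Node] = {}
        self.node_counts: Dict[str, int] = {}

    def add(self, term: str, doc_id: int) -> None:
        """Append a posting, chaining a new node when the tail node is full."""
        tail = self.tails.get(term)
        if tail is None:
            tail = Node(self.next_capacity(0))
            self.heads[term] = tail
            self.tails[term] = tail
            self.node_counts[term] = 1
        elif len(tail.postings) == tail.capacity:
            new_node = Node(self.next_capacity(self.node_counts[term]))
            tail.next = new_node
            self.tails[term] = new_node
            self.node_counts[term] += 1
            tail = new_node
        tail.postings.append(doc_id)

    def postings(self, term: str) -> List[int]:
        """Collect a term's postings by following the chain from its head."""
        result: List[int] = []
        node = self.heads.get(term)
        while node is not None:
            result.extend(node.postings)
            node = node.next
        return result

    def utilization(self) -> float:
        """Used posting slots divided by allocated posting slots."""
        used = allocated = 0
        for head in self.heads.values():
            node: Optional[Node] = head
            while node is not None:
                used += len(node.postings)
                allocated += node.capacity
                node = node.next
        return used / allocated if allocated else 0.0

    def mean_chainings(self) -> float:
        """Mean number of next-pointer links per term (nodes minus one)."""
        if not self.node_counts:
            return 0.0
        links = sum(count - 1 for count in self.node_counts.values())
        return links / len(self.node_counts)


# Two illustrative policies (assumptions, not the schemes from the paper).
fixed_size = lambda k: 4        # every node holds 4 postings
doubling = lambda k: 2 << k     # node capacities 2, 4, 8, ...

index = ExtensibleInvertedFile(fixed_size)
for doc_id in range(10):
    index.add("memory", doc_id)
index.add("index", 0)
print(index.utilization(), index.mean_chainings())
```

Swapping `fixed_size` for `doubling` changes how many nodes a frequent term needs and how much allocated space is left unused in its last node, which is the trade-off between chainings and storage waste that the abstract describes.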

Bibliographic record

  • Authors

    Luk RWP; Lam W

  • Year: 2007
  • Original format: PDF
  • Language: English
