Venue: International conference on information and knowledge engineering

A Tree-based Inverted File for Fast Ranked-Document Retrieval

Abstract

Inverted files are widely used to index documents in large-scale information retrieval systems. An inverted file consists of posting lists, which can be stored either in ascending order of document identifier or in descending order of document weight. For an identifier-ascending posting list, retrieving ranked documents requires traversing all postings, whereas for a weight-descending posting list, performing Boolean queries involves very complex processing. In this paper, we transform a posting list into a tree-based structure, called the n-key-heap posting tree, to speed up ranked-document retrieval for Boolean queries. In this structure, the order of document identifiers and the order of document weights are preserved simultaneously. To preserve the identifier order, the edge pointers are designed to maintain numerical order in the posting tree. To preserve the weight order, postings with greater weights are stored in higher tree nodes, following the heap property. We model these criteria as a tree-construction problem and propose an efficient algorithm to construct an optimal posting tree with minimal access time.
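The abstract does not describe the node layout or the optimal construction algorithm of the n-key-heap posting tree, so the following is only an illustrative sketch of the general idea of keeping both orders at once. It assumes the simplest case of one key per node (n = 1) and substitutes a standard Cartesian tree: an in-order walk yields ascending document identifiers, while every parent has a weight at least as large as its children. The names `Node`, `build_posting_tree`, `top_k`, and `contains` are hypothetical and are not taken from the paper.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    doc_id: int
    weight: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_posting_tree(postings: List[Tuple[int, float]]) -> Optional[Node]:
    """Build a Cartesian tree from postings sorted by ascending doc_id.

    Assumption: one posting per node (n = 1). This is a stand-in for the
    paper's n-key-heap posting tree, not its optimal construction.
    """
    stack: List[Node] = []  # rightmost path of the tree, root at index 0
    for doc_id, weight in postings:
        node = Node(doc_id, weight)
        last = None
        # Lighter nodes on the rightmost path become the new node's left subtree.
        while stack and stack[-1].weight < weight:
            last = stack.pop()
        node.left = last
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def top_k(root: Optional[Node], k: int) -> List[Tuple[int, float]]:
    """Return the k highest-weight postings without traversing the whole tree."""
    result: List[Tuple[int, float]] = []
    # Max-heap via negated weights; doc_id is unique and breaks ties.
    frontier = [(-root.weight, root.doc_id, root)] if root else []
    while frontier and len(result) < k:
        _, _, node = heapq.heappop(frontier)
        result.append((node.doc_id, node.weight))
        for child in (node.left, node.right):
            if child is not None:
                heapq.heappush(frontier, (-child.weight, child.doc_id, child))
    return result


def contains(root: Optional[Node], doc_id: int) -> bool:
    """Look up a document identifier using the preserved identifier order."""
    node = root
    while node is not None:
        if doc_id == node.doc_id:
            return True
        node = node.left if doc_id < node.doc_id else node.right
    return False


postings = [(1, 0.2), (3, 0.9), (4, 0.5), (7, 0.7), (9, 0.1)]
tree = build_posting_tree(postings)
print(top_k(tree, 2))     # the two heaviest postings
print(contains(tree, 4))  # membership test by document identifier
```

In this sketch, `top_k` expands only the nodes it returns plus their children, which is the kind of saving over a full scan of an identifier-ascending list that the abstract points to, while `contains` shows that identifier-based lookups, as needed for Boolean operators, remain possible. A Cartesian tree can degenerate into a long path for unfavorable weight sequences; storing several keys per node and optimizing the shape for access time is the part the paper addresses and that this sketch does not attempt.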
