Venue: IEEE International Conference on Advanced Information Networking and Applications

Approximating Document Frequency for Self-Index based Top-k Document Retrieval


Abstract

Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework for efficient top-k document retrieval combines an FM-index with a wavelet tree. However, this index has difficulty handling document frequency (DF) at search time, because the indexed terms are all substrings of the document collection. Previous work either exhaustively searches every part of the index, where most of the documents are not relevant, to calculate DF, or stores precomputed DF values at the cost of a large amount of additional space. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results showed that our methods achieved almost the same effectiveness as exhaustive search while taking only about half of its search time.
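The abstract does not describe either approximation method, so the sketch below only illustrates the setting it builds on, under simplifying assumptions. A sorted suffix array stands in for the FM-index. The document array (the document ID of each suffix) is stored in a wavelet tree. Exact DF is obtained by descending into every non-empty leaf under the pattern's suffix-array range, which is the exhaustive pass the abstract refers to. A greedy top-k traversal by term frequency can stop after k leaves. The separator `SEP`, the function names `build_index`, `pattern_range`, `distinct_docs` and `top_k_by_tf`, and the toy documents are hypothetical illustrations, not the authors' code.

```python
import heapq
from bisect import bisect_left, bisect_right

SEP = "\x01"  # hypothetical document separator; must not occur inside any document


class WaveletTree:
    """Toy pointer-based wavelet tree over integer symbols in [lo, hi]."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf (a single document id) or an empty subtree
        mid = (lo + hi) // 2
        # ranks[i] = how many of seq[:i] are routed to the left child (symbol <= mid)
        self.ranks = [0]
        for x in seq:
            self.ranks.append(self.ranks[-1] + (x <= mid))
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)


def build_index(docs):
    """Concatenate docs, build a naive suffix array and a wavelet tree on the document array."""
    text, owner = "", []
    for d, doc in enumerate(docs):
        text += doc + SEP
        owner.extend([d] * (len(doc) + 1))
    # Naive construction for illustration only; a real index would use an FM-index.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    da = [owner[i] for i in sa]  # da[i] = document containing suffix sa[i]
    return text, sa, WaveletTree(da, 0, len(docs) - 1)


def pattern_range(text, sa, pattern):
    """Half-open suffix-array interval [sp, ep) of suffixes that start with pattern."""
    m = len(pattern)
    prefixes = [text[i:i + m] for i in sa]  # still sorted: truncation keeps SA order
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def distinct_docs(node, s, e):
    """Exhaustive traversal: every non-empty leaf is one matching document.

    Returns [(doc_id, term_frequency), ...]; its length is the exact DF.
    """
    if s >= e:
        return []
    if node.lo == node.hi:
        return [(node.lo, e - s)]
    ls, le = node.ranks[s], node.ranks[e]
    return distinct_docs(node.left, ls, le) + distinct_docs(node.right, s - ls, e - le)


def top_k_by_tf(root, s, e, k):
    """Greedy traversal: always expand the node with the widest range first.

    A parent's range is never smaller than its children's, so leaves pop in
    non-increasing TF order and the search can stop after k leaves.
    """
    heap = [(-(e - s), 0, root, s, e)]
    counter = 1  # tie-breaker so that nodes themselves are never compared
    result = []
    while heap and len(result) < k:
        neg_size, _, node, s, e = heapq.heappop(heap)
        if node.lo == node.hi:
            result.append((node.lo, -neg_size))
            continue
        ls, le = node.ranks[s], node.ranks[e]
        for child, cs, ce in ((node.left, ls, le), (node.right, s - ls, e - le)):
            if cs < ce:
                heapq.heappush(heap, (-(ce - cs), counter, child, cs, ce))
                counter += 1
    return result
```

With the toy collection `["abracadabra", "cabbage", "banana"]`, the pattern `ab` occurs three times in two documents. `distinct_docs` therefore has to reach both non-empty leaves to report a DF of 2. `top_k_by_tf` with k = 1 can stop at the first leaf it pops, document 0 with TF 2. The gap between these two traversals is the cost that the paper's DF approximations aim to reduce.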
