Journal: IEICE Transactions on Information and Systems

Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix — Pursuit of Enhanced Informational Search on the Web —

Abstract

With the successful adoption of link analysis techniques such as PageRank and of web spam filtering, current web search engines support “navigational search” well. However, owing to the use of a simple conjunctive Boolean filter and to the inappropriateness of user queries, such engines do not necessarily support “informational search” well. Informational search would be better handled by a web search engine that uses an information retrieval model combined with enhancement techniques such as query expansion and relevance feedback. Realizing such an engine, however, requires a method to process the model efficiently. In this paper we propose a novel extension of an existing top-k query processing technique to improve search efficiency. We add to it a technique that uses a simple data structure called a “term-document binary matrix,” which makes the evaluation of top-k queries more efficient even when the queries have been expanded. Experimental evaluation using the TREC GOV2 data set and expanded versions of the evaluation queries attached to it shows that the proposed method speeds up evaluation considerably compared with existing techniques, especially as the number of query terms grows.
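
The abstract does not describe how the term-document binary matrix is laid out or how it is combined with the existing top-k technique. As a rough illustration only, the sketch below assumes one bitmask per term (bit d set when document d contains the term) and ranks documents by a toy score, the number of distinct query terms they contain. The function names `build_binary_matrix`, `contains`, and `top_k` are hypothetical and do not come from the paper, and the scoring is not the paper's retrieval model.

```python
import heapq
from collections import defaultdict


def build_binary_matrix(docs):
    """Return {term: bitmask}; bit d of the mask is 1 when document d contains the term.

    Assumption: documents are plain strings tokenized on whitespace.
    """
    matrix = defaultdict(int)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            matrix[term] |= 1 << doc_id
    return dict(matrix)


def contains(matrix, term, doc_id):
    """Constant-time membership test: does document doc_id contain term?"""
    return (matrix.get(term, 0) >> doc_id) & 1 == 1


def top_k(matrix, num_docs, query_terms, k):
    """Return up to k (score, doc_id) pairs, best first.

    Toy score (an assumption, not the paper's model): the number of
    distinct query terms present in the document.
    """
    terms = set(query_terms)
    scored = []
    for doc_id in range(num_docs):
        score = sum(contains(matrix, t, doc_id) for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by lower document id.
    return heapq.nsmallest(k, scored, key=lambda pair: (-pair[0], pair[1]))


docs = [
    "web search engine",
    "query expansion for web search",
    "relevance feedback",
]
matrix = build_binary_matrix(docs)
result = top_k(matrix, len(docs), ["web", "search", "query", "expansion"], k=2)
```

In this sketch the cost of each membership test does not depend on the length of a posting list, which is one plausible reason such a matrix could help when expanded queries contain many terms; whether the paper exploits it this way is not stated in the abstract.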