Journal: ACM Transactions on the Web

Scalable and Efficient Web Search Result Diversification


Abstract

It has been shown that top-k retrieval quality can be considerably improved by taking not only relevance but also diversity into account. However, the diversification approaches proposed so far have paid little attention to practical usability in large-scale settings, such as modern web search systems. In this work, we make two contributions toward this goal. First, we propose a combination of optimizations and heuristics for an implicit diversification algorithm based on the desirable facility placement principle, and present two algorithms that achieve linear complexity without compromising retrieval effectiveness. Instead of exhaustively comparing documents, these algorithms first perform a clustering phase and then exploit its outcome to compose the diverse result set. Second, we describe and analyze two variants of distributed diversification in a computing cluster, for large-scale IR where the document collection is too large to keep on one node. Our contribution in this direction is pioneering, as no earlier work in the literature investigates the effectiveness and efficiency of diversification in a distributed setup. Extensive evaluations on a standard TREC framework show that the proposed optimizations achieve retrieval quality competitive with the baseline algorithm while reducing processing time by more than 80% and up to 97%, and shed light on the efficiency and effectiveness tradeoffs of diversification when applied on top of a distributed architecture.
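The cluster-then-select idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not specify the clustering method or the selection rule, so the code below assumes a single-pass leader clustering over the relevance-ranked candidates followed by a round-robin pick across clusters. All names (`Doc`, `cosine`, `leader_clusters`, `diversify`) and the parameters `threshold` and `max_clusters` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical document representation: (doc_id, relevance score, sparse term vector).
Doc = Tuple[str, float, Dict[str, float]]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def leader_clusters(docs: List[Doc], threshold: float, max_clusters: int) -> List[List[Doc]]:
    """Single-pass clustering: each document is compared only with cluster leaders.

    With max_clusters bounded, the number of comparisons grows linearly
    with the number of candidate documents.
    """
    clusters: List[List[Doc]] = []
    for doc in docs:
        for cluster in clusters:
            if cosine(doc[2], cluster[0][2]) >= threshold:
                cluster.append(doc)
                break
        else:
            if len(clusters) < max_clusters:
                clusters.append([doc])  # doc becomes the leader of a new cluster
            else:
                # Cluster cap reached: attach to the most similar leader.
                best = max(clusters, key=lambda c: cosine(doc[2], c[0][2]))
                best.append(doc)
    return clusters


def diversify(docs: List[Doc], k: int, threshold: float = 0.5, max_clusters: int = 10) -> List[str]:
    """Return up to k doc ids, taking one document per cluster in each round."""
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    clusters = leader_clusters(ranked, threshold, max_clusters)
    result: List[str] = []
    depth = 0
    while len(result) < k:
        added = False
        for cluster in clusters:  # clusters are ordered by leader relevance
            if depth < len(cluster):
                result.append(cluster[depth][0])
                added = True
                if len(result) == k:
                    break
        if not added:  # every cluster is exhausted
            break
        depth += 1
    return result
```

For an ambiguous query, a pure relevance ranking may return several near-duplicate documents about one interpretation, whereas the sketch above takes the most relevant document from each cluster before returning to a cluster a second time. A production system would need a principled similarity measure and a selection rule that balances relevance against diversity, neither of which this sketch attempts.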
