Journal: Information Processing & Management

Revisiting the cluster-based paradigm for implicit search result diversification

Abstract

To cope with ambiguous and/or underspecified queries, search result diversification (SRD) has become a key technique and has attracted considerable attention. This paper focuses on implicit SRD, where the subtopics underlying a query are unknown. Many existing methods rely on a greedy strategy to generate diversified results. A common practice is to use a heuristic criterion to make the locally optimal choice at each round. As a result, when such a method fails, it is difficult to know whether the failure is caused by the optimization criterion or by the parameter settings. Different from previous studies, we formulate implicit SRD as a process of selecting and ranking k exemplar documents through integer linear programming (ILP). The key idea is that, for a specific query, we expect to maximize the overall relevance of the k exemplar documents. Meanwhile, we wish to maximize the representativeness of the selected exemplar documents with respect to the non-selected documents. Intuitively, if the selected exemplar documents concisely represent the entire set of documents, novelty and diversity will naturally arise. Moreover, we propose two approaches, ILP4ID (Integer Linear Programming for Implicit SRD) and AP4ID (Affinity Propagation for Implicit SRD), for solving the proposed formulation of implicit SRD. In particular, ILP4ID relies on the branch-and-bound strategy and is able to obtain the optimal solution. AP4ID is an approximate method. It transforms the target problem into a maximum-a-posteriori inference problem and adopts the message passing algorithm to find the solution. Furthermore, we investigate the differences and connections between the proposed models and prior models by casting them as different variants of the cluster-based paradigm for implicit SRD. To validate the effectiveness and efficiency of the proposed approaches, we conduct a series of experiments on four benchmark TREC diversity collections.
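The abstract does not spell out the exact ILP, so the following is only a minimal sketch under stated assumptions. It contrasts an exact k-exemplar selection with the greedy, locally optimal style of selection described above. Everything here is hypothetical: the toy objective (a `lam`-weighted sum of exemplar relevance plus a representativeness term that credits each non-selected document with its similarity to its closest exemplar), the function names `objective`, `exact_select`, and `greedy_select`, the relevance-based ranking rule, and the toy data. The exact search enumerates every k-subset. It stands in for ILP4ID's branch-and-bound solver and reaches the same global optimum of this toy objective, but it is only feasible for tiny candidate lists.

```python
from itertools import combinations

# Hypothetical toy objective (NOT the paper's exact ILP):
#   score(S) = lam * sum of rel[i] over exemplars i in S
#            + (1 - lam) * sum over non-selected docs j of max_{i in S} sim[i][j]
def objective(exemplars, rel, sim, lam=0.5):
    chosen = set(exemplars)
    relevance = sum(rel[i] for i in chosen)
    # Each non-selected document is "represented" by its most similar exemplar.
    representativeness = sum(
        max(sim[i][j] for i in chosen)
        for j in range(len(rel))
        if j not in chosen
    )
    return lam * relevance + (1 - lam) * representativeness


def exact_select(rel, sim, k, lam=0.5):
    """Global optimum by enumerating every k-subset (a stand-in for an ILP
    solver; only feasible for very small candidate lists)."""
    best = max(combinations(range(len(rel)), k),
               key=lambda subset: objective(subset, rel, sim, lam))
    # Assumed ranking rule: order the selected exemplars by relevance.
    return sorted(best, key=lambda i: rel[i], reverse=True)


def greedy_select(rel, sim, k, lam=0.5):
    """Greedy baseline: each round adds the document that maximizes the
    objective given the documents already chosen (locally optimal choice)."""
    chosen = []
    for _ in range(k):
        candidates = [d for d in range(len(rel)) if d not in chosen]
        chosen.append(max(candidates,
                          key=lambda d: objective(chosen + [d], rel, sim, lam)))
    return chosen


# Toy instance: doc 0 is a highly relevant "hub" that is moderately similar
# to everything; docs 1 and 3 share subtopic A, docs 2 and 4 share subtopic B.
rel = [1.0, 0.6, 0.6, 0.0, 0.0]
sim = [[0.0] * 5 for _ in range(5)]
for j in range(1, 5):
    sim[0][j] = sim[j][0] = 0.5
sim[1][3] = sim[3][1] = 1.0
sim[2][4] = sim[4][2] = 1.0

exact = exact_select(rel, sim, k=2)
greedy = greedy_select(rel, sim, k=2)
print("exact: ", exact, round(objective(exact, rel, sim), 2))
print("greedy:", greedy, round(objective(greedy, rel, sim), 2))
```

On this toy instance the exact search returns `[1, 2]` with a score of 1.85, taking one document from each subtopic. The greedy baseline first takes the relevant hub document 0, which is the best single choice, and ends at `[0, 1]` with 1.8. This illustrates how a locally optimal criterion can miss the best overall set. It is an illustration only, not a reproduction of the paper's models or experiments.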
The experimental results demonstrate that: (1) the proposed methods, especially ILP4ID, can achieve substantially improved performance over the state-of-the-art unsupervised methods for implicit SRD; (2) the initial runs, the number of input documents, query types, the way document similarity is computed, the pre-defined number of clusters, and the optimization algorithm significantly affect the performance of diversification models. Careful examination of these factors is highly recommended when developing implicit SRD methods. Based on an in-depth study of different types of methods for implicit SRD, we provide additional insight into the cluster-based paradigm for implicit SRD. In particular, we show how methods relying on greedy strategies affect the performance of implicit SRD, and how a particular diversification model should be fine-tuned.