Revisiting the cluster-based paradigm for implicit search result diversification

Hai-Tao Yu; Adam Jatowt; Roi Blanco; Hideo Joho; Joemon M. Jose; Long Chen; Fajie Yuan

Abstract

To cope with ambiguous and/or underspecified queries, search result diversification (SRD) is a key technique that has attracted a lot of attention. This paper focuses on implicit SRD, where the subtopics underlying a query are unknown. Many existing methods rely on a greedy strategy for generating diversified results. A common practice is to use a heuristic criterion to make the locally optimal choice at each round. As a result, it is difficult to know whether failures are caused by the optimization criterion or by the parameter settings. Different from previous studies, we formulate implicit SRD as a process of selecting and ranking k exemplar documents through integer linear programming (ILP). The key idea is that, for a specific query, we expect to maximize the overall relevance of the k exemplar documents. Meanwhile, we wish to maximize the representativeness of the selected exemplar documents with respect to the non-selected documents. Intuitively, if the selected exemplar documents concisely represent the entire set of documents, novelty and diversity will naturally arise. Moreover, we propose two approaches, ILP4ID (Integer Linear Programming for Implicit SRD) and AP4ID (Affinity Propagation for Implicit SRD), for solving the proposed formulation of implicit SRD. In particular, ILP4ID relies on the branch-and-bound strategy and is able to obtain the optimal solution. AP4ID, an approximate method, transforms the target problem into a maximum-a-posteriori inference problem, and a message passing algorithm is adopted to find the solution. Furthermore, we investigate the differences and connections between the proposed models and prior models by casting them as different variants of the cluster-based paradigm for implicit SRD. To validate the effectiveness and efficiency of the proposed approaches, we conduct a series of experiments on four benchmark TREC diversity collections.
The experimental results demonstrate that: (1) The proposed methods, especially ILP4ID, can achieve substantially improved performance over the state-of-the-art unsupervised methods for implicit SRD. (2) The initial runs, the number of input documents, query types, the ways of computing document similarity, the pre-defined cluster number and the optimization algorithm significantly affect the performance of diversification models. Careful examination of these factors is highly recommended in the development of implicit SRD methods. Based on the in-depth study of different types of methods for implicit SRD, we provide additional insight into the cluster-based paradigm for implicit SRD, in particular into how methods relying on greedy strategies impact the performance of implicit SRD and how a particular diversification model should be fine-tuned.
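
The exemplar-selection idea described in the abstract can be illustrated with a small sketch. This record does not give the paper's exact objective or constraints, so everything below is an assumption made for illustration: the function name `select_exemplars`, the trade-off weight `lam`, the linear combination of relevance and representativeness, and the toy scores. Exhaustive enumeration is used in place of an ILP solver, so it is only practical for tiny inputs and is not the ILP4ID or AP4ID algorithm itself.

```python
from itertools import combinations


def select_exemplars(relevance, similarity, k, lam=0.5):
    """Pick k exemplar documents by exhaustive search (illustrative only).

    Assumed objective (not taken from the paper):
      lam * (sum of exemplar relevance)
      + (1 - lam) * (sum over non-selected documents of their
                     similarity to the closest exemplar)
    """
    n = len(relevance)
    best_score, best_set = float("-inf"), None
    for subset in combinations(range(n), k):
        chosen = set(subset)
        # Overall relevance of the candidate exemplars.
        rel = sum(relevance[j] for j in subset)
        # Representativeness: each non-selected document is assigned
        # to its most similar exemplar.
        rep = sum(
            max(similarity[i][j] for j in subset)
            for i in range(n)
            if i not in chosen
        )
        score = lam * rel + (1 - lam) * rep
        if score > best_score:
            best_score, best_set = score, subset
    # Rank the chosen exemplars by relevance, highest first.
    ranked = sorted(best_set, key=lambda j: relevance[j], reverse=True)
    return ranked, best_score


# Toy data (hypothetical): documents 0 and 1 are near-duplicates,
# as are documents 2 and 3.
relevance = [0.9, 0.8, 0.5, 0.4]
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]

ranked, score = select_exemplars(relevance, similarity, k=2)
print(ranked)  # [0, 2]
```

On this toy input, ranking purely by relevance would return the two near-duplicate documents 0 and 1, whereas the combined objective picks one exemplar from each group, which is the intuition behind diversity arising from representativeness.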


Bibliographic record

Source
Information Processing & Management | 2018, Issue 4 | pp. 507-528 | 22 pages
Authors
Hai-Tao Yu; Adam Jatowt; Roi Blanco; Hideo Joho; Joemon M. Jose; Long Chen; Fajie Yuan
Author affiliations

Faculty of Library, Information and Media Science, University of Tsukuba;

Department of Social Informatics, Graduate School of Informatics, Kyoto University;

IRLab, Computer Science Department, University of A Coruña;

Research Center for Knowledge Communities, Faculty of Library, Information and Media Science, University of Tsukuba;

School of Computing Science, University of Glasgow;

School of Computing Science, University of Glasgow;

School of Computing Science, University of Glasgow;

Original format: PDF
Language of text: English
Keywords
Cluster-based IR; Implicit SRD; Integer linear programming; Affinity propagation;

Date added to database: 2022-08-17 23:20:02

