Published in: Information Processing & Management

Neural embedding-based specificity metrics for pre-retrieval query performance prediction

Abstract

In information retrieval, the task of query performance prediction (QPP) is concerned with determining in advance the performance of a given query within the context of a retrieval model. QPP plays an important role in ensuring that queries with varying levels of difficulty are handled properly. According to the extant literature, query specificity is an important indicator of query performance and is typically estimated using corpus-specific, frequency-based specificity metrics. However, such metrics do not consider term semantics or inter-term associations. The work presented in this paper distinguishes itself by proposing a host of corpus-independent specificity metrics that are based on pre-trained neural embeddings and leverage geometric relations between terms in the embedding space to capture the semantics of terms and their interdependencies. Specifically, we propose three classes of specificity metrics based on pre-trained neural embeddings: neighborhood-based, graph-based, and cluster-based metrics. Through two extensive and complementary sets of experiments, we show that the proposed specificity metrics (1) are suitable specificity indicators, as judged against gold standards derived from knowledge hierarchies (the Wikipedia category hierarchy and the DMOZ taxonomy), and (2) achieve better or competitive performance compared to state-of-the-art QPP metrics on both TREC ad hoc collections (Robust'04, Gov2, and ClueWeb'09) and the ANTIQUE question answering collection. The proposed graph-based specificity metrics, especially those that capture a larger number of inter-term associations, proved to be the most effective in both query specificity estimation and QPP. We have also publicly released two test collections (i.e., specificity gold standards) that we built from the Wikipedia and DMOZ knowledge hierarchies.
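The abstract contrasts corpus-dependent, frequency-based specificity metrics with corpus-independent metrics computed from pre-trained embeddings, but it does not give their formulas. The Python sketch that follows is an illustration under stated assumptions, not the authors' method. `avg_idf` is a conventional IDF-based pre-retrieval predictor. `neighborhood_specificity`, `graph_density_specificity`, and `query_specificity` are hypothetical stand-ins for the neighborhood-based and graph-based classes. They assume that a term whose nearest neighbors are more tightly grouped in the embedding space is more specific. The toy two-dimensional vectors stand in for real pre-trained embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def avg_idf(query_terms, doc_freq, num_docs):
    """Corpus-dependent baseline: mean of idf(t) = log(N / df(t)) over the query terms.

    Terms that do not occur in the collection (df missing or 0) are skipped.
    """
    idfs = [math.log(num_docs / doc_freq[t]) for t in query_terms if doc_freq.get(t)]
    return sum(idfs) / len(idfs) if idfs else 0.0


def nearest_neighbors(term, embeddings, k):
    """Return the k (other_term, similarity) pairs closest to `term`, most similar first."""
    vec = embeddings[term]
    scored = [(other, cosine(vec, emb)) for other, emb in embeddings.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def neighborhood_specificity(term, embeddings, k=5):
    """HYPOTHETICAL neighborhood-based measure: mean similarity to the k nearest neighbors.

    Assumption (not taken from the paper): a tighter neighborhood means a more specific term.
    """
    neighbors = nearest_neighbors(term, embeddings, k)
    return sum(sim for _, sim in neighbors) / len(neighbors) if neighbors else 0.0


def graph_density_specificity(term, embeddings, k=5, threshold=0.5):
    """HYPOTHETICAL graph-based measure: edge density of the graph over the term and its
    k nearest neighbors, linking two nodes when their cosine similarity exceeds `threshold`.
    """
    nodes = [term] + [other for other, _ in nearest_neighbors(term, embeddings, k)]
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    edges = sum(1 for a, b in pairs if cosine(embeddings[a], embeddings[b]) > threshold)
    return edges / len(pairs)


def query_specificity(query_terms, embeddings, term_metric=neighborhood_specificity, **kwargs):
    """Average a per-term metric over the query; out-of-vocabulary terms are skipped."""
    scores = [term_metric(t, embeddings, **kwargs) for t in query_terms if t in embeddings]
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage: "x" and "y" point the same way, "z" is orthogonal to both.
toy = {"x": [1.0, 0.0], "y": [1.0, 0.0], "z": [0.0, 1.0]}
print(neighborhood_specificity("x", toy, k=1))   # 1.0
print(graph_density_specificity("x", toy, k=2))  # one edge out of three pairs
print(query_specificity(["x", "unknown"], toy, k=1))
```

Passing `term_metric=graph_density_specificity` to `query_specificity` aggregates the graph-based variant instead. The values of `k` and `threshold`, and the use of a plain mean to aggregate term scores, are illustrative assumptions.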
