
Contextualized query expansion via unsupervised chunk selection for text retrieval



Abstract

When ranking a list of documents relative to a given query, vocabulary mismatches can compromise performance, because the queries and the documents use different language. Though BERT-based re-rankers have significantly advanced the state of the art, such mismatches still exist. Moreover, recent works demonstrated that it is non-trivial to use established query expansion methods to boost the performance of BERT-based re-rankers. Hence, this paper proposes a novel query expansion model using unsupervised chunk selection, coined BERT-QE. In particular, BERT-QE consists of three phases. After performing the first-round re-ranking in phase one, BERT-QE leverages the strength of the BERT model to select relevant text chunks from feedback documents in phase two, and uses them for the final re-ranking in phase three. Furthermore, different variants of BERT-QE are thoroughly investigated for a better trade-off between effectiveness and efficiency, including the use of smaller BERT variants and of recently proposed late-interaction methods. On the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models. In fact, the best variant of BERT-QE can significantly outperform BERT-Large on shallow metrics with less than 1% extra computation.
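The three-phase pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `rel` scorer here is a toy token-overlap stand-in for a fine-tuned BERT cross-encoder, and the chunk size, the number of feedback documents/chunks, and the `alpha` interpolation used to combine query and chunk evidence are all illustrative assumptions.

```python
def rel(text_a: str, text_b: str) -> float:
    """Toy relevance score (Jaccard token overlap). A real BERT-QE system
    would score the pair with a fine-tuned BERT re-ranker instead."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / (len(a | b) or 1)

def chunks(doc: str, size: int = 5) -> list:
    """Split a document into fixed-size word chunks (chunk size is a guess)."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def bert_qe(query: str, docs: list, k_docs: int = 3,
            k_chunks: int = 2, alpha: float = 0.5) -> list:
    # Phase 1: first-round re-ranking of the candidate documents.
    ranked = sorted(docs, key=lambda d: rel(query, d), reverse=True)
    feedback = ranked[:k_docs]
    # Phase 2: score chunks of the feedback documents against the query
    # and keep the most relevant ones as expansion evidence.
    cand = [c for d in feedback for c in chunks(d)]
    top_chunks = sorted(cand, key=lambda c: rel(query, c), reverse=True)[:k_chunks]
    # Phase 3: final re-ranking, interpolating the query score with the
    # average chunk score (the interpolation form is an assumption).
    def score(d: str) -> float:
        chunk_part = sum(rel(c, d) for c in top_chunks) / max(len(top_chunks), 1)
        return alpha * rel(query, d) + (1 - alpha) * chunk_part
    return sorted(ranked, key=score, reverse=True)
```

Under this sketch, documents that share vocabulary with the selected chunks, rather than with the query alone, are promoted in the final ranking, which is how chunk-based expansion mitigates the vocabulary-mismatch problem the abstract describes.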

Bibliographic details

  • Source
    Information Processing & Management | 2021, Issue 5 | pp. 102672.1-102672.19 | 19 pages
  • Author affiliations

    University of Chinese Academy of Sciences, Beijing, China; Institute of Software, Chinese Academy of Sciences, Beijing, China;

    Amazon Alexa, Berlin, Germany;

    University of Chinese Academy of Sciences, Beijing, China; Institute of Software, Chinese Academy of Sciences, Beijing, China;

    Institute of Software, Chinese Academy of Sciences, Beijing, China;

    Institute of Software, Chinese Academy of Sciences, Beijing, China;

    Max Planck Institute for Informatics, Saarbruecken, Germany;

  • Indexing information
  • Format PDF
  • Language eng
  • CLC classification
  • Keywords

    Contextualized query expansion; Document re-ranking; Pre-trained text representations;

  • Date added 2022-08-19 02:25:57
