Adaptive Term Frequency Normalization for BM25

Abstract

A key component of BM25 that contributes to its success is its sub-linear term frequency (TF) normalization formula. The scale and shape of this TF normalization component are controlled by a parameter k_1, which is generally set to a term-independent constant. We hypothesize, and show empirically, that this parameter should be set in a term-specific way in order to optimize retrieval performance. Following this intuition, we propose an information gain measure that directly estimates the contributions of repeated term occurrences, which is then used to fit the BM25 function and predict a term-specific k_1. Our experimental results show that the proposed approach, without needing any training data, can efficiently and automatically estimate a term-specific k_1, and is more effective and robust than the standard BM25.
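
The abstract does not state the TF formula itself. A common form of the BM25 TF component is tf·(k_1+1) / (tf + k_1·(1 − b + b·dl/avgdl)), where dl is the document length and avgdl the average document length. When dl = avgdl this equals 1 at tf = 1 and approaches k_1 + 1 as tf grows, which is why k_1 sets both the ceiling and how quickly repeated occurrences saturate. The sketch below assumes that standard form and illustrates the general idea of choosing a per-term k_1 by fitting this curve to a target curve with a simple grid search. The target curve, the grid, and the names `bm25_tf` and `fit_k1` are hypothetical; the paper's information gain measure is not described in this abstract and is not reproduced here.

```python
from __future__ import annotations


def bm25_tf(tf: float, k1: float, b: float = 0.75,
            doc_len: float = 1.0, avg_doc_len: float = 1.0) -> float:
    """Standard BM25 TF component (assumed Robertson form).

    With doc_len == avg_doc_len the length factor is 1, so the value is
    tf * (k1 + 1) / (tf + k1): equal to 1 at tf = 1, bounded by k1 + 1.
    """
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + length_norm)


def fit_k1(target: dict[int, float], candidates: list[float] | None = None) -> float:
    """Pick the k1 whose TF curve best matches `target` (least squares).

    `target` maps a term frequency to a desired TF weight, normalized so that
    the weight at tf = 1 is 1. HYPOTHETICAL: in the paper the target would
    come from an information gain estimate, which is not modeled here.
    """
    if candidates is None:
        # Hypothetical search grid: k1 = 0.01, 0.02, ..., 10.00
        candidates = [i / 100 for i in range(1, 1001)]

    def squared_error(k1: float) -> float:
        return sum((bm25_tf(tf, k1) - weight) ** 2 for tf, weight in target.items())

    return min(candidates, key=squared_error)


if __name__ == "__main__":
    # Synthetic target generated from k1 = 1.2, so the fit should recover it.
    synthetic = {tf: bm25_tf(tf, 1.2) for tf in range(1, 11)}
    print(fit_k1(synthetic))  # → 1.2
```

Running `fit_k1` separately for each query term, each with its own target curve, yields a term-specific k_1 instead of one global constant. A grid search is used only for readability; any one-dimensional optimizer would serve the same purpose.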
