Venue: International Symposium on String Processing and Information Retrieval (SPIRE)

Term Impacts as Normalized Term Frequencies for BM25 Similarity Scoring

Abstract

The BM25 similarity computation has been shown to provide effective document retrieval. Operationally, the formulae underlying BM25 apply both term frequency normalization and document length normalization. This paper considers an alternative form of normalization based on document-centric impacts, and shows that the new normalization simplifies BM25 and reduces the number of tuning parameters. The motivation comes from a preliminary analysis of a document collection, which shows that impacts are more likely to identify documents whose lengths resemble those of the documents judged relevant. Experiments on TREC data demonstrate that impact-based BM25 is as good as or better than the original term frequency-based BM25 in terms of retrieval effectiveness.
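The contrast the abstract draws can be sketched in Python. The classic BM25 term weight below is the widely used form with two tuning parameters, k1 (term frequency saturation) and b (document length normalization), paired with the non-negative log(1 + ...) IDF variant used by Lucene. The impact-based side is a hypothetical illustration only: it assumes an impact is obtained by scaling each term's frequency by the largest term frequency in the same document and quantizing to a small number of levels, and that this within-document normalization lets the length parameter b be dropped. The paper's actual definition of document-centric impacts and of impact-based BM25 may differ; the names `impacts`, `bm25_impact_term`, and `levels` are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def idf(num_docs: int, doc_freq: int) -> float:
    """Non-negative BM25 IDF (the Lucene-style log(1 + ...) variant)."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term(tf: int, doc_len: int, avg_doc_len: float,
              num_docs: int, doc_freq: int,
              k1: float = 1.2, b: float = 0.75) -> float:
    """Classic BM25 term weight: k1 controls tf saturation, b controls length normalization."""
    length_norm = (1.0 - b) + b * doc_len / avg_doc_len
    return idf(num_docs, doc_freq) * tf * (k1 + 1.0) / (tf + k1 * length_norm)


def impacts(doc_terms: list[str], levels: int = 8) -> dict[str, int]:
    """HYPOTHETICAL document-centric impacts (not the paper's definition):
    scale each term's tf by the document's largest tf, then quantize to 1..levels."""
    counts = Counter(doc_terms)
    if not counts:
        return {}
    max_tf = max(counts.values())
    # Every count is >= 1, so the ceiling is at least 1; the most frequent term gets `levels`.
    return {t: math.ceil(levels * c / max_tf) for t, c in counts.items()}


def bm25_impact_term(impact: int, num_docs: int, doc_freq: int, k1: float = 1.2) -> float:
    """HYPOTHETICAL impact-based BM25 weight: the impact is assumed to be already
    normalized within its document, so the length parameter b is dropped."""
    return idf(num_docs, doc_freq) * impact * (k1 + 1.0) / (impact + k1)


# Toy collection (made up for illustration): each document is a list of tokens.
docs = [
    "bm25 ranks documents by term frequency".split(),
    "impacts normalize term frequency per document document".split(),
    "short text".split(),
]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N
df = Counter(t for d in docs for t in set(d))


def score_bm25(query: list[str], doc: list[str]) -> float:
    """Sum classic BM25 weights over query terms present in the document."""
    tf = Counter(doc)
    return sum(bm25_term(tf[t], len(doc), avgdl, N, df[t]) for t in query if tf[t] > 0)


def score_impact(query: list[str], doc: list[str]) -> float:
    """Sum hypothetical impact-based weights over query terms present in the document."""
    imp = impacts(doc)
    return sum(bm25_impact_term(imp[t], N, df[t]) for t in query if t in imp)


for i, d in enumerate(docs):
    print(i, round(score_bm25(["term", "frequency"], d), 3),
          round(score_impact(["term", "frequency"], d), 3))
```

In this sketch the impact-based weight has one tuning parameter (k1) instead of two, and for a document of exactly average length the two weights coincide when the impact equals the raw term frequency, because the BM25 length factor then reduces to 1.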