Conference proceedings: Advances in Information Retrieval

Probabilistic Document Length Priors for Language Models

Abstract

This paper addresses the problem of devising a new document prior for the language modeling (LM) approach to Information Retrieval. The prior is based on term statistics, is derived in a probabilistic fashion, and offers a novel way of accounting for document length. Furthermore, we developed a new way of combining document length priors with the query likelihood estimate, based on the risk of accepting the latter as a score. The prior has been combined with a document retrieval language model that uses Jelinek-Mercer (JM) smoothing, a technique that does not take document length into account. Adding the prior boosts retrieval performance, so that the combined model outperforms both an LM with a document-length-dependent smoothing component (Dirichlet prior) and another state-of-the-art, high-performing scoring function (BM25). The improvements are significant and robust across different collections and query sizes.
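To make the contrast between the two smoothing methods concrete, here is a minimal sketch of query-likelihood scoring. Under JM smoothing the mixing weight `lam` is fixed, so document length has no effect on smoothing. Under Dirichlet smoothing the effective weight `mu / (|d| + mu)` shrinks as documents get longer. The abstract does not give the paper's actual prior, which is derived from term statistics, or its risk-based combination. This sketch therefore substitutes a simple hypothetical prior, P(d) proportional to |d|, added to the JM log-likelihood in log space. All function names are hypothetical, and `lam=0.7` and `mu=2000` are illustrative defaults only, not values from the paper.

```python
import math
from collections import Counter


def collection_model(collection):
    """Maximum-likelihood P(w|C) from a flat list of collection tokens."""
    cf = Counter(collection)
    total = len(collection)
    return {w: c / total for w, c in cf.items()}


def jm_log_likelihood(query, doc, p_c, lam=0.7):
    """log P(q|d) with Jelinek-Mercer smoothing (assumes a non-empty doc):
    P(w|d) = (1 - lam) * tf(w,d) / |d| + lam * P(w|C).
    lam is fixed, so |d| plays no role in how much smoothing is applied."""
    tf = Counter(doc)
    dl = len(doc)
    score = 0.0
    for w in query:
        if w not in p_c:  # skip terms unseen in the collection (P(w|C) = 0)
            continue
        score += math.log((1 - lam) * tf[w] / dl + lam * p_c[w])
    return score


def dirichlet_log_likelihood(query, doc, p_c, mu=2000.0):
    """log P(q|d) with Dirichlet-prior smoothing:
    P(w|d) = (tf(w,d) + mu * P(w|C)) / (|d| + mu).
    The effective smoothing weight mu / (|d| + mu) depends on |d|."""
    tf = Counter(doc)
    dl = len(doc)
    score = 0.0
    for w in query:
        if w not in p_c:
            continue
        score += math.log((tf[w] + mu * p_c[w]) / (dl + mu))
    return score


def log_length_prior(doc):
    """HYPOTHETICAL stand-in prior: P(d) proportional to |d|.
    The normalizing constant is identical for every document, so it is
    dropped without changing the ranking. Not the paper's prior."""
    return math.log(len(doc))


def jm_with_prior(query, doc, p_c, lam=0.7):
    """Rank by log P(q|d) + log P(d): a plain additive log-space combination,
    used here in place of the paper's risk-based combination."""
    return jm_log_likelihood(query, doc, p_c, lam) + log_length_prior(doc)


if __name__ == "__main__":
    docs = {
        "d1": "language model retrieval language".split(),
        "d2": "cat dog cat".split(),
    }
    p_c = collection_model([t for d in docs.values() for t in d])
    q = ["language"]
    ranking = sorted(docs, key=lambda k: jm_with_prior(q, docs[k], p_c), reverse=True)
    print(ranking)
```

In this toy collection, `d1` ranks above `d2`. Because the prior enters as a separate additive term, you can swap in a different length prior or combination rule without touching the JM likelihood itself.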