Probabilistic Document Length Priors for Language Models

Abstract

This paper addresses the issue of devising a new document prior for the language modeling (LM) approach to information retrieval. The prior is based on term statistics, is derived in a probabilistic fashion, and offers a novel way of considering document length. Furthermore, we developed a new way of combining document length priors with the query likelihood estimation, based on the risk of accepting the latter as a score. The prior has been combined with a document retrieval language model that uses Jelinek-Mercer (JM) smoothing, a technique that does not take document length into account. Combining the prior with this model boosts retrieval performance, so that it outperforms both an LM with a document-length-dependent smoothing component (Dirichlet prior) and another state-of-the-art, high-performing scoring function (BM25). The improvements are significant and robust across different collections and query sizes.
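To make the setup in the abstract concrete, the following is a minimal sketch of query-likelihood scoring with Jelinek-Mercer smoothing plus a document prior. It is not the authors' method: the abstract does not give the form of their prior or their risk-based combination, so `log_length_prior` is a hypothetical placeholder (prior proportional to document length) and the combination is the plain log-sum of prior and likelihood. All function names, the toy documents, and the convention that `lam` weights the collection model are assumptions made for illustration.

```python
import math
from collections import Counter


def jm_log_likelihood(query, doc, coll_tf, coll_len, lam=0.5):
    """log P(q | d) with Jelinek-Mercer smoothing.

    Assumed convention: p(t | d) = (1 - lam) * tf(t, d) / |d| + lam * cf(t) / |C|.
    """
    tf = Counter(doc)
    dl = len(doc)
    total = 0.0
    for term in query:
        p_doc = tf[term] / dl if dl else 0.0
        p_coll = coll_tf.get(term, 0) / coll_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in both the document and the collection.
            return float("-inf")
        total += math.log(p)
    return total


def log_length_prior(doc, coll_len):
    """Hypothetical placeholder prior: P(d) proportional to |d|.

    This is NOT the prior proposed in the paper; it only stands in for one.
    """
    return math.log(len(doc) / coll_len)


def score(query, doc, coll_tf, coll_len, lam=0.5):
    """Plain log-sum combination (an assumption, not the paper's risk-based one)."""
    return jm_log_likelihood(query, doc, coll_tf, coll_len, lam) + log_length_prior(doc, coll_len)


# Toy collection: d2 is d1 repeated twice, so relative term frequencies match.
docs = {
    "d1": ["language", "model", "retrieval"],
    "d2": ["language", "model", "retrieval", "language", "model", "retrieval"],
}
coll_tf = Counter(t for d in docs.values() for t in d)
coll_len = sum(coll_tf.values())
query = ["language", "retrieval"]

ranking = sorted(docs, key=lambda d: score(query, docs[d], coll_tf, coll_len), reverse=True)
```

In this toy collection the two documents have identical relative term frequencies, so the JM-smoothed likelihood alone cannot separate them, which illustrates the abstract's point that JM smoothing ignores document length. Any preference between them comes entirely from the prior term.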

Bibliographic details

Source: Advances in Information Retrieval, 2008, pp. 394-405 (12 pages)
Conference location: Glasgow (GB)
Authors: Roi Blanco; Alvaro Barreiro
Original format: PDF
Language of text: English
Chinese Library Classification: Information processing

