Journal: ACM Transactions on Information Systems

Assessing Multivariate Bernoulli Models for Information Retrieval


Abstract

Although the seminal proposal to introduce language modeling in information retrieval was based on a multivariate Bernoulli model, the predominant modeling approach is now centered on multinomial models. Language modeling for retrieval based on multivariate Bernoulli distributions is seen as inefficient and believed to be less effective than the multinomial model. In this article, we examine the multivariate Bernoulli model with respect to its successor and consider its role in future retrieval systems. In the context of Bayesian learning, these two modeling approaches are described, contrasted, and compared both theoretically and computationally. We show that the query likelihood following a multivariate Bernoulli distribution introduces interesting retrieval features which may be useful for specific retrieval tasks such as sentence retrieval. Then, we address the efficiency aspect and show that algorithms can be designed to perform retrieval efficiently for multivariate Bernoulli models, before performing an empirical comparison to study the behavioral aspects of the models. A series of comparisons is then conducted on a number of test collections and retrieval tasks to determine the empirical and practical differences between the different models. Our results indicate that for sentence retrieval the multivariate Bernoulli model can significantly outperform the multinomial model. However, for the other tasks the multinomial model provides consistently better performance (and in most cases significantly so). An analysis of the various retrieval characteristics reveals that the multivariate Bernoulli model tends to promote long documents whose nonquery terms are informative. While this is detrimental to the task of document retrieval (documents tend to contain considerable nonquery content), it is valuable for other tasks such as sentence retrieval, where the retrieved elements are very short and focused.
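To make the contrast between the two query likelihoods concrete, the following is a minimal Python sketch, not the article's implementation. It assumes the textbook forms of the two models: the multinomial likelihood multiplies smoothed term probabilities over query term occurrences only, while the multivariate Bernoulli likelihood runs over the whole vocabulary, multiplying p(t|D) for query terms and 1 - p(t|D) for nonquery terms. The smoothing choices (Jelinek-Mercer with `lam`, and the `presence_probability` estimator with `mu`) are illustrative assumptions and are not the Bayesian estimators studied in the article; all function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def multinomial_log_likelihood(query, doc, collection_tokens, lam=0.5):
    """log P(Q|D) under a unigram multinomial model.

    Uses Jelinek-Mercer smoothing (an assumption made for this sketch).
    Only query terms contribute; every query term is assumed to occur
    somewhere in the collection so the logarithm is defined.
    """
    doc_tf = Counter(doc)
    col_tf = Counter(collection_tokens)
    score = 0.0
    for term in query:  # a repeated query term is counted each time
        p_doc = doc_tf[term] / len(doc)
        p_col = col_tf[term] / len(collection_tokens)
        score += math.log(lam * p_doc + (1 - lam) * p_col)
    return score


def presence_probability(term, doc, docs, mu=1.0):
    """Hypothetical smoothed estimate of P(term is present | doc).

    Mixes a 0/1 presence indicator with a smoothed document-frequency
    prior, so the result always lies strictly between 0 and 1.
    """
    df = sum(1 for d in docs if term in d)
    prior = (df + 0.5) / (len(docs) + 1.0)
    present = 1.0 if term in doc else 0.0
    return (present + mu * prior) / (1.0 + mu)


def bernoulli_log_likelihood(query, doc, docs, mu=1.0):
    """log P(Q|D) under a multivariate Bernoulli model.

    Sums over the whole vocabulary: log p for terms in the query and
    log(1 - p) for terms absent from it, so nonquery terms affect the score.
    Query terms outside the collection vocabulary are ignored.
    """
    vocabulary = {t for d in docs for t in d}
    query_terms = set(query)  # term presence only, repetitions are dropped
    score = 0.0
    for term in vocabulary:
        p = presence_probability(term, doc, docs, mu)
        score += math.log(p) if term in query_terms else math.log(1.0 - p)
    return score


# Toy collection of three short "sentences" (made up for illustration).
docs = [
    ["bernoulli", "model", "retrieval"],
    ["multinomial", "model", "retrieval", "retrieval"],
    ["sentence", "retrieval", "task"],
]
collection_tokens = [t for d in docs for t in d]
query = ["bernoulli", "retrieval"]

for doc in docs:
    print(
        doc,
        round(multinomial_log_likelihood(query, doc, collection_tokens), 3),
        round(bernoulli_log_likelihood(query, doc, docs), 3),
    )
```

The structural difference is visible in the loops: the multinomial scorer iterates over the query, whereas the Bernoulli scorer iterates over the vocabulary, which is why nonquery content in a document influences the Bernoulli score directly and why a naive implementation costs more per document. How that influence plays out in ranking depends on the estimator, so this toy example should not be read as reproducing the article's empirical findings.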
