Assessing Multivariate Bernoulli Models for Information Retrieval

DAVID E. LOSADA; LEIF AZZOPARDI

Abstract

Although the seminal proposal to introduce language modeling in information retrieval was based on a multivariate Bernoulli model, the predominant modeling approach is now centered on multinomial models. Language modeling for retrieval based on multivariate Bernoulli distributions is seen as inefficient and is believed to be less effective than the multinomial model. In this article, we examine the multivariate Bernoulli model with respect to its successor and consider its role in future retrieval systems. In the context of Bayesian learning, these two modeling approaches are described, contrasted, and compared both theoretically and computationally. We show that the query likelihood following a multivariate Bernoulli distribution introduces interesting retrieval features which may be useful for specific retrieval tasks such as sentence retrieval. Then, we address the efficiency aspect and show that algorithms can be designed to perform retrieval efficiently for multivariate Bernoulli models, before performing an empirical comparison to study the behavioral aspects of the models. A series of comparisons is then conducted on a number of test collections and retrieval tasks to determine the empirical and practical differences between the different models. Our results indicate that for sentence retrieval the multivariate Bernoulli model can significantly outperform the multinomial model. However, for the other tasks the multinomial model provides consistently better performance (and in most cases significantly so). An analysis of the various retrieval characteristics reveals that the multivariate Bernoulli model tends to promote long documents whose nonquery terms are informative. While this is detrimental to the task of document retrieval (documents tend to contain considerable nonquery content), it is valuable for other tasks such as sentence retrieval, where the retrieved elements are very short and focused.
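
The structural difference between the two query likelihoods mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the estimation procedure used in the article: the function names, the binary presence estimate, the smoothing parameter `mu`, and the background probabilities `collection_prob` and `presence_prior` are all assumptions made for illustration. The sketch only shows that the multinomial likelihood multiplies over query term occurrences, whereas the multivariate Bernoulli likelihood multiplies over the whole vocabulary, so terms absent from the query also contribute.

```python
import math
from collections import Counter


def multinomial_log_likelihood(query, doc, collection_prob, mu=10.0):
    """log P(query | doc) under a multinomial model.

    Assumption: Dirichlet-style smoothing with a hypothetical background
    distribution `collection_prob` (term -> probability).
    Only query terms contribute, once per occurrence in the query.
    """
    tf = Counter(doc)
    doc_len = len(doc)
    return sum(
        math.log((tf[t] + mu * collection_prob[t]) / (doc_len + mu))
        for t in query
    )


def bernoulli_log_likelihood(query, doc, vocabulary, presence_prior, mu=1.0):
    """log P(query | doc) under a multivariate Bernoulli model.

    Assumption: the presence probability of each term is a smoothed binary
    estimate (1 if the term occurs in the document, else 0), pulled toward
    a hypothetical background value `presence_prior[t]` in (0, 1).
    Every vocabulary term contributes: p if the term is in the query,
    1 - p if it is not.
    """
    in_doc = set(doc)
    in_query = set(query)
    total = 0.0
    for t in vocabulary:
        seen = 1.0 if t in in_doc else 0.0
        p = (seen + mu * presence_prior[t]) / (1.0 + mu)
        total += math.log(p if t in in_query else 1.0 - p)
    return total


if __name__ == "__main__":
    vocabulary = ["a", "b", "c"]
    prior = {t: 0.5 for t in vocabulary}  # illustrative values only
    query = ["a"]
    for doc in (["a"], ["a", "b"]):
        print(doc,
              multinomial_log_likelihood(query, doc, prior),
              bernoulli_log_likelihood(query, doc, vocabulary, prior))
```

How nonquery terms affect the final ranking depends on how the presence probabilities are estimated, which this sketch does not attempt to reproduce.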

Bibliographic record

Source
ACM Transactions on Information Systems | 2008, Issue 3 | pp. 193-238 | 46 pages
Authors
DAVID E. LOSADA; LEIF AZZOPARDI
Author affiliation

Departamento de Electrónica y Computación, Universidad de Santiago de Compostela, Spain

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
information retrieval; language models; multinomial; multivariate Bernoulli

Date added to database: 2022-08-18 00:45:56
