ACM Transactions on Information Systems

A Study of Smoothing Methods for Language Models Applied to Information Retrieval

Abstract

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document and then rank documents by the likelihood of the query according to the estimated language model. A central issue in language model estimation is smoothing, the problem of adjusting the maximum likelihood estimator to compensate for data sparseness. In this article, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections. Experimental results show that not only is retrieval performance generally sensitive to the smoothing parameters, but the sensitivity pattern is also affected by the query type, with performance being more sensitive to smoothing for verbose queries than for keyword queries. Verbose queries also generally require more aggressive smoothing to achieve optimal performance. This suggests that smoothing plays two different roles: to make the estimated document language model more accurate, and to "explain" the noninformative words in the query. In order to decouple these two distinct roles of smoothing, we propose a two-stage smoothing strategy, which yields better sensitivity patterns and facilitates setting the smoothing parameters automatically. We further propose methods for estimating the smoothing parameters automatically.
Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to, or better than, the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
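The ranking idea above (score each document by the likelihood of the query under a smoothed document model) can be illustrated with a minimal sketch. It assumes a unigram model over whitespace tokens and uses the two-stage form p(w|d) = (1 - lam) * (c(w;d) + mu * p(w|C)) / (|d| + mu) + lam * p(w|U): Dirichlet-prior smoothing with the collection model C first, then interpolation with a query background model U. Here U is approximated by the collection model, which is an assumption of this sketch. The function names and the fixed `mu` and `lam` values are hypothetical placeholders; they are not the automatic parameter estimates the authors propose.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum likelihood unigram model p(w|C) over all tokens in the collection."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def two_stage_prob(word, doc_counts, doc_len, p_coll, mu, lam):
    """Two-stage smoothed p(w|d) (hypothetical helper).

    Stage 1: Dirichlet prior smoothing with the collection model (parameter mu).
    Stage 2: linear interpolation with a query background model (weight lam);
    the collection model stands in for that background model (an assumption).
    """
    p_c = p_coll.get(word, 0.0)
    dirichlet = (doc_counts.get(word, 0) + mu * p_c) / (doc_len + mu)
    return (1 - lam) * dirichlet + lam * p_c


def query_log_likelihood(query, doc, p_coll, mu=1000.0, lam=0.5):
    """Ranking score: log p(q|d) under the two-stage smoothed document model."""
    counts = Counter(doc)
    score = 0.0
    for w in query:
        p = two_stage_prob(w, counts, len(doc), p_coll, mu, lam)
        if p == 0.0:
            # Word unseen anywhere in the collection: its probability is zero for
            # every document, so skipping it leaves the ranking unchanged.
            continue
        score += math.log(p)
    return score


# Toy usage: mu is kept small because the example documents are tiny.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stock markets fell sharply today".split(),
]
p_coll = collection_model(docs)
query = "the cat".split()
ranked = sorted(
    range(len(docs)),
    key=lambda i: query_log_likelihood(query, docs[i], p_coll, mu=2.0, lam=0.3),
    reverse=True,
)
print(ranked)  # → [1, 0, 2]
```

The shorter second document outranks the first because both contain "the" twice and "cat" once, so its smoothed estimates are larger. In this sketch `mu` controls how strongly sparse document counts are pulled toward the collection model, while `lam` absorbs common, noninformative query words. This mirrors the two roles of smoothing that the abstract separates.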