Journal: ACM Transactions on Information Systems

Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness



Abstract

We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
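The abstract gives no formulae, so the following is only a minimal sketch of how one weighting model of this kind might be assembled. It assumes a geometric approximation to Bose-Einstein statistics as the random process, a Laplace-style gain `1 / (tfn + 1)` as the first normalization, and a logarithmic document-length rescaling as the second. These specific forms, the function names, and the free parameter `c` are assumptions chosen for illustration and are not stated in the text above.

```python
import math


def normalized_tf(tf: float, doc_len: float, avg_doc_len: float, c: float = 1.0) -> float:
    """Second normalization (assumed form): rescale tf by document length.

    A document of average length with c = 1 keeps its raw term frequency;
    longer documents have their term frequency scaled down.
    """
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def geometric_information(tfn: float, coll_freq: float, num_docs: int) -> float:
    """Information content -log2 P(tfn) under the random process.

    Assumed form: geometric approximation to Bose-Einstein statistics,
    with lam = mean number of occurrences of the term per document.
    """
    lam = coll_freq / num_docs
    return -math.log2(1.0 / (1.0 + lam)) - tfn * math.log2(lam / (1.0 + lam))


def laplace_gain(tfn: float) -> float:
    """First normalization (assumed form): information gain 1 / (tfn + 1)."""
    return 1.0 / (tfn + 1.0)


def dfr_weight(tf: float, doc_len: float, avg_doc_len: float,
               coll_freq: float, num_docs: int, c: float = 1.0) -> float:
    """Term weight = divergence from randomness, scaled by the information gain."""
    if tf <= 0:
        return 0.0  # a term absent from the document contributes nothing
    tfn = normalized_tf(tf, doc_len, avg_doc_len, c)
    return geometric_information(tfn, coll_freq, num_docs) * laplace_gain(tfn)


if __name__ == "__main__":
    # Hypothetical collection statistics, for illustration only.
    rare = dfr_weight(tf=3, doc_len=100, avg_doc_len=100, coll_freq=100, num_docs=1000)
    common = dfr_weight(tf=3, doc_len=100, avg_doc_len=100, coll_freq=5000, num_docs=1000)
    print(f"rare term weight:   {rare:.3f}")
    print(f"common term weight: {common:.3f}")
```

In this sketch a term that is rare in the collection receives a higher weight than a common one at the same within-document frequency, which is the behaviour expected of an alternative to tf-idf. A document score would be the sum of such weights over the query terms.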
