Source: Advances in Information Retrieval (foreign-language conference proceedings)

Query-Based Inter-document Similarity Using Probabilistic Co-relevance Model


Abstract

Inter-document similarity is the critical information that determines whether cluster-based retrieval improves on the baseline. However, the theory of inter-document similarity has not been investigated, even though such work could provide a principle for defining an improved similarity in a well-motivated direction. To develop such a theory, this paper starts by pursuing an ideal inter-document similarity that optimally satisfies the cluster hypothesis. We propose a probabilistic principle of inter-document similarity: the optimal similarity of two documents should be proportional to the probability that they are co-relevant to an arbitrary query. Based on this principle, the study of inter-document similarity is formulated as the problem of estimating the co-relevance model of documents. Furthermore, we find that the optimal inter-document similarity should be defined using queries, not terms, as its basic unit, namely as a query-based similarity. We rigorously derive a novel query-based similarity from the co-relevance model, without any heuristics. Experimental results show that the new query-based inter-document similarity significantly improves on the previously used term-based similarity in the context of Voorhees' evaluation measure.
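The co-relevance principle in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual derivation, so the following is only one possible reading under assumptions of our own: that relevance of each document is independent given the query, so the co-relevance probability is a prior-weighted sum over queries of the product of per-document relevance probabilities. The function name `co_relevance_similarity`, its parameters, and the toy probabilities are all hypothetical.

```python
from typing import Optional, Sequence


def co_relevance_similarity(
    p_rel_a: Sequence[float],
    p_rel_b: Sequence[float],
    query_prior: Optional[Sequence[float]] = None,
) -> float:
    """Illustrative query-based similarity (an assumption, not the paper's formula).

    p_rel_a[i] and p_rel_b[i] are assumed estimates of P(relevant | query i, doc)
    for two documents over the same list of queries. The score is
    sum_i P(q_i) * P(R | q_i, a) * P(R | q_i, b), which treats the two
    relevance events as independent given the query.
    """
    if len(p_rel_a) != len(p_rel_b):
        raise ValueError("both documents must be scored on the same queries")
    n = len(p_rel_a)
    if n == 0:
        return 0.0
    if query_prior is None:
        # Assumed uniform prior over the sampled queries.
        query_prior = [1.0 / n] * n
    elif len(query_prior) != n:
        raise ValueError("query_prior must have one entry per query")
    return sum(pq * pa * pb for pq, pa, pb in zip(query_prior, p_rel_a, p_rel_b))


# Toy relevance estimates for three documents over three queries (made-up values).
doc_a = [0.9, 0.1, 0.8]
doc_b = [0.8, 0.2, 0.7]
doc_c = [0.1, 0.9, 0.0]

sim_ab = co_relevance_similarity(doc_a, doc_b)
sim_ac = co_relevance_similarity(doc_a, doc_c)
```

In this toy setup, documents that tend to be relevant to the same queries (`doc_a` and `doc_b`) score higher than documents relevant to different queries (`doc_a` and `doc_c`), which is the behavior the cluster hypothesis asks for. Note that the unit of comparison is the query, not the term.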
