Journal: Expert Systems with Applications

Improving retrievability with improved cluster-based pseudo-relevance feedback selection



Abstract

High findability of documents within a given cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. Findability is hindered by two factors: the inherent bias of the retrieval model, which favors some types of documents over others, and the failure to correctly capture and interpret the context of queries, which are conventionally rather short. In this paper, we analyze the bias impact of different retrieval models and query expansion strategies. We furthermore propose a novel query expansion strategy that uses document clustering to identify dominant relevant documents. This helps to overcome a limitation of conventional query expansion strategies, which suffer strongly from the noise that imperfect initial query results introduce into the selection of pseudo-relevance feedback documents. Experiments with different collections of patent documents suggest that clustering-based document selection for pseudo-relevance feedback is an effective approach for increasing the findability of individual documents and decreasing the bias of a retrieval system.
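
The notion of findability within a cut-off rank can be made concrete with a small sketch. The abstract does not state how findability or bias was computed, so everything below is an assumption for illustration: a document's score is taken to be the number of queries that rank it within the cut-off, and bias is summarized with a Gini coefficient over those scores, a measure often used in retrievability studies. The function names `retrievability` and `gini`, and the toy rankings, are hypothetical.

```python
def retrievability(rankings, doc_ids, cutoff):
    """Count, per document, the queries that rank it within `cutoff`.

    `rankings` maps a query id to its ranked list of document ids.
    This counting scheme is an assumed, simplified findability measure.
    """
    counts = {d: 0 for d in doc_ids}
    for ranked in rankings.values():
        for d in ranked[:cutoff]:
            counts[d] = counts.get(d, 0) + 1
    return counts


def gini(values):
    """Gini coefficient of non-negative scores (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over ascending values with 1-based positions.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Toy data, invented for illustration only.
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d2"],
    "q3": ["d2", "d1", "d4"],
}
scores = retrievability(rankings, ["d1", "d2", "d3", "d4"], cutoff=2)
bias = gini(scores.values())
```

Under these assumptions, a lower Gini value means the top ranks are spread more evenly across the collection, which is the sense in which a query expansion strategy could be said to reduce the bias of a retrieval system.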
