Published in: Knowledge and Information Systems

Information retrieval with concept-based pseudo-relevance feedback in MEDLINE

Abstract

Although using domain-specific knowledge sources for information retrieval yields more accurate results than purely keyword-based methods, further improvement is possible by considering both the relations between concepts in an ontology and their statistical dependencies across the corpus. This paper introduces a new approach, concept-based pseudo-relevance feedback, for improving the accuracy of biomedical retrieval systems. The proposed method uses a hybrid retrieval algorithm that combines keyword-based and concept-based matching to estimate the relevance of documents to queries. It also uses a pseudo-relevance feedback mechanism that expands the initial query with auxiliary biomedical concepts extracted from the top-ranked results of the hybrid retrieval. Concept-based similarity lets the system find documents that are semantically close to a user's query even when they share no keywords with it. In addition, expanding the initial query with concepts introduced by pseudo-relevance feedback captures query-document relations that rely on statistical dependencies between the concepts they contain. Such relations may go undetected if only the existing links between concepts in an external knowledge source are examined. The approach is evaluated on the OHSUMED test collection using standard evaluation methods from the Text REtrieval Conference (TREC). Experimental results on MEDLINE documents in the OHSUMED collection show a notable 21% improvement in mean average precision over the keyword-based approach.
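The hybrid scoring and concept-based feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: keyword and concept similarities are both TF-IDF cosine scores, they are mixed linearly with a hypothetical weight `alpha`, and feedback concepts are pooled from the top `k` first-pass results and added to the query with a hypothetical weight `beta` (a Rocchio-style update). Concept identifiers such as `C_MI` are made-up placeholders for whatever a biomedical concept mapper would assign; the abstract does not specify the paper's actual similarity measures or parameters.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): each document carries keyword tokens and
# placeholder concept IDs such as a biomedical concept mapper might emit.
DOCS = [
    {"terms": ["heart", "attack", "aspirin"], "concepts": ["C_MI", "C_ASPIRIN"]},
    {"terms": ["myocardial", "infarction", "aspirin", "therapy"], "concepts": ["C_MI", "C_ASPIRIN"]},
    {"terms": ["knee", "surgery"], "concepts": ["C_KNEE"]},
    {"terms": ["aspirin", "dosage"], "concepts": ["C_ASPIRIN"]},
]


def idf_table(bags):
    """Smoothed inverse document frequency for every feature in the corpus."""
    n = len(bags)
    df = Counter(f for bag in bags for f in set(bag))
    return {f: math.log((n + 1) / (c + 1)) + 1.0 for f, c in df.items()}


def tfidf(bag, idf):
    """Sparse TF-IDF vector; features unseen in the corpus are dropped."""
    return {f: c * idf[f] for f, c in Counter(bag).items() if f in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_rank(q_terms, q_concept_vec, docs, alpha=0.5):
    """Rank doc indices by alpha * keyword cosine + (1 - alpha) * concept cosine.

    Ties keep corpus order because sorted() is stable.
    """
    t_idf = idf_table([d["terms"] for d in docs])
    c_idf = idf_table([d["concepts"] for d in docs])
    q_vec = tfidf(q_terms, t_idf)

    def score(d):
        kw = cosine(q_vec, tfidf(d["terms"], t_idf))
        cn = cosine(q_concept_vec, tfidf(d["concepts"], c_idf))
        return alpha * kw + (1 - alpha) * cn

    return sorted(range(len(docs)), key=lambda i: -score(docs[i]))


def concept_prf(q_terms, q_concepts, docs, k=2, m=2, alpha=0.5, beta=0.5):
    """Concept-based pseudo-relevance feedback (illustrative sketch).

    1. Rank with the hybrid score.
    2. Treat the top-k documents as relevant and pool their concept weights.
    3. Add the m best concepts not already in the query, scaled by beta.
    4. Re-rank with the expanded concept vector.
    """
    c_idf = idf_table([d["concepts"] for d in docs])
    q_vec = tfidf(q_concepts, c_idf)
    first = hybrid_rank(q_terms, q_vec, docs, alpha)

    pooled = Counter()
    for i in first[:k]:
        for c, w in tfidf(docs[i]["concepts"], c_idf).items():
            pooled[c] += w / k
    new = [c for c, _ in pooled.most_common() if c not in q_vec][:m]

    expanded = dict(q_vec)
    for c in new:
        expanded[c] = beta * pooled[c]
    return first, new, hybrid_rank(q_terms, expanded, docs, alpha)


if __name__ == "__main__":
    first, new, final = concept_prf(["heart", "attack"], ["C_MI"], DOCS)
    print(first, new, final)  # [0, 1, 2, 3] ['C_ASPIRIN'] [0, 1, 3, 2]
```

In this toy corpus the aspirin-dosage document (index 3) shares neither a keyword nor the query concept with the heart-attack query, so the first pass ranks it no higher than the unrelated knee-surgery document. Because `C_ASPIRIN` co-occurs with `C_MI` in the top-ranked results, feedback adds it to the query and the aspirin document moves up. This is the kind of corpus-level statistical dependency that, per the abstract, links in an external knowledge source alone may miss.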

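The abstract reports its result as an improvement in mean average precision (MAP), the standard TREC ranking measure. The sketch below shows how MAP and a relative gain are computed, in the style of trec_eval's `map` measure. The query IDs, rankings and relevance judgements are invented for illustration and are not OHSUMED data, and `relative_gain` is a hypothetical helper. Assuming the reported 21% is a relative improvement, it corresponds to `relative_gain(new_map, baseline_map)` being about 0.21.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision.

    Precision is taken at the rank of each relevant document retrieved;
    relevant documents that are never retrieved contribute 0.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP over all judged queries.

    runs: query id -> ranked list of doc ids
    qrels: query id -> set of relevant doc ids (queries missing from runs score 0)
    """
    aps = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps)


def relative_gain(new_score, base_score):
    """Relative improvement, e.g. 0.21 for a 21% gain."""
    return (new_score - base_score) / base_score


if __name__ == "__main__":
    qrels = {"q1": {0, 1, 3}}  # hypothetical relevance judgements
    baseline = mean_average_precision({"q1": [0, 1, 2, 3]}, qrels)
    expanded = mean_average_precision({"q1": [0, 1, 3, 2]}, qrels)
    print(round(baseline, 4), expanded, round(relative_gain(expanded, baseline), 4))
    # prints: 0.9167 1.0 0.0909
```

Averaging per query before averaging across queries gives every query equal weight regardless of how many relevant documents it has, which is why MAP is the usual headline figure for TREC-style test collections.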