Journal: Information Retrieval

Mining document, concept, and term associations for effective biomedical retrieval: introducing MeSH-enhanced retrieval models


Abstract

Manually assigned subject terms, such as Medical Subject Headings (MeSH) in the health domain, describe the concepts or topics of a document. Existing information retrieval models do not take full advantage of such information. In this paper, we propose two MeSH-enhanced (ME) retrieval models that integrate the concept layer (i.e. MeSH) into the language modeling framework to improve retrieval performance. The new models quantify associations between documents and their assigned concepts to construct conceptual representations for the documents, and mine associations between concepts and terms to construct generative concept models. The two ME models reconstruct two essential estimation processes of the relevance model (Lavrenko and Croft 2001) by incorporating the document-concept and the concept-term associations. More specifically, in Model 1, language models of the pseudo-feedback documents are enriched by their assigned concepts. In Model 2, concepts that are related to users' queries are first identified, and then used to reweight the pseudo-feedback documents according to the document-concept associations. Experiments carried out on two standard test collections show that the ME models outperformed the query likelihood model, the relevance model (RM3), and an earlier ME model. A detailed case analysis provides insight into how and why the new models improve/worsen retrieval performance. Implications and limitations of the study are discussed. This study provides new ways to formally incorporate semantic annotations, such as subject terms, into retrieval models. The findings of this study suggest that integrating the concept layer into retrieval models can further improve the performance over the current state-of-the-art models.
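The abstract gives no formulas, so the following is only a minimal sketch of how the two ideas might look in code, not the authors' estimators. It assumes an RM1-style relevance estimate, P(w|R) proportional to the sum over feedback documents of P(w|D) P(Q|D) (RM3 additionally interpolates this with the original query model). The Model 1 idea is approximated by a linear mixture of a document model with its concept models, and the Model 2 idea by document weights proportional to the sum over concepts of P(c|Q) P(c|D). All function names, the mixing weight `lam`, and the probability floor are hypothetical.

```python
from collections import Counter, defaultdict


def mle(tokens):
    """Maximum-likelihood unigram model P(w|D) from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def normalize(weights):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    z = sum(weights.values())
    if z == 0:
        return {k: 1.0 / len(weights) for k in weights}
    return {k: v / z for k, v in weights.items()}


def query_likelihood(query, lm, floor=1e-6):
    """P(Q|D) under a unigram model; `floor` is an assumed stand-in for smoothing."""
    p = 1.0
    for w in query:
        p *= lm.get(w, floor)
    return p


def relevance_model(query, doc_lms, doc_weights=None):
    """RM1-style estimate: P(w|R) ∝ sum_D P(w|D) * P(Q|D) * weight(D)."""
    scores = defaultdict(float)
    for d, lm in doc_lms.items():
        weight = 1.0 if doc_weights is None else doc_weights[d]
        ql = query_likelihood(query, lm) * weight
        for w, p in lm.items():
            scores[w] += p * ql
    return normalize(scores)


def enrich_with_concepts(doc_lm, doc_concepts, concept_lms, lam=0.3):
    """Model-1-like assumption: (1 - lam) * P(w|D) + lam * sum_c P(w|c) * P(c|D)."""
    mixed = defaultdict(float)
    for w, p in doc_lm.items():
        mixed[w] += (1 - lam) * p
    for c, p_c in doc_concepts.items():
        for w, p_w in concept_lms[c].items():
            mixed[w] += lam * p_c * p_w
    return dict(mixed)


def concept_doc_weights(query_concepts, concepts_by_doc):
    """Model-2-like assumption: weight(D) ∝ sum_c P(c|Q) * P(c|D)."""
    raw = {
        d: sum(query_concepts.get(c, 0.0) * p for c, p in concepts.items())
        for d, concepts in concepts_by_doc.items()
    }
    return normalize(raw)
```

In this sketch, the Model 1 variant would pass the enriched document models to `relevance_model`, while the Model 2 variant would pass the original document models together with `doc_weights=concept_doc_weights(...)`. How the document-concept and concept-term associations are actually estimated is described in the paper itself, not here.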
