Journal: IEICE Transactions on Information and Systems

Simple Weighting Techniques For Query Expansion In Biomedical Document Retrieval


Abstract

In this paper, we propose two weighting techniques to improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks documents containing a longer term more highly, because a longer term has more chances to match the query. However, such a preference is clearly inappropriate and often yields unsatisfactory results. To alleviate this weighting bias, we devise a method of normalizing the weights of query terms within a long multi-word biomedical term, and a method of discriminating between terms using inverse terminology frequency, a novel statistic estimated in a query domain. Experimental results on the MEDLINE corpus show that our two simple techniques improve retrieval performance by correcting the inappropriate preference for long multi-word terms in an expanded query.
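
The abstract does not give the formulas, so the following Python sketch only illustrates the two ideas under stated assumptions. It assumes that length normalization means each synonym's words share a total weight of 1, and that inverse terminology frequency is defined by analogy with IDF as log(N / n_w), where N is the number of terms in a domain lexicon and n_w is the number of those terms containing word w. The function names, the toy lexicon, and the way the two weights are combined are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def length_normalized_weights(term: str) -> dict[str, float]:
    """Split a unit weight evenly over the words of a term (assumed scheme)."""
    words = term.lower().split()
    weights: dict[str, float] = {}
    for word in words:
        weights[word] = weights.get(word, 0.0) + 1.0 / len(words)
    return weights


def inverse_terminology_frequency(lexicon: list[str]) -> dict[str, float]:
    """Assumed IDF-style statistic: log(N / n_w) over a terminology lexicon."""
    n_terms = len(lexicon)
    containing: Counter[str] = Counter()
    for term in lexicon:
        # Count each word once per term, as document frequency does for IDF.
        containing.update(set(term.lower().split()))
    return {word: math.log(n_terms / n) for word, n in containing.items()}


def itf_weighted_term(term: str, itf: dict[str, float]) -> dict[str, float]:
    """Share a unit weight over a term's words in proportion to their ITF.

    Hypothetical combination of the two ideas: every synonym still sums
    to 1, but generic words such as "disease" receive a smaller share.
    Words missing from the lexicon get an ITF of 0 in this sketch.
    """
    words = term.lower().split()
    scores = [itf.get(word, 0.0) for word in words]
    total = sum(scores)
    if total == 0.0:
        return length_normalized_weights(term)  # fall back to an even split
    weights: dict[str, float] = {}
    for word, score in zip(words, scores):
        weights[word] = weights.get(word, 0.0) + score / total
    return weights


# Toy lexicon and expansion, invented purely for illustration.
lexicon = ["AD", "Alzheimer disease", "Parkinson disease", "Huntington disease"]
itf = inverse_terminology_frequency(lexicon)
for synonym in ["AD", "Alzheimer disease"]:
    print(synonym, itf_weighted_term(synonym, itf))
```

Under these assumptions, the short term and its multi-word synonym each contribute the same total weight to the expanded query, so a document matching the longer synonym is not favored merely because it matches more query words.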
