Published in: Database: The Journal of Biological Databases and Curation

Multi-field query expansion is effective for biomedical dataset retrieval


Abstract

In the context of the bioCADDIE challenge on information retrieval of biomedical datasets, we propose a method for retrieving biomedical datasets with heterogeneous schemas through query reformulation. Specifically, the proposed method transforms the initial query into a multi-field query, which is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and the other based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion: the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. The Rocchio query expansion method slightly outperforms the one that uses the biomedical lexicon as a source of terms, with an improvement of roughly 3% in MAP. However, the lexicon-based query expansion method is much less resource intensive, since it requires neither the computation of a relevance feedback set nor an initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides good overall retrieval performance relative to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one.
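
To make the Rocchio-style expansion mentioned in the abstract more concrete, the following is a minimal sketch of pseudo-relevance feedback in Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `rocchio_expand`, the parameters `alpha`, `beta` and `top_k`, the use of raw term frequencies, and the omission of the negative (non-relevant) term of the Rocchio formula are all choices made here for brevity. The abstract does not state the weighting scheme, parameter values, or fields actually used.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_k=5):
    """Expand a query with terms from feedback documents (hypothetical helper).

    query_terms   -- list of tokens in the initial query
    feedback_docs -- list of token lists, assumed to be the top-ranked
                     results of an initial run of the query
    Returns (expanded_terms, weights), where weights maps each term to
    alpha * tf_in_query + beta * mean_tf_in_feedback_docs.
    """
    weights = Counter()

    # Contribution of the original query, scaled by alpha.
    for term, tf in Counter(query_terms).items():
        weights[term] += alpha * tf

    # Contribution of the feedback centroid, scaled by beta.
    # Assumption: no negative term (gamma = 0), as is common when
    # only pseudo-relevant documents are available.
    if feedback_docs:
        n_docs = len(feedback_docs)
        for doc in feedback_docs:
            for term, tf in Counter(doc).items():
                weights[term] += beta * tf / n_docs

    # Candidate expansion terms are those not already in the query,
    # ordered by weight (ties broken alphabetically for determinism).
    original = set(query_terms)
    candidates = [(t, w) for t, w in weights.items() if t not in original]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))

    expanded_terms = list(query_terms) + [t for t, _ in candidates[:top_k]]
    return expanded_terms, dict(weights)
```

For example, calling `rocchio_expand(["lung", "cancer"], [["lung", "cancer", "tumor", "gene"], ["tumor", "expression", "lung"]], top_k=2)` adds "tumor" and "expression" to the query. Because this step needs an initial retrieval run to obtain the feedback documents, it illustrates why the abstract describes the lexicon-based alternative as less resource intensive.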
