Journal: Database

Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

Abstract

Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems rely primarily on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes), with little effort devoted to discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that PubMed users employ to represent semantic relations between entities, such as ‘CHEMICAL-1 compared to CHEMICAL-2’. Building on advances in automatic named entity recognition, we first tag entities in PubMed queries and then use the tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns, or semantic relations, based on LSA topic distributions. Our two separate evaluation experiments on chemical–chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 for the CC and CD tasks, respectively, when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality.
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted.
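The pipeline summarized in the abstract (context patterns, projection via LSA, similarity ranking, nDCG evaluation) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the toy count matrix (rows are context patterns, columns are hypothetical entity pairs seen with each pattern), the use of a plain truncated SVD with cosine similarity, the linear-gain form of DCG, and the graded relevance judgments.

```python
import math
import numpy as np

# Hypothetical toy data, not from the paper: rows are context patterns,
# columns are entity pairs, values are how often each pair filled the pattern.
PATTERNS = [
    "CHEMICAL-1 compared to CHEMICAL-2",
    "CHEMICAL-1 versus CHEMICAL-2",
    "CHEMICAL-1 interaction with CHEMICAL-2",
]
COUNTS = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
])


def lsa_vectors(counts, k):
    """Project each row onto the top-k latent dimensions via truncated SVD."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def rank_similar(vectors, query):
    """Rank all other rows by cosine similarity to the query row."""
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = vectors / norms[:, None]
    scores = unit @ unit[query]
    order = [int(i) for i in np.argsort(-scores) if int(i) != query]
    return [(i, float(scores[i])) for i in order]


def dcg(gains):
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


vectors = lsa_vectors(COUNTS, k=2)
ranked = rank_similar(vectors, query=0)

# Hypothetical graded relevance of each pattern to pattern 0 (2 = same relation).
RELEVANCE = {1: 2, 2: 0}
score = ndcg([RELEVANCE[i] for i, _ in ranked])

for i, sim in ranked:
    print(f"{PATTERNS[i]}: cosine={sim:.3f}")
print(f"nDCG={score:.3f}")
```

In this toy setting the "compared to" and "versus" patterns end up close together in the latent space because they co-occur with the same entity pairs, while the "interaction" pattern does not. A real system would work with a far larger and sparser matrix, and the weighting scheme and number of latent dimensions would need tuning.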