Neural sentence embedding models for semantic similarity estimation in the biomedical domain



Abstract

Background: Neural network based embedding models are receiving significant attention in the field of natural language processing due to their capability to effectively capture semantic information representing words, sentences or even larger text elements in low-dimensional vector space. While current state-of-the-art models for assessing the semantic similarity of textual statements from biomedical publications depend on the availability of laboriously curated ontologies, unsupervised neural embedding models only require large text corpora as input and do not need manual curation. In this study, we investigated the efficacy of current state-of-the-art neural sentence embedding models for semantic similarity estimation of sentences from biomedical literature. We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset, and evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts and a smaller contradiction subset derived from the original benchmark set.

Results: Experimental results showed that, with a Pearson correlation of 0.819, our best unsupervised model based on the Paragraph Vector Distributed Memory algorithm outperforms previous state-of-the-art results achieved on the BIOSSES biomedical benchmark set. Moreover, our proposed supervised model that combines different string-based similarity metrics with a neural embedding model surpasses previous ontology-dependent supervised state-of-the-art approaches in terms of Pearson’s r (r = 0.871) on the biomedical benchmark set. In contrast to the promising results for the original benchmark, we found our best models’ performance on the smaller contradiction subset to be poor.

Conclusions: In this study, we have highlighted the value of neural network-based models for semantic similarity estimation in the biomedical domain by showing that they can keep up with and even surpass previous state-of-the-art approaches for semantic similarity estimation that depend on the availability of laboriously curated ontologies, when evaluated on a biomedical benchmark set. Capturing contradictions and negations in biomedical sentences, however, emerged as an essential area for further work.
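
The evaluation described in the abstract compares model similarity scores with human annotations using Pearson's r. The following is a minimal sketch of that kind of evaluation, not the authors' implementation. It assumes that sentence similarity is scored as the cosine of two sentence vectors; the vectors and the annotator scores below are invented toy values, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy sentence-pair embeddings (assumption: in practice these would come
# from a trained sentence embedding model, one vector per sentence).
pairs = [
    ([1.0, 0.0, 0.5], [0.9, 0.1, 0.4]),
    ([0.2, 0.8, 0.1], [0.7, 0.1, 0.6]),
    ([0.3, 0.3, 0.9], [0.1, 0.9, 0.2]),
    ([0.5, 0.5, 0.5], [0.5, 0.4, 0.6]),
]

# Hypothetical human similarity ratings, one per sentence pair.
human_scores = [3.8, 1.5, 1.0, 3.5]

# Score each pair with the model, then correlate with the human ratings.
model_scores = [cosine_similarity(u, v) for u, v in pairs]
r = pearson_r(model_scores, human_scores)
print(f"Pearson r between model and human scores: {r:.3f}")
```

A value of r close to 1 means the model ranks and spaces the sentence pairs much as the annotators do. Because Pearson's r measures linear agreement only, the model scores and the human ratings do not need to share a scale.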
