Conference: Workshop on Biomedical Natural Language Processing

Evaluating distributed word representations for capturing semantics of biomedical concepts

Abstract

Recently there has been a surge of interest in learning vector representations of words from huge corpora in an unsupervised manner. Such word vector representations, also known as word embeddings, have been shown to improve the performance of machine learning models in several NLP tasks. However, the effectiveness of such representations has not been systematically evaluated in the biomedical domain. In this work, we aim to compare the performance of two state-of-the-art word embedding methods, word2vec and GloVe, on the basic task of reflecting the semantic similarity and relatedness of biomedical concepts. To this end, we obtain vector representations from both methods for all unique words in a corpus of more than 1 million full-length biomedical research articles. We observe that the parameters of these models do affect their ability to capture lexico-semantic properties, and that word2vec with a particular language modeling setting seems to perform better than the others.
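The abstract does not spell out the scoring protocol. A common way to test whether embeddings reflect similarity and relatedness, assumed in the sketch below, is to compute the cosine similarity of each concept pair's vectors and report Spearman's rank correlation against human reference scores, skipping pairs with out-of-vocabulary words. The vectors, word pairs, and scores are hypothetical toy values, not data from the paper, and the function names (`cosine`, `spearman`, `evaluate`) are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors.

    Assumes each input has at least two distinct values.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(
    embeddings: Dict[str, List[float]],
    pairs: List[Tuple[str, str, float]],
) -> Tuple[float, int]:
    """Correlate model cosine scores with reference scores; OOV pairs are skipped.

    Returns (rho, number_of_pairs_scored).
    """
    model_scores: List[float] = []
    gold_scores: List[float] = []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            gold_scores.append(gold)
    return spearman(model_scores, gold_scores), len(model_scores)


# Hypothetical toy vectors and reference scores, for illustration only.
embeddings = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "fever": [0.3, 0.9, 0.1],
    "fracture": [0.0, 0.2, 0.9],
}
pairs = [
    ("aspirin", "ibuprofen", 3.8),
    ("aspirin", "fever", 2.1),
    ("aspirin", "fracture", 0.5),
    ("fever", "influenza", 3.0),  # "influenza" is out of vocabulary, so skipped
]

rho, n_scored = evaluate(embeddings, pairs)
print(f"Spearman rho = {rho:.3f} over {n_scored} pairs")
```

Rank correlation is used here rather than raw agreement because cosine values and human ratings live on different scales; only the ordering of pairs is compared. Running the same `evaluate` call on vectors from differently parameterized models would give one score per configuration to compare.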