Conference: IEEE International Conference on Healthcare Informatics

Analyzing Multiple Medical Corpora Using Word Embedding

Abstract

Neural language models, such as word embedding, can effectively embed words into vector spaces and preserve linguistic regularities and semantic relationships. However, few researchers have shown their effectiveness on medical terms and relationships. In this paper, we study the applicability of word2vec, a well-known technique for word embedding, to embed medical terms and relations based on different medical text corpora, including biomedical abstracts of scientific papers, health-related discussion forums, and a commonly available general-purpose information resource. We empirically evaluate the applicability of this approach by studying how the word embedding projects certain classes of medical terms and relations to the word space and analyzing the differences between the three corpora for embedding medical terms and relations. Results show that the corpus of health-related discussion forum posts, authored by lay persons and medical novices, trains a comparable word embedding for popular medical terms, when compared against a professionally authored corpus of published biomedical abstracts.
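As a rough illustration of what comparing embeddings across corpora can look like, the sketch below checks how much a term's nearest neighbors agree between two embeddings. It makes several assumptions: word vectors have already been trained (for example with word2vec) and are held in plain dictionaries, the vectors are tiny hand-made toy values rather than results from the paper, and the top-k neighbor overlap (Jaccard) score is one possible comparison, not necessarily the evaluation the authors used. The names `nearest`, `neighbor_overlap`, `abstract_vectors`, and `forum_vectors` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(embedding, term, k):
    """Return the k terms closest to `term` by cosine similarity."""
    query = embedding[term]
    scored = [
        (cosine(query, vec), word)
        for word, vec in embedding.items()
        if word != term
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


def neighbor_overlap(emb_a, emb_b, term, k):
    """Jaccard overlap of the top-k neighbor sets of `term` in two embeddings."""
    a = set(nearest(emb_a, term, k))
    b = set(nearest(emb_b, term, k))
    return len(a & b) / len(a | b)


# Toy 2-d vectors, invented for illustration only (not data from the paper).
abstract_vectors = {
    "aspirin": [1.0, 0.1],
    "ibuprofen": [0.9, 0.2],
    "headache": [0.7, 0.6],
    "insulin": [0.1, 1.0],
}
forum_vectors = {
    "aspirin": [0.2, 1.0],
    "ibuprofen": [0.3, 0.9],
    "headache": [0.1, 1.0],
    "insulin": [1.0, 0.0],
}

if __name__ == "__main__":
    for k in (1, 2):
        score = neighbor_overlap(abstract_vectors, forum_vectors, "aspirin", k)
        print(f"top-{k} neighbor overlap for 'aspirin': {score:.2f}")
```

A score near 1 would suggest that the two corpora place the term in similar neighborhoods, while a score near 0 would suggest they do not. With real embeddings the vocabulary of the two corpora would first need to be restricted to shared terms.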
