Source: JMLR: Workshop and Conference Proceedings

Medical Concept Normalization by Encoding Target Knowledge

Abstract

Medical concept normalization aims to map a variable-length message, such as 'unable to sleep', to an entry in a target medical lexicon, such as 'Insomnia'. Current approaches formulate medical concept normalization as a supervised text classification problem. This formulation has several drawbacks. First, creating training data requires manually mapping medical concept mentions to their corresponding entries in a target lexicon. Second, these models fail to map a mention to target concepts that were not encountered during the training phase. Lastly, these models have to be retrained from scratch whenever new concepts are added to the target lexicon. In this work we propose a method which overcomes these limitations. We first use various text and graph embedding methods to encode medical concepts into an embedding space. We then train a model which transforms concept mentions into vectors in this target embedding space. Finally, we use cosine similarity to find the nearest medical concept to a given input medical concept mention. Our model scales to millions of target concepts and trivially accommodates a growing target lexicon without incurring significant computational cost. Experimental results show that our model outperforms the previous state-of-the-art by 4.2% and 6.3% classification accuracy across two benchmark datasets. We also present a variety of studies to evaluate the robustness of our model under different training conditions.
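The final retrieval step described in the abstract, picking the concept whose embedding has the highest cosine similarity to the mention vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors are hand-picked toy values, and `cosine_similarity`, `nearest_concept`, `concept_vectors`, and `mention_vector` are hypothetical names introduced only for illustration. In the paper, the concept vectors would come from text and graph embedding methods and the mention vector from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_concept(mention_vector, concept_vectors):
    """Return the concept name whose embedding is most similar to the mention vector."""
    return max(
        concept_vectors,
        key=lambda name: cosine_similarity(mention_vector, concept_vectors[name]),
    )


# Assumption: toy embeddings standing in for pre-computed concept vectors.
concept_vectors = {
    "Insomnia": [0.9, 0.1, 0.0],
    "Headache": [0.1, 0.9, 0.1],
}

# Assumption: hypothetical output of a mention encoder for 'unable to sleep'.
mention_vector = [0.8, 0.2, 0.1]
best = nearest_concept(mention_vector, concept_vectors)

# Extending the lexicon only requires adding an embedding for the new concept;
# the lookup itself needs no retraining.
concept_vectors["Nausea"] = [0.0, 0.1, 0.9]
```

The sketch scans every concept linearly, which is enough to show the idea. For a lexicon with millions of entries, an approximate nearest-neighbour index would be the usual substitute, though the abstract does not say which search method the authors use.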
