IEEE International Symposium on Computer-Based Medical Systems

STMC: Semantic Tag Medical Concept Using Word2Vec Representation

Abstract

In this paper we propose a system for recognizing medical concepts in free-text clinical reports. Our approach also aims to recognize concepts that are expressed in local terminology, medical writing scripts, short words, abbreviations, and even misspellings. We use a clinical terminology ontology (SNOMED CT) as a dictionary of concepts. In a first step, we train a word2vec embedding model on a large corpus of clinical reports. Word vectors are positioned in the vector space so that words sharing common contexts in the corpus lie close to one another, and geometric similarity can therefore be treated as a measure of semantic relatedness. The corpus consists of 615,513 emergency clinical reports from the Hospital "Rafael Méndez" in Lorca, Murcia. These reports contain a great deal of local emergency-domain language, medical writing scripts, short words, abbreviations, and even spelling mistakes. With the resulting model we represent words and sentences as vectors and, by applying cosine similarity, identify which concepts of the ontology are named in the text. Finally, we represent the clinical reports (EHR) as bags of concepts and use this representation to search for similar documents. The paper illustrates 1) how we build the word2vec model from the free-text clinical reports, 2) how we extend the embedding from words to sentences, and 3) how we use cosine similarity to identify concepts. The experiments, together with validation by human experts, show that a) concepts named in the text with the ontology terminology are well recognized, and b) other concepts that are not named with the ontology terminology are also recognized, with high precision and recall.
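
The matching step described in the abstract (embed words and sentences, then compare against ontology terms by cosine similarity) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how word embeddings are extended to sentences, so the sketch simply averages word vectors; the 3-dimensional vectors are made-up stand-ins for a trained word2vec model; the concept identifiers are hypothetical and are not real SNOMED CT codes; and the 0.95 threshold is arbitrary.

```python
import math
from collections import Counter

# Toy word vectors standing in for a trained word2vec model (values invented).
WORD_VECTORS = {
    "chest": [0.9, 0.1, 0.0],
    "pain": [0.8, 0.2, 0.1],
    "thoracic": [0.88, 0.12, 0.02],
    "ache": [0.78, 0.22, 0.12],
    "fever": [0.0, 0.9, 0.2],
    "high": [0.1, 0.7, 0.3],
    "temperature": [0.05, 0.85, 0.25],
    "fracture": [0.1, 0.0, 0.95],
}

# Hypothetical concept dictionary: identifier -> ontology term.
# These identifiers are placeholders, not SNOMED CT codes.
CONCEPTS = {
    "C_CHEST_PAIN": "chest pain",
    "C_FEVER": "fever",
    "C_FRACTURE": "fracture",
}


def embed(text):
    """Average the vectors of the known words (an assumed sentence embedding)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no known word, so the text cannot be embedded
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tag_phrase(phrase, threshold=0.95):
    """Return (concept_id, similarity) for the closest concept, or None."""
    vec = embed(phrase)
    if vec is None:
        return None
    best_id, best_sim = None, -1.0
    for concept_id, term in CONCEPTS.items():
        term_vec = embed(term)
        if term_vec is None:
            continue
        sim = cosine(vec, term_vec)
        if sim > best_sim:
            best_id, best_sim = concept_id, sim
    return (best_id, best_sim) if best_sim >= threshold else None


def bag_of_concepts(phrases, threshold=0.95):
    """Count the concepts recognized across the phrases of one report."""
    bag = Counter()
    for phrase in phrases:
        match = tag_phrase(phrase, threshold)
        if match is not None:
            bag[match[0]] += 1
    return bag


report = ["thoracic ache", "high temperature", "chest pain", "unknownword"]
bag = bag_of_concepts(report)
```

In this toy setup, "thoracic ache" shares no word with the ontology term "chest pain" yet lands close to it in the vector space, which mirrors the abstract's point about recognizing concepts that are not named with the ontology terminology. Two reports could then be compared through their concept counts instead of their raw text.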
