AMIA Summits on Translational Science Proceedings

Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS




We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in . The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS-recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (= total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (= total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to eliminate all ambiguity in semantic type assignment. The lexicon covered 95.95% of the UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus.
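The two ambiguity measures defined in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the token stream, the function names, and in particular the form of the preference rules. The abstract does not say how the semantic preference rules are expressed, so a simple priority list over semantic types stands in for them here.

```python
# Hypothetical toy lexicon: normalized lexeme -> candidate semantic types.
# The type names are UMLS Semantic Network types; the assignments are made up.
LEXEME_TYPES = {
    "diabetes": {"Disease or Syndrome"},
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},
    "discharge": {"Health Care Activity", "Body Substance", "Finding"},
}

# Hypothetical stream of lexeme occurrences in the corpus.
OCCURRENCES = ["diabetes", "cold", "cold", "discharge", "diabetes", "cold"]

# Assumed stand-in for the semantic preference rules: earlier types win.
PREFERENCE = [
    "Disease or Syndrome",
    "Finding",
    "Health Care Activity",
    "Body Substance",
    "Natural Phenomenon or Process",
]


def lexical_ambiguity(lexeme_types):
    """Total types over unique lexemes / total unique lexemes."""
    total_types = sum(len(types) for types in lexeme_types.values())
    return total_types / len(lexeme_types)


def occurrence_ambiguity(occurrences, lexeme_types):
    """Total type occurrences / total word occurrences."""
    total_type_occurrences = sum(len(lexeme_types[w]) for w in occurrences)
    return total_type_occurrences / len(occurrences)


def resolve(lexeme_types, preference):
    """Keep only the most preferred type for each lexeme."""
    return {
        lexeme: {min(types, key=preference.index)}
        for lexeme, types in lexeme_types.items()
    }


if __name__ == "__main__":
    print(lexical_ambiguity(LEXEME_TYPES))
    print(occurrence_ambiguity(OCCURRENCES, LEXEME_TYPES))
    resolved = resolve(LEXEME_TYPES, PREFERENCE)
    print(lexical_ambiguity(resolved))
```

On the toy data, the three lexemes carry 1 + 2 + 3 = 6 candidate types, giving a lexical ambiguity of 6 / 3 = 2.0, and the six occurrences carry 11 candidate types, giving an occurrence ambiguity of 11 / 6. After `resolve`, every lexeme has exactly one type, so both measures fall to 1.0, which is what eliminating all ambiguity means in these terms.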


