Conference: IEEE International Conference on Bioinformatics and Biomedicine

A Lexical Approach to Identifying Subtype Inconsistencies in Biomedical Terminologies


Abstract

We introduce a lexical-based inference approach for identifying subtype (is-a relation) inconsistencies in biomedical terminologies. Given a terminology, we first represent the name of each concept as a sequence of words. We then generate hierarchically linked and hierarchically unlinked pairs of concepts such that the two concepts in a pair have the same number of words, share at least one word, and differ in a fixed number n of words (n = 1, 2, 3, 4, 5). From the linked and unlinked concept-pairs we infer corresponding linked and unlinked term-pairs, respectively. If a linked concept-pair and an unlinked concept-pair infer the same term-pair, we consider this a potential subtype inconsistency, which may indicate either a missing subtype relation or an incorrect one. We applied this approach to the Gene Ontology (GO), the National Cancer Institute Thesaurus (NCIt), and SNOMED CT. A total of 4,841 potential subtype inconsistencies were found in GO, 2,677 in NCIt, and 53,782 in SNOMED CT. Domain experts evaluated a random sample of 211 potential inconsistencies in GO and verified that 124 of them are valid (i.e., a precision of 58.77% for detecting subtype inconsistencies in GO). We also performed a preliminary study of the extent to which external knowledge in the Unified Medical Language System (UMLS) can provide supporting evidence for validating the detected potential inconsistencies: 0.54% (= 26/4,841) for GO, 11.43% (= 306/2,677) for NCIt, and 3.61% (= 1,940/53,782) for SNOMED CT. The results indicate that our lexical-based inference approach is a promising way to identify subtype inconsistencies and can facilitate quality improvement of biomedical terminologies.
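The pairing procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not spell out: words are compared position by position, a term-pair is taken to be the pair of differing word sub-sequences, and `is_a` is a set of (child, parent) links that may include inferred ancestors. The function names and the four toy concepts are hypothetical and are not taken from GO, NCIt, or SNOMED CT.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(name):
    """Represent a concept name as a lowercase sequence of words."""
    return tuple(name.lower().split())


def infer_term_pair(words_a, words_b, n):
    """Return the pair of differing word sub-sequences, or None.

    Assumption: names are aligned position by position. A pair qualifies
    only if both names have the same length, differ in exactly n
    positions, and share at least one word.
    """
    if len(words_a) != len(words_b):
        return None
    diff = [i for i, (a, b) in enumerate(zip(words_a, words_b)) if a != b]
    if len(diff) != n or len(diff) == len(words_a):
        return None
    return (tuple(words_a[i] for i in diff), tuple(words_b[i] for i in diff))


def find_potential_inconsistencies(names, is_a, n=1):
    """Group concept-pairs by the term-pair they infer.

    names: dict mapping concept id -> concept name
    is_a:  set of (child_id, parent_id) hierarchical links (assumed input)
    Returns term-pairs inferred by both a linked and an unlinked pair.
    """
    linked = defaultdict(list)
    unlinked = defaultdict(list)
    for a, b in combinations(sorted(names), 2):
        for child, parent in ((a, b), (b, a)):
            term_pair = infer_term_pair(
                tokenize(names[child]), tokenize(names[parent]), n
            )
            if term_pair is None:
                continue
            if (child, parent) in is_a:
                linked[term_pair].append((child, parent))
            elif (parent, child) not in is_a:
                # Unlinked in both directions
                unlinked[term_pair].append((child, parent))
    return {tp: (linked[tp], unlinked[tp]) for tp in linked if tp in unlinked}


# Toy data (hypothetical): C1 is-a C2 is asserted, C3 is-a C4 is not.
names = {
    "C1": "kidney development",
    "C2": "organ development",
    "C3": "kidney morphogenesis",
    "C4": "organ morphogenesis",
}
is_a = {("C1", "C2")}

result = find_potential_inconsistencies(names, is_a, n=1)
for term_pair, (linked_pairs, unlinked_pairs) in result.items():
    print(term_pair, "linked:", linked_pairs, "unlinked:", unlinked_pairs)
```

In this toy case the linked pair (C1, C2) and the unlinked pair (C3, C4) both infer the term-pair ("kidney", "organ"), so (C3, C4) is flagged as a candidate. Whether that reflects a missing relation between C3 and C4 or an incorrect one between C1 and C2 still has to be decided by a domain expert, as in the evaluation reported above.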
