Journal: BMC Bioinformatics

ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus

Abstract

Background: To extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document-specific enhancements, such as negation rules for general practitioners' entries and a regular-expression-based temporality module.

Results: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively.

Conclusions: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
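The trigger-based idea described in the abstract can be illustrated with a minimal sketch. This is not the ContextD implementation: the Dutch trigger words, the five-token window, and the function names below are all assumptions chosen for illustration, and the sketch covers only a simplified form of the negation property. It looks backwards from a matched medical term for a negation trigger and stops at a termination trigger.

```python
import re

# Hypothetical Dutch trigger terms, chosen for illustration only.
# They are NOT the 41 triggers reported for ContextD.
PRE_NEGATION_TRIGGERS = {"geen", "niet", "zonder"}   # "no", "not", "without"
TERMINATION_TRIGGERS = {"maar", "echter"}            # "but", "however"
WINDOW = 5  # assumed maximum trigger scope, in tokens


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_negated(sentence, term):
    """Return True if an occurrence of `term` lies within the scope of a
    preceding negation trigger and no termination trigger intervenes."""
    tokens = tokenize(sentence)
    term_tokens = tokenize(term)
    n = len(term_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] != term_tokens:
            continue  # exact match of the term is required
        # Scan backwards from the term, at most WINDOW tokens.
        for i in range(start - 1, max(start - 1 - WINDOW, -1), -1):
            if tokens[i] in TERMINATION_TRIGGERS:
                break  # scope of any earlier trigger ends here
            if tokens[i] in PRE_NEGATION_TRIGGERS:
                return True
    return False


if __name__ == "__main__":
    print(is_negated("De patient heeft geen koorts", "koorts"))
    print(is_negated("Geen koorts, maar wel hoesten", "hoesten"))
```

In the second example sentence the termination trigger "maar" closes the scope opened by "geen", so "hoesten" is not treated as negated while "koorts" is. A fuller adaptation would also need triggers that follow the term, plus the temporality and experiencer properties.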
