Journal of biomedical informatics

Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: An annotation and machine learning study

Abstract

Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder + Finding. 
The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that the approaches used for English are also suitable for Swedish clinical text. However, a small proportion of the errors made by the model are less likely to occur in English text, showing that results might be improved by further tailoring the system to clinical Swedish. The entity recognition results for the individual entities Disorder and Finding show that it is meaningful to separate the general category Medical Problem into these two more granular entity types, e.g. for knowledge mining of co-morbidity relations and disorder-finding relations.
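Both the inter-annotator agreement and the model results above are reported as per-entity F-scores. As a minimal sketch of how such figures can be computed, the following assumes exact span matching between a reference annotation and a second annotation (or model output); the abstract does not state which matching criterion the study used. The function names, the `Medical Problem` merge mapping and the toy spans are all hypothetical and serve only as illustration.

```python
def f_scores(reference, predicted):
    """Per-category precision, recall and F-score.

    Each argument is a set of (start, end, category) tuples.
    An entity counts as correct only if span and category match exactly
    (an assumed criterion, not one confirmed by the abstract).
    """
    categories = {cat for _, _, cat in reference | predicted}
    scores = {}
    for cat in sorted(categories):
        ref = {e for e in reference if e[2] == cat}
        pred = {e for e in predicted if e[2] == cat}
        true_positives = len(ref & pred)
        precision = true_positives / len(pred) if pred else 0.0
        recall = true_positives / len(ref) if ref else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        scores[cat] = (precision, recall, f_score)
    return scores


def merge_categories(entities, mapping):
    """Relabel entities, e.g. to fold Disorder and Finding into one category."""
    return {(start, end, mapping.get(cat, cat)) for start, end, cat in entities}


# Toy, made-up annotations: token offsets plus an entity category.
reference = {(0, 2, "Disorder"), (5, 6, "Finding"), (8, 9, "Finding")}
predicted = {(0, 2, "Disorder"), (5, 6, "Finding"), (10, 11, "Finding")}

per_entity = f_scores(reference, predicted)

# Hypothetical mapping for the combined Disorder + Finding category.
to_combined = {"Disorder": "Medical Problem", "Finding": "Medical Problem"}
combined = f_scores(
    merge_categories(reference, to_combined),
    merge_categories(predicted, to_combined),
)
```

Scoring the merged labels separately, as in `combined`, is one way to compare a single Medical Problem category against the two more granular entity types on the same data.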
