Journal: International Journal of Environmental Research and Public Health
Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules

Abstract

Electronic medical records are an integral part of medical texts, and entity recognition in these records has prompted many studies and many entity extraction methods. In this paper, we propose an entity extraction model for Chinese Electronic Medical Records (CEMR). In the input layer, the model uses word embeddings and dictionary feature embeddings as input vectors, where the word embedding consists of a character representation and a word representation. The input vectors are then fed to a bidirectional long short-term memory (Bi-LSTM) network to capture contextual features. Finally, a conditional random field (CRF) is employed to capture dependencies between neighboring tags. On the body classification task, the F1 value reached 90.65%; on the anatomic region recognition task, it reached 93.89%. On both tasks, our model outperformed state-of-the-art models such as Bi-LSTM-CRF, Bi-LSTM-Attention, and Vote. The experiments also show that the model performs well on low-frequency entities and unknown entities; with a small training dataset, our method improved the F1 value by 2–4% over the basic Bi-LSTM-CRF models. Additionally, on the anatomic region recognition task, we combined the proposed entity extraction model with 12 rules we designed and a domain dictionary. With this combination, the weighted F1 value for extracting the three specific entities reached 84.36%.
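To make the idea of dictionary features concrete, the following is a minimal sketch of one common way to derive them: forward maximum matching of the character sequence against a domain dictionary, with each character tagged by its position inside a match. The abstract does not specify the scheme the authors used, so the B/I/E/S/O tag set, the function name `dictionary_features`, the `max_len` parameter, and the example lexicon are all assumptions for illustration. In a model like the one described, such tags would typically be mapped to an embedding and concatenated with the character and word representations before the Bi-LSTM layer.

```python
from typing import List, Set


def dictionary_features(chars: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Tag each character by its position in a dictionary match.

    Hypothetical scheme (not taken from the paper): forward maximum
    matching, with tags B (begin), I (inside), E (end), S (single-character
    match), and O (no match).
    """
    tags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        matched = 0
        # Try the longest candidate first so longer entries win over shorter ones.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            if chars[i:i + n] in lexicon:
                matched = n
                break
        if matched == 0:
            i += 1  # no dictionary entry starts here; leave the tag as O
            continue
        if matched == 1:
            tags[i] = "S"
        else:
            tags[i] = "B"
            for j in range(i + 1, i + matched - 1):
                tags[j] = "I"
            tags[i + matched - 1] = "E"
        i += matched  # continue after the matched span
    return tags


if __name__ == "__main__":
    # Illustrative lexicon only; a real domain dictionary would be far larger.
    lexicon = {"左肺", "上叶", "左肺上叶"}
    print(dictionary_features("左肺上叶见结节", lexicon))
```

Because matching is greedy and longest-first, the four-character entry is preferred over its two shorter sub-entries in this example. Whether the original work resolves overlapping entries this way is not stated in the abstract.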