Conference: SIGBioMed Workshop on Biomedical Language Processing
Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts

Abstract

Chinese word segmentation (CWS) and medical concept recognition are two fundamental tasks in processing Chinese electronic medical records (EMRs), and both play important roles in downstream tasks for understanding Chinese EMRs. One challenge for these tasks is the lack of medical-domain datasets with high-quality annotations, especially medical-related tags that reveal the characteristics of Chinese EMRs. In this paper, we collect a Chinese EMR corpus, ACEMR, with human annotations for Chinese word segmentation and EMR-related tags. On the ACEMR corpus, we run well-known models (i.e., BiLSTM, BERT, and ZEN) and existing state-of-the-art systems (e.g., WMSeg and TwASP) for CWS and medical concept recognition. Experimental results demonstrate the necessity of building a dedicated medical dataset and show that models leveraging extra resources achieve the best performance on both tasks, which provides guidance for model selection in future studies in the medical domain.
