Published in: International Conference of the CLEF Association

Concept Recognition in French Biomedical Text Using Automatic Translation



Abstract

We describe the development of a concept recognition system for French documents and its application in task 1b of the 2015 CLEF eHealth challenge. This community challenge included recognition of entities in a French medical corpus, normalization of the recognized entities, and normalization of entity mentions that had been manually annotated. Normalization had to be based on the Unified Medical Language System (UMLS). We addressed all three subtasks with a dictionary-based approach using Peregrine, our open-source indexing engine. To increase the coverage of our initial French terminology, we explored the use of two automatic translators, Google Translate and Microsoft Translator, to translate English UMLS terms into French. The corpus consisted of 1665 titles of French Medline abstracts and 6 French drug labels of the European Medicines Agency (EMEA). The corpus was manually annotated with concepts from the UMLS and split into equally sized training and test sets. The best performance on the training set was obtained with a terminology that contained the intersection of the translated terms in combination with several post-processing steps to reduce the number of false-positive detections. When evaluated on the test set, our system achieved F-scores of 0.756 and 0.665 for entity recognition on the EMEA documents and Medline titles, respectively. For subsequent entity normalization, the F-scores were 0.711 and 0.587. Entity normalization given the manually annotated entity mentions resulted in F-scores of 0.872 and 0.671. Our system obtained the highest F-scores among the systems that participated in the challenge.
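
The idea of keeping only the intersection of the two translators' output and then matching terms by dictionary lookup can be illustrated with a minimal Python sketch. This is not Peregrine and not the authors' implementation: the function names, the greedy longest-match strategy, the whitespace tokenization, and the concept identifiers `C0000001` and `C0000002` are all hypothetical, chosen only to make the principle concrete.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(term.lower().split())


def intersect_translations(translations_a, translations_b):
    """Keep a French term for a concept only if both translators produced it.

    Both arguments map a concept identifier to a list of translated terms.
    Returns a mapping from normalized term to the set of concept identifiers.
    """
    terminology = defaultdict(set)
    for cui, terms_a in translations_a.items():
        terms_b = {normalize(t) for t in translations_b.get(cui, ())}
        for term in terms_a:
            if normalize(term) in terms_b:
                terminology[normalize(term)].add(cui)
    return dict(terminology)


def find_concepts(text, terminology, max_words=5):
    """Greedy longest-match lookup of dictionary terms in whitespace-tokenized text."""
    tokens = normalize(text).split()
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink it word by word.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in terminology:
                matches.append((candidate, sorted(terminology[candidate])))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return matches


# Made-up translator output; the identifiers are placeholders, not real UMLS CUIs.
translator_a = {
    "C0000001": ["infarctus du myocarde"],
    "C0000002": ["douleur thoracique", "douleur à la poitrine"],
}
translator_b = {
    "C0000001": ["Infarctus du myocarde"],
    "C0000002": ["douleur dans la poitrine"],
}

terminology = intersect_translations(translator_a, translator_b)
matches = find_concepts("Infarctus du myocarde chez le sujet âgé", terminology)
```

In this toy data the two translators agree only on "infarctus du myocarde", so the terms for the second concept are dropped. That is the trade-off the intersection makes: fewer dictionary entries in exchange for fewer doubtful translations that could cause false-positive matches.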
