Published in: International conference of the CLEF initiative

Entity Recognition in Parallel Multi-lingual Biomedical Corpora: The CLEF-ER Laboratory Overview


Abstract

The identification and normalisation of biomedical entities from the scientific literature has a long tradition, and a number of challenges have contributed to the development of reliable solutions. Increasingly, patient records are processed to align their content with other biomedical data resources, but this approach requires analysing documents in different languages across Europe. The CLEF-ER challenge has been organized by the Mantra project partners to improve entity recognition (ER) in multilingual documents. Several corpora in different languages, i.e. Medline titles, EMEA documents and patent claims, have been prepared to enable ER in parallel documents. The participants have been asked to annotate entity mentions with concept unique identifiers (CUIs) in the documents of their preferred non-English language. The evaluation determines the number of correctly identified entity mentions against a silver standard (Task A) and the performance measures for the identification of CUIs in the non-English corpora (Task B). The participants could make use of the prepared terminological resources for entity normalisation and of the English silver standard corpora (SSCs) as input for concept candidates in the non-English documents. The participants used different approaches, including translation techniques and word or phrase alignments, in addition to lexical lookup and other text mining techniques. The performances for Tasks A and B were lower for the patent corpus than for Medline titles and EMEA documents. In the patent documents, chemical entities were identified with higher performance, whereas the other two document types cover a higher proportion of medical terms. The number of novel terms provided from all corpora is currently under investigation. Altogether, the CLEF-ER challenge demonstrates the performance of annotation solutions in different languages against an SSC.
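
The two evaluation views mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the challenge's official scorer: the abstract does not state the matching criteria, so exact span matching for mentions and document-level matching for CUIs are assumptions, and the `Mention` type, the function names and the CUI values are hypothetical.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Mention(NamedTuple):
    """Hypothetical annotation record: a text span labelled with a CUI."""
    doc_id: str
    start: int
    end: int
    cui: str


def prf(predicted: Set, silver: Set) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a predicted set against a silver set."""
    tp = len(predicted & silver)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(silver) if silver else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def evaluate(predicted: Set[Mention],
             silver: Set[Mention]) -> Dict[str, Tuple[float, float, float]]:
    """Score annotations against a silver standard in two ways.

    Assumption: mentions match only on identical (doc, start, end) spans,
    and CUIs are compared per document, ignoring their position.
    """
    pred_spans = {(m.doc_id, m.start, m.end) for m in predicted}
    silver_spans = {(m.doc_id, m.start, m.end) for m in silver}
    pred_cuis = {(m.doc_id, m.cui) for m in predicted}
    silver_cuis = {(m.doc_id, m.cui) for m in silver}
    return {
        "mentions": prf(pred_spans, silver_spans),
        "cuis": prf(pred_cuis, silver_cuis),
    }


# Made-up example data; the CUI strings are placeholders, not real concepts.
silver = {
    Mention("d1", 0, 5, "C0000001"),
    Mention("d1", 10, 15, "C0000002"),
}
predicted = {
    Mention("d1", 0, 5, "C0000001"),
    Mention("d1", 20, 25, "C0000002"),
    Mention("d1", 30, 35, "C0000003"),
}
scores = evaluate(predicted, silver)
```

In this made-up example only one predicted span coincides with a silver span, while two of the three predicted CUIs occur in the silver annotations of the same document, so the CUI-level scores are higher than the mention-level scores.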
