
CALBC SILVER STANDARD CORPUS

Abstract

The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced from those different systems were integrated in a single harmonized corpus ("silver standard" corpus) by applying a voting scheme. We give an overview of the processed data and the principles of harmonization: formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against that silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations. Parts of it will be made available later on for a public challenge. We expect that we can improve corpus building activities both in terms of the numbers of named entity classes being covered, as well as the size of the corpus in terms of annotated documents.
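
The abstract does not spell out the voting scheme, so the following is only a minimal sketch of how majority voting over annotations from several systems might look. The `Annotation` type, the `harmonize` function, the exact-match treatment of boundaries, and the `min_votes=2` threshold are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation record: a typed character span in a document."""
    doc_id: str
    start: int
    end: int
    semantic_type: str


def harmonize(system_outputs: Iterable[Iterable[Annotation]],
              min_votes: int = 2) -> List[Annotation]:
    """Keep annotations proposed by at least `min_votes` systems.

    Assumption: two annotations agree only if document, boundaries, and
    semantic type match exactly. The paper's boundary reconciliation and
    semantic matching are not modelled here.
    """
    votes: Counter = Counter()
    for annotations in system_outputs:
        # De-duplicate per system so each system casts at most one vote
        # for a given annotation.
        votes.update(set(annotations))
    return sorted(a for a, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    species = Annotation("doc1", 0, 5, "species")
    disease = Annotation("doc1", 10, 17, "disease")
    gene = Annotation("doc1", 20, 24, "gene")
    outputs = [[species, disease], [species], [disease, species, gene]]
    for annotation in harmonize(outputs):
        print(annotation)
```

Raising `min_votes` in this sketch trades coverage for agreement: fewer annotations survive, but each is supported by more systems.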
