Journal: BMC Bioinformatics

A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Abstract

Background: We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus.

Results: Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data.

Conclusions: The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full-text publications.
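The abstract does not say which metrics were used to score the tools. As an illustration only, the sketch below shows one common way to compare a tool's output with gold-standard annotation for tasks such as tokenization or sentence splitting: exact-match precision, recall, and F1 over character-offset spans. The function name `precision_recall_f1`, the span representation, and the toy offsets are all assumptions made for this example; none of them come from the article or from the CRAFT distribution.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def precision_recall_f1(
    gold: Iterable[Span], predicted: Iterable[Span]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two sets of spans."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)

    # A predicted span counts as correct only if both offsets match a gold span.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (invented): the tool merged the second and third gold tokens into one.
gold_tokens = [(0, 3), (4, 8), (8, 9), (10, 15)]
pred_tokens = [(0, 3), (4, 9), (10, 15)]

p, r, f = precision_recall_f1(gold_tokens, pred_tokens)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the strictest choice; an evaluation could instead give partial credit for overlapping spans, which would raise the scores for the merged token in the toy data.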
