Database: The Journal of Biological Databases and Curation

Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task



Abstract

Manually curating chemicals, diseases and their relationships is critically important to biomedical research, but it is plagued by high cost and the rapid growth of the biomedical literature. In recent years, there has been a growing interest in developing computational approaches for automatic chemical-disease relation (CDR) extraction. Despite these attempts, the lack of a comprehensive benchmarking dataset has limited the comparison of different techniques needed to assess and advance the current state of the art. To this end, we organized a challenge task through BioCreative V to automatically extract CDRs from the literature. We designed two challenge tasks: disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. To assist system development and assessment, we created a large annotated text corpus consisting of human annotations of chemicals, diseases and their interactions from 1500 PubMed articles. 34 teams worldwide participated in the CDR task: 16 (DNER) and 18 (CID). The best systems achieved an F-score of 86.46% for the DNER task—a result that approaches the human inter-annotator agreement (0.8875)—and an F-score of 57.03% for the CID task, the highest results ever reported for such tasks. When combining team results via machine learning, the ensemble system was able to further improve over the best team results, achieving F-scores of 88.89% and 62.80% for the DNER and CID tasks, respectively. Another novel aspect of our evaluation was testing each participating system's ability to return real-time results: the average response times of the teams' DNER and CID web service systems were 5.6 and 9.3 s, respectively. Most teams based their submissions on hybrid machine-learning systems.
Given the level of participation and results, we found our task to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction.

Database URL:
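The F-scores reported above combine precision and recall in the standard way used to score NER and relation-extraction submissions against gold annotations. A minimal sketch of that computation follows; the annotation triples are hypothetical examples, not drawn from the actual CDR corpus.

```python
def prf(gold: set, predicted: set):
    """Return (precision, recall, F1) for sets of annotations.

    An annotation is any hashable item, e.g. a (doc_id, chemical,
    disease) triple for CID or a (doc_id, span, type) tuple for DNER.
    """
    tp = len(gold & predicted)  # exact matches count as true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical CID triples for illustration only.
gold = {("12345678", "lithium", "nephrotoxicity")}
pred = {("12345678", "lithium", "nephrotoxicity"),
        ("87654321", "aspirin", "ulcer")}
p, r, f = prf(gold, pred)  # one true positive, one false positive
```

Here the system finds the single gold triple but also emits a spurious one, giving precision 0.5, recall 1.0 and an F-score of about 0.667.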
