Journal: Database

An overview of the BioCreative 2012 Workshop Track III: interactive text mining task


Abstract

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (~1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.
