Journal: BMC Bioinformatics

Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases


Abstract

Background: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the BioCreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and extraction of protein-protein interaction (PPI) data.

Results: The BioCreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms.

Conclusion: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines.
