Database: The Journal of Biological Databases and Curation

Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine


Abstract

The Precision Medicine Initiative is a multicenter effort that aims to formulate personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs: it reports the genetic and molecular interactions that provide the scaffold for cellular regulatory systems and details the influence of genetic variants on these interactions. The BioCreative VI Precision Medicine Track aimed to extract this type of information and was organized into two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein–protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams contributed 14 different text-mining systems for the relation extraction task. When the text-mining system predictions were compared with human annotations, for the triage task the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when homologous genes were taken into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the Precision Medicine Track to be successful in engaging the text-mining research community. The track also produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world curation of precision medicine information.
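
The abstract reports precision, recall, F-score and average precision without defining them. The sketch below shows one common way to compute these measures for a triage-style task, assuming that a system returns a set of predicted-relevant document IDs plus a ranked list, and that the gold standard is a set of relevant IDs. The function names, the document IDs, and the choice to normalize average precision by the number of gold-relevant documents are illustrative assumptions; this is not the track's official evaluation script.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and balanced F-score (F1)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant document appears.

    Assumption: the sum is divided by the total number of gold-relevant
    documents, so relevant documents missing from the ranking count as zero.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold) if gold else 0.0


if __name__ == "__main__":
    # Hypothetical document IDs, for illustration only.
    gold_relevant = {"d1", "d3", "d4"}
    system_positive = {"d1", "d2", "d3"}
    system_ranking = ["d1", "d2", "d3", "d4"]

    p, r, f = precision_recall_f1(system_positive, gold_relevant)
    ap = average_precision(system_ranking, gold_relevant)
    print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f} avg_precision={ap:.4f}")
```

Precision, recall and F-score are computed from a binary decision per document, whereas average precision rewards systems that rank relevant documents near the top, which is why a triage evaluation can report a best value for each measure separately.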