Journal: BMC Bioinformatics

Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature

Abstract

Background: Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature.

Data and methods: For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts.

Results: On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), with a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be directly used in drug repositioning.

Conclusion: In summary, we automatically constructed a large-scale knowledge base of higher-level drug-phenotype relationships, which has great potential in computational drug discovery.
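The abstract gives no implementation details, so the following Python sketch is only one plausible reading of the knowledge-driven idea, not the authors' pipeline. It assumes that a sentence counts as SE-related when it mentions both members of a known drug-SE pair, and that every drug/side-effect co-occurrence in such a sentence becomes a candidate pair. The lexicons, sentences, gold standard, and the function names extract_pairs and precision_recall_f1 are all hypothetical and chosen for illustration.

```python
import re
from itertools import product


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def extract_pairs(sentences, known_pairs, drug_lexicon, se_lexicon):
    """Knowledge-driven extraction sketch (an assumption, not the paper's code).

    Step 1: keep only sentences that mention at least one known drug-SE pair.
    Step 2: in each retained sentence, pair every drug term with every
            side-effect term found there.
    Only single-word terms are matched; multi-word names are not handled.
    """
    extracted = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        if not any(d in tokens and se in tokens for d, se in known_pairs):
            continue  # not SE-related under the stated assumption
        drugs = tokens & drug_lexicon
        effects = tokens & se_lexicon
        extracted.update(product(drugs, effects))
    return extracted


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 (harmonic mean of the two)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, standing in for FDA-label pairs and MEDLINE sentences.
known_pairs = {("aspirin", "bleeding")}
drug_lexicon = {"aspirin", "warfarin", "metformin"}
se_lexicon = {"bleeding", "nausea", "headache"}
sentences = [
    "Aspirin and warfarin both increased the risk of bleeding.",
    "Metformin was associated with nausea in some patients.",
    "Aspirin reduced headache severity.",
]
gold = {("aspirin", "bleeding"), ("metformin", "nausea")}

extracted = extract_pairs(sentences, known_pairs, drug_lexicon, se_lexicon)
precision, recall, f1 = precision_recall_f1(extracted, gold)
print(sorted(extracted))
print(precision, recall, f1)
```

In this toy run only the first sentence is retained, so the sketch proposes (aspirin, bleeding) and (warfarin, bleeding) and misses (metformin, nausea). The abstract's figures are described as averages, so applying the F1 formula directly to the averaged precision and recall need not reproduce the reported F1.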