Conference: Joint workshop on unsupervised and semi-supervised learning in NLP 2012

Improving Distantly Supervised Extraction of Drug-Drug and Protein-Protein Interactions



Abstract

Relation extraction is frequently and successfully addressed with machine learning methods. The downside of this approach is its need for annotated training data, which is typically produced through tedious, cost-intensive manual work. Distantly supervised approaches instead make use of weakly annotated data, such as automatically annotated corpora. Recent work in the biomedical domain has applied distant supervision to protein-protein interaction (PPI) extraction using the IntAct database, with reasonable results. Such data are typically noisy, and heuristics are commonly applied to filter them. We propose a constraint that increases the quality of the training data, based on the assumption that sentences do not describe self-interactions of real-world objects. In addition, we make use of the University of Kansas Proteomics Service (KUPS) database. Together, these two steps yield an increase of 7 percentage points (pp) on the PPI corpus AIMed. We demonstrate the broad applicability of our approach by applying the same workflow to drug-drug interactions, using relationships available from the drug database DrugBank. Without manually annotated training data, we achieve an F1 measure of 37.31 % on an independent test set.
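The distant-supervision labeling step and the self-interaction constraint described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every entity mention has already been normalized to a database identifier, that the knowledge base is a set of unordered identifier pairs (as one might extract from IntAct, KUPS, or DrugBank), and it reads the constraint as "skip any mention pair whose two mentions resolve to the same identifier". The names `Mention`, `label_sentence`, and the identifiers `ID1`, `ID2` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """Hypothetical entity mention, already normalized to a database identifier."""
    text: str
    db_id: str


def label_sentence(mentions, kb_pairs):
    """Distantly label every mention pair in one sentence.

    kb_pairs is assumed to be a set of frozensets of identifiers that the
    knowledge base lists as interacting. Returns (mention_a, mention_b, label)
    tuples, where label is True if the knowledge base contains the pair.
    """
    examples = []
    for a, b in combinations(mentions, 2):
        # Self-interaction constraint (illustrative reading): two mentions of the
        # same real-world object are assumed not to describe an interaction, so
        # the pair is dropped instead of becoming a (possibly noisy) example.
        if a.db_id == b.db_id:
            continue
        label = frozenset((a.db_id, b.db_id)) in kb_pairs
        examples.append((a, b, label))
    return examples


if __name__ == "__main__":
    # Hypothetical knowledge base: ID1 interacts with ID2 and is also listed as
    # interacting with itself (frozenset(("ID1",)) represents that self-pair).
    kb = {frozenset(("ID1", "ID2")), frozenset(("ID1",))}
    sentence = [Mention("ProtA", "ID1"), Mention("ProtB", "ID2"), Mention("protein A", "ID1")]
    for a, b, label in label_sentence(sentence, kb):
        print(a.text, b.text, label)
    # prints:
    # ProtA ProtB True
    # ProtB protein A True
```

Without the identifier check, the pair ("ProtA", "protein A") would be looked up as the self-pair `frozenset({"ID1"})` and labeled positive because the knowledge base lists a self-interaction, even though the sentence merely mentions the same protein twice. That is the kind of noisy positive the constraint is meant to remove.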

Bibliographic details

  • Conference location: Avignon (FR)
  • Author affiliations

    Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany; Bonn-Aachen Center for Information Technology, Dahlmannstrasse 2, 53113 Bonn, Germany

    Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany

    Computer Science Institute, Humboldt-Universitaet, Unter den Linden 6, 10099 Berlin, Germany

    Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany; Bonn-Aachen Center for Information Technology, Dahlmannstrasse 2, 53113 Bonn, Germany

  • Original format: PDF
  • Language: English
