Database: The Journal of Biological Databases and Curation

Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering

Abstract

Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and for developing therapeutic drugs. Manually extracting such information from the biomedical literature is time-consuming and resource-intensive. In this study, we propose a computational method to automatically extract chemical–protein interactions (CPIs) from a given text. Our method extracts CPI pairs and CPI triplets from sentences, where a CPI pair consists of a chemical compound name and a protein name, and a CPI triplet consists of a CPI pair together with an interaction word describing their relationship. We extracted a diverse set of features from sentences and used them to build multiple machine learning models. Our models use both simple features, which can be computed directly from sentences, and more sophisticated features derived using sentence structure analysis techniques. For example, one set of features was extracted from the shortest paths between the members of a CPI pair, or among the members of a CPI triplet, in the dependency graphs obtained from sentence parsing. We designed a three-stage approach to predict the multiple categories of CPIs. Our method performed best among systems that use non-deep-learning methods and outperformed several deep-learning-based systems in Track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods, including deep learning.
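To make the shortest-path idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a sentence has already been parsed into dependency edges. The toy sentence, the edge list, and the function names `build_graph`, `shortest_path`, and `pair_path_features` are all hypothetical, and the two features it returns (path length and the words on the path) are only illustrative examples of what such a path could yield. The dependency graph is treated as undirected and searched with breadth-first search.

```python
from collections import deque

# Hypothetical toy sentence. In a real pipeline the tokens and edges would
# come from a dependency parser; here they are written by hand.
TOKENS = ["Aspirin", "strongly", "inhibits", "COX-1", "activity"]
# Each edge is (head_index, dependent_index, relation_label).
EDGES = [(2, 0, "nsubj"), (2, 1, "advmod"), (2, 4, "obj"), (4, 3, "compound")]


def build_graph(edges):
    """Turn dependency edges into an undirected adjacency map."""
    graph = {}
    for head, dep, _label in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns token indices from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph.get(node, ())):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def pair_path_features(tokens, edges, chem_idx, prot_idx):
    """Illustrative features from the path between a chemical and a protein."""
    path = shortest_path(build_graph(edges), chem_idx, prot_idx)
    if path is None:
        return {"path_length": -1, "path_words": []}
    return {
        "path_length": len(path) - 1,                  # number of edges
        "path_words": [tokens[i] for i in path[1:-1]],  # words between the pair
    }


features = pair_path_features(TOKENS, EDGES, chem_idx=0, prot_idx=3)
print(features)
```

For a CPI triplet, the same `shortest_path` function could be called between the interaction word (here "inhibits", index 2) and each member of the pair. How the resulting paths are encoded as model inputs is a design choice the abstract does not specify.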
