Journal: Bioinformatics

AutoBind: automatic extraction of protein-ligand-binding affinity data from biological literature


Abstract

Motivation: Determining the binding affinity of a protein-ligand complex is important for quantitatively specifying whether a particular small molecule will bind to the target protein. In addition, comprehensive datasets of protein-ligand complexes and their corresponding binding affinities are crucial for developing accurate scoring functions that predict the binding affinities of previously unknown protein-ligand complexes. In the past decades, several databases of protein-ligand-binding affinities have been created via visual extraction from the literature. However, such approaches are time-consuming, and most of these databases are updated only a few times per year. Hence, there is an immediate demand for a high-precision automatic extraction method for collecting binding affinities.

Results: We have created a new database of protein-ligand-binding affinity data, AutoBind, based on automatic information retrieval. We first compiled a collection of 1586 articles in which the binding affinities have been marked manually. Based on this annotated collection, we designed four sentence patterns that are used to scan full-text articles, as well as a scoring function to rank the sentences that match our patterns. The proposed sentence patterns can effectively identify the binding affinities in full-text articles. Our assessment shows that AutoBind achieved 84.22% precision and 79.07% recall on the testing corpus. Currently, 13 616 protein-ligand complexes and the corresponding binding affinities have been deposited in AutoBind from 17 221 articles.
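The abstract does not give the four sentence patterns or the scoring function, so the following Python sketch only illustrates the general idea of pattern-based extraction followed by sentence ranking. The regular expression, the cue-word list, and every function name here are hypothetical assumptions, not AutoBind's actual patterns or score.

```python
import re

# Hypothetical pattern (NOT one of AutoBind's four patterns): an affinity
# type, a short connector, a number, and a concentration unit.
AFFINITY_RE = re.compile(
    r"\b(?P<kind>Kd|Ki|IC50)\b"
    r"\s*(?:value\s*)?(?:of|=|was|is)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>pM|nM|uM|mM)\b"
)

# Hypothetical cue words used by the toy score below.
CUE_WORDS = ("binding", "affinity", "inhibit", "ligand")


def extract_affinities(sentence):
    """Return (kind, value, unit) tuples for every pattern match."""
    return [
        (m.group("kind"), float(m.group("value")), m.group("unit"))
        for m in AFFINITY_RE.finditer(sentence)
    ]


def score_sentence(sentence):
    """Toy score: number of matches plus number of cue words present."""
    hits = extract_affinities(sentence)
    if not hits:
        return 0
    lowered = sentence.lower()
    return len(hits) + sum(word in lowered for word in CUE_WORDS)


def rank_sentences(sentences):
    """Keep matching sentences, highest score first."""
    scored = [(score_sentence(s), s) for s in sentences]
    scored = [pair for pair in scored if pair[0] > 0]
    return [s for _, s in sorted(scored, key=lambda p: p[0], reverse=True)]


def precision_recall(tp, fp, fn):
    """Standard precision and recall from true/false positive counts."""
    return tp / (tp + fp), tp / (tp + fn)


if __name__ == "__main__":
    text = [
        "The inhibitor binds the protease with a Ki of 2.5 nM.",
        "Cells were grown at 37 C.",
    ]
    for sentence in rank_sentences(text):
        print(score_sentence(sentence), extract_affinities(sentence))
```

Precision and recall figures such as those reported above are computed by comparing extracted values against a manually annotated corpus, which is what `precision_recall` does given the counts.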