BMC Bioinformatics

Predicting protein-ligand binding residues with deep convolutional neural networks

Abstract

Ligand-binding proteins play key roles in many biological processes, and identifying protein-ligand binding residues is important for understanding protein function. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based, and all of them rely on traditional machine learning. In a series of binding residue prediction tasks, 3D-structure-based methods have generally outperformed sequence-based methods. However, because a very large number of proteins have known amino acid sequences (far more than have solved 3D structures), sequence-based methods have considerable room for improvement as deep learning develops, which motivates applying deep learning to protein-ligand binding residue prediction. In this study, we propose DeepCSeqSite, a new sequence-based approach for ab initio protein-ligand binding residue prediction, available in a standard edition and an enhanced edition. Its classifier is a deep convolutional neural network in which several convolutional layers are stacked to extract hierarchical features. The effective context scope grows as convolutional layers are added, so long-distance dependencies between residues can be captured, and the number of stacked layers precisely controls the maximum dependency length. The extracted features are finally combined through one-by-one (1×1) convolution kernels and a softmax layer to predict whether each residue is a binding residue. The state-of-the-art ligand-binding site prediction method COACH and several of its submethods serve as baselines. The methods are tested on a set of 151 nonredundant proteins and three extended test sets. Experiments show that DeepCSeqSite improves the Matthews correlation coefficient (MCC) over the baselines by at least 0.05. A training data augmentation method that slightly improves performance is also discussed.
Without using any templates that include 3D-structure data, DeepCSeqSite significantly outperforms existing sequence-based and 3D-structure-based methods, including COACH. Augmenting the training sets slightly improves performance. The model, code and datasets are available at https://github.com/yfCuiFaith/DeepCSeqSite.
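The claim that stacking layers widens the effective context scope, and that the layer count fixes the maximum dependency length, follows from simple receptive-field arithmetic. The sketch below assumes stride-1 1D convolutions of one odd kernel width with symmetric "same" padding and no pooling or dilation; these are illustrative assumptions, not DeepCSeqSite's published configuration. Under them, each layer widens the window by k − 1 residues, so L layers see 1 + L(k − 1) consecutive residues centred on the predicted position.

```python
def effective_context(num_layers: int, kernel_size: int) -> int:
    """Residues visible to one output position after `num_layers` stacked
    stride-1 1D convolutions of width `kernel_size` (no pooling, no dilation)."""
    if num_layers < 0 or kernel_size < 1:
        raise ValueError("need num_layers >= 0 and kernel_size >= 1")
    # Each layer adds (kernel_size - 1) residues to the window, split across both sides.
    return 1 + num_layers * (kernel_size - 1)


def layers_needed(window: int, kernel_size: int) -> int:
    """Smallest layer count whose effective context covers `window` residues."""
    if window < 1 or kernel_size < 2:
        raise ValueError("need window >= 1 and kernel_size >= 2")
    # Integer ceiling division of (window - 1) by (kernel_size - 1).
    return -(-(window - 1) // (kernel_size - 1))


if __name__ == "__main__":
    for layers in (1, 5, 10):
        print(layers, effective_context(layers, kernel_size=5))
    # 10 layers of width 5 see 41 residues (20 on each side of the centre).
    print(layers_needed(41, kernel_size=5))
```

Because every extra layer extends the window by a fixed, known number of residues, choosing the depth is equivalent to choosing the longest sequence-distance dependency the classifier can model.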
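The classifier pipeline summarized in the abstract (stacked convolutions over the residue sequence, then one-by-one convolution kernels and a softmax at every residue) can be illustrated with a dependency-free toy forward pass. This is not DeepCSeqSite's implementation: the feature dimension, channel counts, kernel width, ReLU activation, zero "same" padding and random weights are all hypothetical choices for illustration; the actual model and code are in the linked repository. A width-1 kernel applies the same linear map to every residue, turning each feature vector into two class scores that the softmax converts into probabilities.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]        # [residue position][channel]
Layer = Tuple[list, List[float]]  # (weights[out][in][tap], bias[out])


def conv1d_same(x: Matrix, layer: Layer) -> Matrix:
    """Stride-1 1D convolution over residues with zero 'same' padding (odd kernel width)."""
    weights, bias = layer
    length, width = len(x), len(weights[0][0])
    half = width // 2
    out = []
    for pos in range(length):
        row = []
        for o, w_out in enumerate(weights):
            acc = bias[o]
            for tap in range(width):
                src = pos + tap - half
                if 0 <= src < length:  # out-of-range neighbours count as zero padding
                    acc += sum(w_in[tap] * x[src][i] for i, w_in in enumerate(w_out))
            row.append(acc)
        out.append(row)
    return out


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(in_ch: int, out_ch: int, width: int, rng: random.Random) -> Layer:
    """Hypothetical randomly initialised layer, used only to exercise the sketch."""
    weights = [[[rng.gauss(0.0, 0.1) for _ in range(width)] for _ in range(in_ch)]
               for _ in range(out_ch)]
    return weights, [0.0] * out_ch


def predict_binding(x: Matrix, hidden: List[Layer], head: Layer) -> List[float]:
    """Return P(binding) for every residue."""
    h = x
    for layer in hidden:  # stacked convolutions build hierarchical features
        h = [[max(0.0, v) for v in row] for row in conv1d_same(h, layer)]  # ReLU
    logits = conv1d_same(h, head)  # width-1 kernels: per-residue linear map to 2 classes
    return [softmax(row)[1] for row in logits]  # index 1 = "binding"


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[rng.random() for _ in range(20)] for _ in range(30)]  # 30 residues x 20 features
    hidden = [random_layer(20, 16, 5, rng), random_layer(16, 16, 5, rng)]
    head = random_layer(16, 2, 1, rng)
    probs = predict_binding(features, hidden, head)
    print([round(p, 3) for p in probs[:5]])
```

With two hidden layers of width 5, each prediction depends on 9 residues (the target plus four on each side), so perturbing one residue's features can only change predictions inside that window.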
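The reported gains are measured with the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), which stays informative when, as is typical for binding-residue data, binding residues are a small minority of all residues. The function below is a plain-Python sketch of that formula over per-residue 0/1 labels; returning 0 when a marginal count is zero is a common convention adopted here, not something the abstract specifies. scikit-learn's `sklearn.metrics.matthews_corrcoef` computes the same quantity and can serve as a cross-check.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC over per-residue labels (1 = binding residue, 0 = non-binding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 by convention.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    pred = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
    # TP=2, TN=6, FP=1, FN=1 -> (12 - 1) / sqrt(3 * 3 * 7 * 7) = 11 / 21
    print(round(matthews_corrcoef(truth, pred), 3))  # 0.524
```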
