Journal: BMC Bioinformatics

Machine-learning scoring functions for identifying native poses of ligands docked to known and novel proteins


Abstract

Background: Molecular docking is a widely employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of candidate poses. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited docking power (the ability to successfully identify the correct pose) has been a major impediment to cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with physicochemical and geometrical features characterizing protein-ligand complexes to predict the native or near-native pose of a ligand docked to a receptor protein's binding site. We assess the docking accuracies of these new ML SFs as well as those of conventional SFs in the context of the 2007 PDBbind benchmark dataset on both diverse and homogeneous (protein-family-specific) test sets. Further, we perform a systematic analysis of the performance of the proposed SFs in identifying native poses of ligands that are docked to novel protein targets.

Results and conclusion: We find that the best performing ML SF has a success rate of 80% in identifying poses that are within 1 Å root-mean-square deviation from the native poses of 65 different protein families. This is in comparison to a success rate of only 70% achieved by the best conventional SF, ASP, employed in the commercial docking software GOLD. In addition, the proposed ML SFs perform better on novel proteins that they were never trained on. We also observed steady gains in the performance of these scoring functions as the training set size and number of features were increased by considering more protein-ligand complexes and/or more computationally generated poses for each complex.
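The success-rate criterion in the abstract (the top-scored pose lying within 1 Å root-mean-square deviation of the native pose) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the input layout, and the convention that a higher score means a better pose are all assumptions made for illustration. The RMSD here is a plain atom-by-atom deviation with no superposition or symmetry correction.

```python
import math


def rmsd(pose, native):
    """Plain RMSD between two equally sized lists of (x, y, z) atom coordinates.

    Assumes atoms are listed in the same order and already share a reference
    frame (no alignment or symmetry handling is attempted).
    """
    if len(pose) != len(native) or not pose:
        raise ValueError("pose and native must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2
        for p_atom, n_atom in zip(pose, native)
        for a, b in zip(p_atom, n_atom)
    )
    return math.sqrt(squared / len(pose))


def top_pose_is_native_like(poses, scores, native, cutoff=1.0):
    """True if the best-scored pose is within `cutoff` angstroms of the native pose.

    Assumption: a higher score means a more favourable pose.
    """
    best = max(range(len(poses)), key=lambda i: scores[i])
    return rmsd(poses[best], native) <= cutoff


def success_rate(complexes, cutoff=1.0):
    """Fraction of complexes whose top-scored pose is native-like.

    `complexes` is a hypothetical list of dicts with keys
    'poses', 'scores' and 'native'.
    """
    hits = sum(
        top_pose_is_native_like(c["poses"], c["scores"], c["native"], cutoff)
        for c in complexes
    )
    return hits / len(complexes)
```

With this layout, a scoring function that ranks a near-native pose first for 80 of 100 complexes would give `success_rate(...) == 0.8`, which is how a figure such as the 80% reported above would be read.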
