Journal of Chemical Information and Modeling

Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach

Abstract

Machine learning methods based on ligand-protein interaction data in bioactivity databases are one of the current strategies for efficiently finding novel lead compounds as the first step in the drug discovery process. Although previous machine learning studies have succeeded in predicting novel ligand-protein interactions with high performance, all of them have depended heavily on the raw bioactivity data of ligand potencies, measured as IC50, EC50, Ki, and Kd, deposited in databases. ChEMBL provides a unique opportunity to investigate whether a machine-learning-based classifier built on ligand efficiency, rather than on IC50, EC50, Ki, and Kd values, can also offer high predictive performance. Here we report that classifiers created from training data based on ligand efficiency show higher performance than those created from data based on IC50 or Ki values. Using the GPCRSARfari and KinaseSARfari databases in ChEMBL, we created IC50- or Ki-based training data and binding efficiency index (BEI)-based training data, then constructed classifiers using support vector machines (SVMs). The SVM classifiers from the BEI-based training data showed slightly higher area under the curve (AUC), accuracy, sensitivity, and specificity in the cross-validation tests. Application of the classifiers to the validation data demonstrated that the AUCs and specificities of the BEI-based classifiers increased dramatically in comparison with the IC50- or Ki-based classifiers. The improvement in predictive power of the BEI-based classifiers can be attributed to (i) the more separated distributions of positives and negatives, (ii) the higher diversity of negatives in the BEI-based training data in the feature space of the SVMs, and (iii) a more balanced number of positives and negatives in the BEI-based training data. These results strongly suggest that training data based on ligand efficiency, as well as data based on classical IC50, EC50, Kd, and Ki values, are important when creating a classifier using a machine learning approach based on bioactivity data.
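
To make the contrast between potency-based and BEI-based training labels concrete, the following is a minimal Python sketch. It assumes the definition of BEI commonly used in the literature, the negative log of the potency (pKi or pIC50) divided by the molecular weight in kDa; the abstract itself does not state the formula. The two ligands, the cutoff values, and the helper names (`p_activity`, `bei`, `label`) are hypothetical and are not taken from the paper.

```python
import math


def p_activity(value_nm: float) -> float:
    """Convert a potency given in nM (Ki, IC50, ...) to -log10 of its molar value."""
    return 9.0 - math.log10(value_nm)


def bei(value_nm: float, mol_weight_da: float) -> float:
    """Binding efficiency index, assumed here to be p(activity) / MW in kDa."""
    return p_activity(value_nm) / (mol_weight_da / 1000.0)


def label(score: float, cutoff: float) -> str:
    """Label a ligand-target pair as positive when its score reaches the cutoff."""
    return "positive" if score >= cutoff else "negative"


# Hypothetical ligands: (name, Ki in nM, molecular weight in Da).
ligands = [
    ("small_fragment", 1000.0, 200.0),
    ("large_potent", 10.0, 800.0),
]

# Assumed cutoffs, chosen only for illustration.
POTENCY_CUTOFF = 7.0  # pKi
BEI_CUTOFF = 20.0     # pKi per kDa

results = {}
for name, ki_nm, mw_da in ligands:
    potency_label = label(p_activity(ki_nm), POTENCY_CUTOFF)
    bei_label = label(bei(ki_nm, mw_da), BEI_CUTOFF)
    results[name] = (potency_label, bei_label)
    print(f"{name}: potency-based={potency_label}, BEI-based={bei_label}")
```

With these made-up values, the two labeling schemes disagree on both ligands: normalizing by molecular weight favors the small, moderately potent compound over the large, potent one. This kind of relabeling is one way the positive and negative sets in a training data set could shift when ligand efficiency replaces raw potency.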