Journal of Chemical Information and Modeling

The Development of Target-Specific Machine Learning Models as Scoring Functions for Docking-Based Target Prediction


Abstract

The identification of possible targets for a known bioactive compound is of the utmost importance for drug design and development. Molecular docking is one possible approach for in silico protein target prediction, in which a molecule is docked into several different protein structures to identify potential targets. This reverse docking approach is hampered by the limited ability of current scoring functions to discriminate correctly between targets and nontargets. In this work, the development of target-specific scoring functions is described that showed improved prediction performance for the correct target prediction of both actives and decoys on three validation data sets. In contrast to pure ligand-based approaches, which are in general faster and cover a greater target space, docking-based approaches can also cover unknown chemical space that lies outside the known bioactivity data. These target-specific scoring functions are based on known bioactivity data retrieved from ChEMBL and on supervised machine learning approaches. Neural network and support vector machine (SVM) models were trained for 20 different protein targets. Our protein–ligand interaction fingerprint PADIF (Protein Atom Score Contributions Derived Interaction Fingerprint) serves as the input for training; the PADIFs are calculated from docking poses of active and inactive compounds. Different data sets of previously unseen molecules were used for the final evaluation and analysis of the prediction performance of the created models. For a single-target selectivity data set, the correct target model returns, in most cases, the highest probability scores for its active molecules, with statistically significant differences from the other targets. These probability scores were also predicted for, and successfully used to rank the targets of, molecules in a multitarget data set with activity data reported simultaneously for two, three, and four to seven protein targets.
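The target-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes each protein target has its own trained classifier that maps an interaction fingerprint (computed from the molecule's docking pose in that target) to a probability of activity. Everything named here is hypothetical: `make_logistic_model` is a toy logistic stand-in for the SVM and neural network models, not the authors' implementation, and the target names, weights, and fingerprint values are made up.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" is any callable mapping a fingerprint vector to P(active).
# The abstract describes trained SVM / neural network classifiers;
# the logistic stand-ins below are illustrative assumptions only.
Model = Callable[[Sequence[float]], float]


def make_logistic_model(weights: Sequence[float], bias: float) -> Model:
    """Build a toy probability model (hypothetical stand-in for a trained classifier)."""
    def predict(fingerprint: Sequence[float]) -> float:
        z = bias + sum(w * x for w, x in zip(weights, fingerprint))
        return 1.0 / (1.0 + math.exp(-z))
    return predict


def rank_targets(
    fingerprints: Dict[str, Sequence[float]],
    models: Dict[str, Model],
) -> List[Tuple[str, float]]:
    """Score one molecule against every target, sorted by descending probability.

    fingerprints[t] is the interaction fingerprint of the molecule's docking
    pose in target t; models[t] is the classifier trained for target t.
    """
    scores = [(target, models[target](fp)) for target, fp in fingerprints.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up targets, weights, and fingerprints for demonstration only.
models = {
    "target_A": make_logistic_model([1.0, -0.5], 0.0),
    "target_B": make_logistic_model([-1.0, 0.5], 0.0),
}
fingerprints = {
    "target_A": [2.0, 1.0],  # pose in target_A
    "target_B": [2.0, 1.0],  # pose in target_B
}
ranking = rank_targets(fingerprints, models)
```

Here `ranking` is a list of `(target, probability)` pairs with the most likely target first. Keeping one model per target, each fed a fingerprint from the pose in its own binding site, mirrors the target-specific design the abstract describes.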
