Journal of Chemical Information and Modeling

Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning

Abstract

Binding prediction between targets and drug-like compounds through deep neural networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies applied to data from different molecular databases affect the performance of binding prediction proteochemometrics models. These strategies are (1) random splitting, (2) splitting based on K-means clustering (of both actives and inactives), (3) splitting based on the source database, and (4) splitting based on both the clustering and the source database. These schemes are applied to a deep learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our deep learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases, and that a restrictive cross-validation scheme based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when molecules are represented by their fingerprints.
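The four splitting schemes above differ only in which samples are allowed to land in the same fold. Grouping by cluster or by source keeps closely related compounds (or compounds from the same database) out of the test fold, which is consistent with the harsher but more credible estimates the abstract reports. The sketch below illustrates this under stated assumptions: compounds are numeric vectors (for instance fingerprint bits), K-means is a plain Lloyd's implementation, and the function names (`random_folds`, `kmeans`, `group_folds`, `cluster_and_source_split`) are hypothetical. The abstract does not say how scheme (4) combines clusters and sources, so `cluster_and_source_split` is only one plausible, deliberately strict reading, not the authors' method.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Set, Tuple


def random_folds(n: int, k: int, seed: int = 0) -> List[int]:
    """Scheme (1): shuffle sample indices and deal them round-robin into k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for pos, i in enumerate(order):
        folds[i] = pos % k
    return folds


def kmeans(points: List[List[float]], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's K-means (squared Euclidean distance); returns one cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_folds(groups: Sequence[Hashable], k: int) -> List[int]:
    """Schemes (2) and (3): place whole groups into folds, largest group first,
    always into the currently smallest fold, so no group straddles train and test."""
    fold_sizes = [0] * k
    fold_of = {}
    for g, size in sorted(Counter(groups).items(), key=lambda kv: (-kv[1], repr(kv[0]))):
        j = min(range(k), key=lambda f: fold_sizes[f])
        fold_of[g] = j
        fold_sizes[j] += size
    return [fold_of[g] for g in groups]


def cluster_and_source_split(
    clusters: Sequence[Hashable],
    sources: Sequence[str],
    test_source: str,
    test_clusters: Set[Hashable],
) -> Tuple[List[int], List[int]]:
    """Hypothetical reading of scheme (4): test on held-out clusters of one source
    database; train only on other databases, minus any compound from a test cluster."""
    n = len(clusters)
    test = [i for i in range(n) if sources[i] == test_source and clusters[i] in test_clusters]
    held = {clusters[i] for i in test}
    train = [i for i in range(n) if sources[i] != test_source and clusters[i] not in held]
    return train, test
```

For example, with a hypothetical list of fingerprint vectors `fps`, `group_folds(kmeans(fps, k=10), k=5)` yields five folds in which every cluster sits in exactly one fold, and `group_folds(sources, k=len(set(sources)))` yields leave-one-database-out folds. If there are fewer groups than folds, some folds stay empty, so the number of clusters should exceed the number of folds.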