Journal of chemical information and modeling

Evaluating Scalable Uncertainty Estimation Methods for Deep Learning-Based Molecular Property Prediction


Abstract

Advances in deep neural network (DNN)-based molecular property prediction have recently led to the development of models of remarkable accuracy and generalization ability, with graph convolutional neural networks (GCNNs) reporting state-of-the-art performance for this task. However, some challenges remain, and one of the most important, which has yet to be fully addressed, concerns uncertainty quantification. DNN performance is affected by the volume and the quality of the training samples. Therefore, establishing when and to what extent a prediction can be considered reliable is just as important as outputting accurate predictions, especially when out-of-domain molecules are targeted. Recently, several methods to account for uncertainty in DNNs have been proposed, most of which are based on approximate Bayesian inference. Among these, only a few scale to the large data sets required in applications. Evaluating and comparing these methods has recently attracted great interest, but results are generally fragmented and absent for molecular property prediction. In this paper, we quantitatively compare scalable techniques for uncertainty estimation in GCNNs. We introduce a set of quantitative criteria to capture different uncertainty aspects and then use these criteria to compare MC-dropout, Deep Ensembles, and bootstrapping, both theoretically, in a unified framework that separates aleatoric/epistemic uncertainty, and experimentally, on public data sets. Our experiments quantify the performance of the different uncertainty estimation methods and their impact on uncertainty-related error reduction. Our findings indicate that Deep Ensembles and bootstrapping consistently outperform MC-dropout, with different context-specific pros and cons. Our analysis leads to a better understanding of the role of aleatoric/epistemic uncertainty, also in relation to the target data set features, and highlights the challenge posed by out-of-domain uncertainty.
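For readers unfamiliar with the aleatoric/epistemic split referenced in the abstract, the sketch below illustrates the standard law-of-total-variance aggregation. It is not the authors' code: it assumes a regression setting in which each of T stochastic forward passes (MC-dropout samples, Deep Ensemble members, or bootstrap replicates) returns a predicted mean and a predicted variance per molecule, and the function name decompose_uncertainty and the toy inputs are illustrative only.

```python
# Minimal sketch (not the paper's implementation): aggregating T stochastic passes
# of a heteroscedastic regression model into aleatoric/epistemic/total uncertainty.
# The same aggregation applies whether the passes come from MC-dropout, a Deep
# Ensemble, or bootstrap replicates; only how the T (mean, variance) pairs are
# produced differs.
import numpy as np

def decompose_uncertainty(means, variances):
    """means, variances: arrays of shape (T, N) -- T passes over N molecules.

    Returns the predictive mean and the aleatoric, epistemic, and total variance
    per molecule, via the law of total variance.
    """
    means = np.asarray(means)
    variances = np.asarray(variances)
    pred_mean = means.mean(axis=0)      # averaged prediction
    aleatoric = variances.mean(axis=0)  # data noise: mean of the predicted variances
    epistemic = means.var(axis=0)       # model uncertainty: spread of the predicted means
    return pred_mean, aleatoric, epistemic, aleatoric + epistemic

# Toy usage: random numbers standing in for T=5 passes on N=3 molecules.
rng = np.random.default_rng(0)
toy_means = rng.normal(size=(5, 3))
toy_vars = rng.uniform(0.1, 0.5, size=(5, 3))
print(decompose_uncertainty(toy_means, toy_vars))
```

In this view, Deep Ensembles and bootstrapping differ only in how the T models are obtained (independent random initializations versus resampled training sets), while MC-dropout draws the T passes from a single model with dropout kept active at prediction time.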