Journal: Journal of Chemical Information and Modeling

General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity

Abstract

Key requirements for quantitative structure-activity relationship (QSAR) models to gain acceptance by regulatory authorities include a defined domain of applicability (DA) and appropriate measures of goodness-of-fit, robustness, and predictivity. Hence, many DA metrics have been developed over the past two decades. The most intuitive are perhaps distance-to-model metrics, which are most commonly defined in terms of the mean distance between a molecule and its k nearest training samples. Detailed evaluations have shown that the variance of predictions by an ensemble of QSAR models may serve as a DA metric and can outperform distance-to-model metrics. Intriguingly, the performance of the ensemble variance metric has led researchers to conclude that the error of predicting a new molecule does not depend on the input descriptors or machine-learning methods but on its distance to the training molecules. This implies that the distance to training samples may serve as the basis for developing a high-performance DA metric. In this article, we introduce a new Tanimoto distance-based DA metric called the sum of distance-weighted contributions (SDC), which takes into account contributions from all molecules in a training set. Using four acute chemical toxicity data sets of varying sizes and four other molecular property data sets, we demonstrate that SDC correlates well with the prediction error for all data sets regardless of the machine-learning methods and molecular descriptors used to build the QSAR models. Using the acute toxicity data sets, we compared the distribution of prediction errors with respect to SDC, the mean distance to k-nearest training samples, and the variance of random forest predictions. The results showed that the correlation with the prediction error was highest for SDC. We also demonstrate that SDC allows for the development of robust root mean squared error (RMSE) models and makes it possible to not only give a QSAR prediction but also provide an individu
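
The conventional distance-to-model metric mentioned in the abstract, the mean distance between a molecule and its k nearest training samples, can be sketched as follows. This is only an illustration under stated assumptions: fingerprints are represented as sets of "on" bit indices, Tanimoto distance is taken as one minus the Tanimoto similarity, and the names `tanimoto_distance` and `mean_knn_distance` are hypothetical. It is not the authors' implementation, and it does not cover SDC, whose weighting function is not given in the abstract.

```python
from typing import FrozenSet, Sequence

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b| (0.0 when both fingerprints are empty)."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def mean_knn_distance(query: Fingerprint,
                      training: Sequence[Fingerprint],
                      k: int = 3) -> float:
    """Mean Tanimoto distance from the query to its k nearest training samples."""
    if not training:
        raise ValueError("training set must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort all distances and keep the k smallest (fewer if the set is small).
    nearest = sorted(tanimoto_distance(query, fp) for fp in training)[:k]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Toy fingerprints, invented purely for illustration.
    training = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8, 9})]
    query = frozenset({1, 2, 3, 4})
    print(mean_knn_distance(query, training, k=2))
```

A larger value suggests the query molecule lies farther from the training data. Unlike this metric, SDC as described in the abstract draws on contributions from all training molecules instead of only the k nearest.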

Bibliographic details

  • Author affiliations

    US Army Med Res & Mat Command US Dept Def Biotechnol High Performance Comp Software Applica Telemed & Adv Technol Res Ctr Ft Detrick MD 21702 USA;

    Def Threat Reduct Agcy Aberdeen Proving Ground MD 21010 USA;

    US Army Edgewood Chem Biol Ctr Operat Toxicol Aberdeen Proving Ground MD 21010 USA;

    US Army Med Res & Mat Command US Dept Def Biotechnol High Performance Comp Software Applica Telemed & Adv Technol Res Ctr Ft Detrick MD 21702 USA;

  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Chemistry; Chemical Industry
