Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection

Tetko IV; Sushko I; Pandey AK; Zhu H; Tropsha A; Papa E; Oberg T; Todeschini R; Fourches D; Varnek A

Abstract

The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric of the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It can be expressed in many different ways, e.g., using the Tanimoto coefficient, leverage, correlation in the space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distance to model based on the standard deviation of the predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with small and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the distribution of the training set data in the chemistry and activity spaces, not by the QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in a wrong estimate of its performance and have suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at http://www.qspr.org.
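
The ensemble-based distance described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-bin split, and the toy numbers below are assumptions made up for illustration. The sketch takes the standard deviation of the ensemble predictions for each compound as its distance to model, sorts compounds by that distance, and compares the mean absolute error of the consensus prediction in the low-distance and high-distance groups.

```python
import statistics


def ensemble_std_distance(predictions):
    """Distance to model for one compound: sample standard deviation
    of the values predicted by the individual ensemble members."""
    return statistics.stdev(predictions)


def mean_abs_error_by_distance(ensemble_preds, observed, n_bins=2):
    """Sort compounds by distance to model, split them into n_bins
    groups of (nearly) equal size, and return the mean absolute error
    of the consensus (mean) prediction in each group."""
    records = []
    for preds, y in zip(ensemble_preds, observed):
        consensus = statistics.fmean(preds)
        records.append((ensemble_std_distance(preds), abs(consensus - y)))
    records.sort(key=lambda r: r[0])  # smallest distance first

    size = len(records) // n_bins
    errors_per_bin = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any remainder.
        end = (i + 1) * size if i < n_bins - 1 else len(records)
        errors_per_bin.append(
            statistics.fmean([err for _, err in records[start:end]])
        )
    return errors_per_bin


# Hypothetical toy data: three ensemble members, four compounds.
# These values are invented for illustration, not taken from the paper.
ensemble_preds = [
    [1.0, 1.1, 0.9],
    [2.0, 3.0, 1.0],
    [0.5, 0.5, 0.6],
    [1.0, 2.5, -0.5],
]
observed = [1.05, 3.5, 0.5, 2.0]

low_err, high_err = mean_abs_error_by_distance(ensemble_preds, observed)
print(f"MAE, low-distance half:  {low_err:.3f}")
print(f"MAE, high-distance half: {high_err:.3f}")
```

In this toy set the compounds on which the ensemble members disagree most also carry the larger errors, which is the kind of discrimination the abstract reports. Real data would need a proper statistical test rather than a two-bin comparison.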

Bibliographic details

Source
Journal of chemical information and modeling | 2008, issue 9 | 14 pages
Authors
Tetko IV; Sushko I; Pandey AK; Zhu H; Tropsha A; Papa E; Oberg T; Todeschini R; Fourches D; Varnek A
Original format: PDF
Language: English
CLC classification: Chemistry
Keywords
NEURAL-NETWORKS; QSPR MODELS; ERROR ESTIMATION; VALIDATION; PREDICTION; SOLUBILITY; CONFIDENCE; REGRESSION; MOLECULES; ACCURACY
