Beware of External Validation! - A Comparative Study of Several Validation Techniques used in QSAR Modelling

Subhabrata Majumdar; Subhash C. Basak

Abstract

Background: Proper validation is an important aspect of QSAR modelling. External validation is one of the widely used validation methods in QSAR where the model is built on a subset of the data and validated on the rest of the samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains suspect.

Objective: Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances in the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has a high value of p but small n (i.e. n ≪ p). Motivated by the evidence of inadequacies of external validation in estimating the true predictive capability of a statistical model in recent literature, this paper performs an extensive and comparative study of this method with several other validation techniques.

Methodology: We compared four validation methods: Leave-one-out, K-fold, external and multi-split validation, using statistical models built using the LASSO regression, which simultaneously performs variable selection and modelling. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation.

Results: External validation metrics have high variation among different random splits of the data, hence are not recommended for predictive QSAR models. LOO has the overall best performance among all validation methods applied in our scenario.

Conclusion: Results from external validation are too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data.
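The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or experimental setup: the simulated data, the penalty `alpha`, the 25% test fraction, the number of random splits, and the predictive R^2 definition in `q2` are all assumptions chosen for illustration, and `lasso_fit` is a hypothetical NumPy-only coordinate-descent helper written so the example has no dependency beyond NumPy. The sketch fits a LASSO model on one small n < p dataset, scores it once with LOO and once with 5-fold cross validation, and then repeats a single external train/test split many times to show how much that score moves from split to split.

```python
import numpy as np

def lasso_fit(X, y, alpha, n_iter=50):
    """Hypothetical helper: LASSO by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - b0 - X b||^2 + alpha * ||b||_1.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()  # residual for the current beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= Xc[:, j] * (new - beta[j])
            beta[j] = new
    return beta, y_mean - x_mean @ beta

def q2(y_true, y_pred):
    """Predictive R^2 (assumed definition): 1 - SS_residual / SS_total."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def held_out_predictions(X, y, folds, alpha):
    """Fit on all samples outside each fold, then predict that fold.

    Only positions covered by `folds` are filled in the returned array.
    """
    pred = np.empty_like(y)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta, b0 = lasso_fit(X[train], y[train], alpha)
        pred[test_idx] = X[test_idx] @ beta + b0
    return pred

# Simulated small-n, large-p data (assumed sizes, not the paper's).
rng = np.random.default_rng(0)
n, p, alpha = 30, 60, 0.1
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:5] = 2.0  # only five descriptors carry signal
y = X @ true_beta + rng.standard_normal(n)

# Leave-one-out: every sample is held out exactly once.
loo_folds = [np.array([i]) for i in range(n)]
loo_q2 = q2(y, held_out_predictions(X, y, loo_folds, alpha))

# K-fold with K = 5 on one random partition.
kfold_folds = np.array_split(rng.permutation(n), 5)
kfold_q2 = q2(y, held_out_predictions(X, y, kfold_folds, alpha))

# External validation: one train/test split, repeated to expose its spread.
ext_q2 = []
for _ in range(30):
    test_idx = rng.permutation(n)[: n // 4]
    pred = held_out_predictions(X, y, [test_idx], alpha)
    ext_q2.append(q2(y[test_idx], pred[test_idx]))
ext_q2 = np.array(ext_q2)

print(f"LOO q2:      {loo_q2:.3f}")
print(f"5-fold q2:   {kfold_q2:.3f}")
print(f"External q2: min {ext_q2.min():.3f}, max {ext_q2.max():.3f}, "
      f"std {ext_q2.std():.3f}")
```

The range and standard deviation printed on the last line are the quantities of interest: a single external split reports just one draw from that distribution. In practice a library implementation such as scikit-learn's `Lasso` would replace the hand-written solver, and `alpha` would itself be tuned inside each training set.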

Bibliographic details

Source
Current computer-aided drug design | 2018, Issue 4 | 8 pages
Authors
Subhabrata Majumdar; Subhash C. Basak;
Author affiliations

University of Florida Informatics Institute, Gainesville, Florida, United States;

Department of Chemistry and Biochemistry, University of Minnesota Duluth - Natural Resources;

Original format: PDF
Language: English
Chinese Library Classification: Pharmacy
Keywords
Cross validation; Leave One Out (LOO) cross validation; K-fold cross validation; external validation; LASSO; chemical mutagens; aromatic and heteroaromatic amines;
