Journal: Current Computer-Aided Drug Design

Beware of External Validation! - A Comparative Study of Several Validation Techniques used in QSAR Modelling


Abstract

Background: Proper validation is an important aspect of QSAR modelling. External validation is one of the widely used validation methods in QSAR where the model is built on a subset of the data and validated on the rest of the samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains suspect.

Objective: Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances in the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has a high value of p but small n (i.e. n ≪ p). Motivated by the evidence of inadequacies of external validation in estimating the true predictive capability of a statistical model in recent literature, this paper performs an extensive and comparative study of this method with several other validation techniques.

Methodology: We compared four validation methods: Leave-one-out, K-fold, external and multi-split validation, using statistical models built using the LASSO regression, which simultaneously performs variable selection and modelling. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation.

Results: External validation metrics have high variation among different random splits of the data, hence are not recommended for predictive QSAR models. LOO has the overall best performance among all validation methods applied in our scenario.

Conclusion: Results from external validation are too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data.
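
The comparison described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' code. It assumes a plain coordinate-descent LASSO with a fixed penalty `lam` (the paper may tune this differently), RMSE on held-out samples as the validation metric, and "multi-split" taken to mean the average over repeated random external splits. All function names (`lasso_fit`, `holdout_rmse`, `compare_validations`, `simulate`) and the toy data generator are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, sweeps=30):
    """Fit LASSO by cyclic coordinate descent (fixed number of sweeps).

    Minimises (1/2n) * sum((y - b0 - X.beta)^2) + lam * sum(|beta|).
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    col_sq = [sum(x * x for x in col) / n for col in cols]
    b0, beta = 0.0, [0.0] * p
    resid = list(y)
    for _ in range(sweeps):
        shift = sum(resid) / n  # unpenalised intercept update
        b0 += shift
        resid = [r - shift for r in resid]
        for j, col in enumerate(cols):
            if col_sq[j] == 0.0:
                continue
            rho = sum(x * r for x, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(col, resid)]
                beta[j] = new
    return b0, beta


def predict(model, row):
    b0, beta = model
    return b0 + sum(b * x for b, x in zip(beta, row))


def holdout_rmse(X, y, test_sets, lam):
    """Pooled RMSE over held-out samples; the model is refit for each test set."""
    sq_err, count = 0.0, 0
    for test in test_sets:
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        model = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            sq_err += (y[i] - predict(model, X[i])) ** 2
            count += 1
    return math.sqrt(sq_err / count)


def compare_validations(X, y, lam=0.1, k=5, n_splits=20, test_frac=0.25, seed=0):
    """Return LOO, K-fold, per-split external, and multi-split RMSE estimates."""
    rng = random.Random(seed)
    n = len(y)
    loo = holdout_rmse(X, y, [[i] for i in range(n)], lam)
    order = rng.sample(range(n), n)  # random permutation for K-fold
    kfold = holdout_rmse(X, y, [order[f::k] for f in range(k)], lam)
    n_test = max(1, round(test_frac * n))
    # Each external validation is a single random train/test split.
    external = [holdout_rmse(X, y, [rng.sample(range(n), n_test)], lam)
                for _ in range(n_splits)]
    # Assumption: multi-split = mean of the repeated external estimates.
    return {"loo": loo, "kfold": kfold, "external": external,
            "multi_split": statistics.mean(external)}


def simulate(n=24, p=40, n_active=3, noise=0.5, seed=1):
    """Toy n << p data: only the first n_active descriptors affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(2.0 * row[j] for j in range(n_active)) + rng.gauss(0, noise)
         for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    res = compare_validations(X, y)
    print("LOO RMSE:        ", res["loo"])
    print("K-fold RMSE:     ", res["kfold"])
    print("Multi-split RMSE:", res["multi_split"])
    print("External RMSE range over splits:",
          min(res["external"]), "to", max(res["external"]))
```

The range printed on the last line shows how much a single external split can move depending on which samples happen to be held out, which is the instability the abstract warns about. The toy data here is unrelated to the paper's 300 simulated datasets or the amine mutagen set, so the numbers should not be read as reproducing its results.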
