The Better Predictive Model: High q² for the Training Set or Low Root Mean Square Error of Prediction for the Test Set?

Aynur O. Aptula; Nina G. Jeliazkova; Terry W. Schultz; Mark T. D. Cronin


Abstract

The validation of computational models (e.g., QSARs) may be the most important step in their development. Different requirements for the reliability and predictability of QSAR models have been described in the literature. Despite these formal recommendations, there are few practical rules as to when to cease adding variables to a QSAR (i.e., what level of model complexity is appropriate). In this work, the influence of model complexity on statistical fit and error was investigated using toxicity data for 200 phenols to the ciliated protozoan Tetrahymena pyriformis, with a test set of a further 50 compounds. The results showed that several factors play a role in defining a good and reliable QSAR: q² is not a good criterion of model predictivity; outliers should not necessarily be deleted, as this may reduce the chemical space of the model; the number of descriptors in a multivariate model should be chosen carefully to avoid model under- and over-estimation; and an appropriate number of dimensions is required for PLS modelling.
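The contrast between the two statistics in the title can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the data are synthetic (random descriptors, not the phenol toxicity set), the model is ordinary least squares rather than the regression and PLS models of the paper, and the helper names `fit_ols`, `predict`, `q2_loo` and `rmsep` are hypothetical. It takes q² as the leave-one-out statistic 1 - PRESS / SS_total on the training set, and RMSEP as the root mean square error on a held-out test set, then reports both as descriptors are added.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def q2_loo(X, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i          # drop compound i, refit, predict it
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total


def rmsep(coef, X_test, y_test):
    """Root mean square error of prediction on an external test set."""
    resid = y_test - predict(coef, X_test)
    return float(np.sqrt(np.mean(resid ** 2)))


# Synthetic stand-in data (assumption): 200 training and 50 test "compounds",
# 30 random descriptors, of which only the first two carry signal.
rng = np.random.default_rng(0)
n_train, n_test, n_desc = 200, 50, 30
X = rng.normal(size=(n_train + n_test, n_desc))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n_train + n_test)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Grow the model and record both statistics at each level of complexity.
results = {}
for k in (1, 2, 5, 10, 30):
    coef = fit_ols(X_train[:, :k], y_train)
    results[k] = (q2_loo(X_train[:, :k], y_train),
                  rmsep(coef, X_test[:, :k], y_test))
    print(f"descriptors={k:2d}  q2(LOO)={results[k][0]:.3f}  RMSEP={results[k][1]:.3f}")
```

Reading the two columns side by side is the point of the exercise: the number of descriptors at which the training-set q² looks best need not be the one that gives the lowest test-set RMSEP.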


Bibliographic details

Source
QSAR & combinatorial science, 2005, Issue 3, 14 pages
Authors
Aynur O. Aptula; Nina G. Jeliazkova; Terry W. Schultz; Mark T. D. Cronin
Original format: PDF
Language: English
CLC classification: Chemistry
Keywords
phenol toxicity; model complexity; validation; QSAR; RMSE; q²

