Journal: QSAR & Combinatorial Science

The Better Predictive Model: High q² for the Training Set or Low Root Mean Square Error of Prediction for the Test Set?


Abstract

The process of validating computational models (e.g., QSARs) may become the most important step in their development. Different requirements for the reliability and predictability of QSAR models have been described in the literature. Despite these formal recommendations, there are few practical rules as to when to cease adding variables to a QSAR (i.e., what is an appropriate level of model complexity). In this work, the influence of model complexity on statistical fit and error has been investigated using toxicity data for 200 phenols to the ciliated protozoan Tetrahymena pyriformis, with a test set of a further 50 compounds. The results showed that several factors play a role in defining a good and reliable QSAR: q² is not a good criterion for model predictivity; outliers should not necessarily be deleted, as this may reduce the chemical space of the model; the number of descriptors in a multivariate model should be chosen carefully to avoid model under- and over-estimation; and an appropriate number of dimensions is required for PLS modelling.
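
The two statistics contrasted in the title can be illustrated with a minimal sketch. It assumes the common definitions q² = 1 − PRESS / SS_total (leave-one-out on the training set) and RMSEP = sqrt(mean squared residual on the test set), and it uses ordinary least squares on synthetic data. The data, the helper names (`fit_ols`, `predict_ols`, `q2_loo`, `rmsep`) and the choice of regression method are assumptions for illustration, not the authors' data set or procedure.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ≈ [1, X] @ beta (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def q2_loo(X, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i          # drop compound i, refit, predict it
        beta = fit_ols(X[keep], y[keep])
        press += (y[i] - predict_ols(beta, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def rmsep(beta, X_test, y_test):
    """Root mean square error of prediction on an external test set."""
    resid = y_test - predict_ols(beta, X_test)
    return float(np.sqrt(np.mean(resid ** 2)))

# Synthetic stand-in data (hypothetical): 200 training and 50 test "compounds",
# 10 candidate descriptors, only the first two of which carry signal.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(250, 10))
y_all = 1.5 * X_all[:, 0] - 0.8 * X_all[:, 1] + rng.normal(scale=0.5, size=250)
X_train, y_train = X_all[:200], y_all[:200]
X_test, y_test = X_all[200:], y_all[200:]

# Add descriptors one at a time and compare the two criteria.
for k in range(1, 11):
    beta = fit_ols(X_train[:, :k], y_train)
    print(k, round(q2_loo(X_train[:, :k], y_train), 3),
          round(rmsep(beta, X_test[:, :k], y_test), 3))
```

Printing both statistics for each descriptor count makes it possible to see whether the model size favoured by training-set q² is also the one with the lowest test-set RMSEP, which is the comparison the abstract describes.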
