Journal: Croatica chemica acta

Estimation of Random Accuracy and its Use in Validation of Predictive Quality of Classification Models within Predictive Challenges

Abstract

Shortcomings of the Pearson correlation coefficient as a measure of how accurately a model predicts properties are analysed. Two such cases, which often arise when a model is applied to predict properties of a new, external set of compounds, are discussed here. The first problem is that the correlation coefficient is insensitive to systematic error, which must be expected when predicting properties of a novel external set of compounds that is not a random sample selected from the training set. The second problem is that an external set can be arbitrarily large or small and can have an arbitrary, uneven distribution of measured values of the target variable, and these values are not known in advance. Under these conditions, the correlation coefficient can overstate the agreement between predicted values and the corresponding experimental values, and can lead to an overly optimistic conclusion about the predictive ability of the model. Because of these shortcomings, the standard error of prediction (root-mean-square error) is suggested as a better measure of a model's predictive capability. For classification models, the difference between the real accuracy and the most probable random accuracy of the model shows very good characteristics for ranking different models by predictive quality, and at the same time has an obvious interpretation.

This work is licensed under a Creative Commons Attribution 4.0 International License.
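The two problems described for the correlation coefficient can be illustrated with a small sketch. All values below are invented for illustration and are not data from the paper; the helper names `pearson_r` and `rmse` are hypothetical. A constant offset (systematic error) or a wrong scale leaves r at 1 while the RMSE exposes the error, and identical ±0.5 errors give r near 0 on a narrow external set but r above 0.99 on a set made of two distant clusters, while the RMSE is 0.5 in both cases.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(y_true, y_pred):
    """Root-mean-square error (standard error) of prediction."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Problem 1: systematic error. Hypothetical experimental values for an external set.
y_exp = [1.2, 2.5, 3.1, 4.8, 5.0, 6.3]
y_shifted = [v + 1.5 for v in y_exp]  # constant bias of +1.5 units
y_scaled = [2.0 * v for v in y_exp]   # correct ranking, wrong scale

# Problem 2: the spread of the external set. Identical +/-0.5 errors in both sets.
errors = [0.5, -0.5, 0.5, -0.5]
narrow = [5.0, 5.2, 5.4, 5.6]   # measured values in a narrow range
wide = [0.0, 0.2, 10.0, 10.2]   # two distant clusters
narrow_pred = [t + e for t, e in zip(narrow, errors)]
wide_pred = [t + e for t, e in zip(wide, errors)]

cases = [
    ("shifted", y_exp, y_shifted),
    ("scaled", y_exp, y_scaled),
    ("narrow", narrow, narrow_pred),
    ("wide", wide, wide_pred),
]
for name, t, p in cases:
    print(f"{name:8s} r = {pearson_r(t, p):6.3f}  RMSE = {rmse(t, p):.3f}")
```

Plain Python is used so the arithmetic stays visible; `numpy.corrcoef` would return the same r.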
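For classification models, the abstract recommends the difference between the real accuracy and the most probable random accuracy but does not give the estimator. The sketch below assumes, as a stand-in, the expected accuracy of a random classifier that keeps the model's predicted label counts but assigns them in random order (the chance-agreement term of Cohen's kappa). This is not necessarily the paper's estimator, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def expected_random_accuracy(y_true, y_pred):
    """Expected accuracy of a random classifier that predicts the same label
    counts as y_pred, assigned in random order (Cohen's kappa chance term).
    Assumed stand-in for the paper's 'most probable random accuracy'."""
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    return sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)


def accuracy_over_random(y_true, y_pred):
    """Real accuracy minus the assumed random accuracy; 0 means no better than chance."""
    return accuracy(y_true, y_pred) - expected_random_accuracy(y_true, y_pred)


# Hypothetical imbalanced external set: 8 inactive and 2 active compounds
y_true = ["inactive"] * 8 + ["active"] * 2
model_a = ["inactive"] * 10  # always predicts the majority class
model_b = ["inactive"] * 7 + ["active"] + ["active", "inactive"]  # one FP, one TP, one FN

for name, y_pred in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: accuracy = {accuracy(y_true, y_pred):.2f}, "
          f"accuracy - random = {accuracy_over_random(y_true, y_pred):.2f}")
```

Both toy models reach an accuracy of 0.80 on the imbalanced set, but model A, which always predicts the majority class, scores 0.00 above its random accuracy, while model B scores 0.12 above it. This is the kind of ranking the abstract describes.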