Journal of Chemometrics

Comparison of validation variants by sum of ranking differences and ANOVA

Abstract

An old debate is revived: the suggested ways of estimating the prediction performance of models, and the preferred validation variants, differ clearly from one scientific discipline to another. However, the best or recommended practice for a given data set cannot depend on the field of use. Fortunately, there is a method comparison algorithm that can rank and group the validation variants; combining it with analysis of variance (ANOVA) reveals whether the differences are significant or merely the play of random errors. Three case studies were therefore carefully selected to reveal similarities and differences among validation variants, and they illustrate well how the significance of these variants differs. In special circumstances, any of the influential factors for validation variants can exert a significant influence on the evaluation by sums of (absolute) ranking differences (SRDs): the resampling scheme (stratified (contiguous block) or repeated Monte Carlo) and the number of splits of the data set (5, 7, or 10). The optimal validation variant should therefore be determined individually, case by case. Random resampling with sevenfold cross-validation seems to be a good compromise that diminishes bias and variance alike. If the data structure is unknown, randomizing the rows is suggested before SRD analysis. On the other hand, the differences among classifiers, validation schemes, and models always proved significant, and even subtle differences can be detected reliably using SRD and ANOVA.
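The abstract names the sum of (absolute) ranking differences and a randomized sevenfold split without spelling either out. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes that the row-wise mean serves as the reference ranking (a common choice, but an assumption here), that ties receive their mean rank, and that the input matrix, the variant names, and the function names are all hypothetical. Normalization of SRD values, their validation against random rankings, and the subsequent ANOVA are deliberately left out.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def srd(column, reference):
    """Sum of absolute differences between the ranks of a column and of the reference."""
    col_ranks = average_ranks(column)
    ref_ranks = average_ranks(reference)
    return sum(abs(c - r) for c, r in zip(col_ranks, ref_ranks))


def srd_table(matrix, names):
    """matrix: list of rows (objects); each row holds one value per variant (column)."""
    reference = [mean(row) for row in matrix]  # assumption: row mean as reference
    return {
        name: srd([row[j] for row in matrix], reference)
        for j, name in enumerate(names)
    }


def random_folds(n_rows, n_folds=7, seed=0):
    """Shuffle row indices first, then deal them into n_folds disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


# Hypothetical data: 5 objects (rows) scored by 3 validation variants (columns).
matrix = [
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 3.0],
    [4.0, 5.0, 1.0],
    [5.0, 4.0, 2.0],
]
names = ["variant_A", "variant_B", "variant_C"]

table = srd_table(matrix, names)
print(table)  # {'variant_A': 2.0, 'variant_B': 2.0, 'variant_C': 12.0}

folds = random_folds(21, n_folds=7)  # 21 hypothetical rows, three per fold
print([len(fold) for fold in folds])
```

A smaller SRD means a variant orders the objects more like the reference does, so in this toy example `variant_C` deviates most. Shuffling the indices before splitting mirrors the suggestion to randomize rows when the data structure is unknown; taking contiguous blocks of the unshuffled indices instead would correspond to the contiguous-block scheme.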