Apples-to-Apples in Cross-Validation Studies: Pitfalls in Classifier Performance Measurement

George Forman; Martin Scholz

Abstract

Cross-validation is a mainstay for measuring performance and progress in machine learning. There are subtle differences in how exactly to compute accuracy, F-measure and Area Under the ROC Curve (AUC) in cross-validation studies. However, these details are not discussed in the literature, and incompatible methods are used by various papers and software packages. This leads to inconsistency across the research literature. Anomalies in performance calculations for particular folds and situations go undiscovered when they are buried in aggregated results over many folds and datasets, without anyone ever looking at the intermediate performance measurements. This research note clarifies and illustrates the differences, and it provides guidance for how best to measure classification performance under cross-validation. In particular, there are several divergent methods used for computing F-measure, which is often recommended as a performance measure under class imbalance, e.g., for text classification domains and in one-vs.-all reductions of datasets having many classes. We show by experiment that all but one of these computation methods lead to biased measurements, especially under high class imbalance. This paper is of particular interest to those designing machine learning software libraries and researchers focused on high class imbalance.
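To make the divergence concrete, here is a minimal Python sketch of three ways one might aggregate F-measure over cross-validation folds. The abstract does not enumerate the methods, so the three strategies, the function names, the toy fold counts, and the convention of scoring an undefined 0/0 ratio as 0.0 are all assumptions chosen for illustration, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class FoldCounts:
    """Positive-class confusion counts on one test fold (hypothetical data)."""
    tp: int
    fp: int
    fn: int


def safe_div(num, den):
    # Assumed convention: an undefined ratio (zero denominator) scores 0.0.
    return num / den if den else 0.0


def f1_from_counts(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN)
    return safe_div(2 * tp, 2 * tp + fp + fn)


def f_mean_of_folds(folds):
    """Compute F1 on each fold separately, then average the fold scores."""
    return sum(f1_from_counts(f.tp, f.fp, f.fn) for f in folds) / len(folds)


def f_of_mean_pr(folds):
    """Average precision and recall over folds, then take the harmonic mean."""
    p = sum(safe_div(f.tp, f.tp + f.fp) for f in folds) / len(folds)
    r = sum(safe_div(f.tp, f.tp + f.fn) for f in folds) / len(folds)
    return safe_div(2 * p * r, p + r)


def f_of_pooled_counts(folds):
    """Sum TP, FP and FN over all folds, then compute a single F1."""
    return f1_from_counts(sum(f.tp for f in folds),
                          sum(f.fp for f in folds),
                          sum(f.fn for f in folds))


# Toy imbalanced case: two positives per fold; fold 2 predicts no positives.
folds = [FoldCounts(tp=1, fp=0, fn=1),
         FoldCounts(tp=0, fp=0, fn=2),
         FoldCounts(tp=2, fp=2, fn=0)]

for name, fn in [("mean of fold F1", f_mean_of_folds),
                 ("F1 of mean P and R", f_of_mean_pr),
                 ("F1 of pooled counts", f_of_pooled_counts)]:
    print(f"{name}: {fn(folds):.3f}")
```

On these toy counts the three strategies give roughly 0.444, 0.500 and 0.545 for the same set of predictions. The fold with no predicted positives is where they part ways: its precision is 0/0, so whatever convention is chosen there feeds directly into the first two aggregates, while the pooled computation never evaluates a per-fold ratio.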

Bibliographic record

Source
SIGKDD Explorations | 2010, Issue 1 | 9 pages
Authors
George Forman; Martin Scholz
Original format
PDF
Language
English
Chinese Library Classification
TP274.2
