A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms

André M. Carrington; Paul W. Fieguth; Hammad Qazi; Andreas Holzinger; Helen H. Chen; Franz Mayr; Douglas G. Manuel

Abstract

In classification and diagnostic testing, the receiver operating characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. When they are used with imbalanced data, however, only part of the ROC curve and AUC is informative. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be interpreted as fully as the AUC, in part because they ignore some information about actual negatives.

We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data, as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and are validated as equal in summation to whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided.

Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures.

The concordant partial area under the ROC curve was proposed and, unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. These measures may be used with any data set, but this paper focuses on imbalanced data with low prevalence. Future work with our proposed measures may demonstrate their value for imbalanced data with high prevalence, compare them to other measures not based on areas, and combine them with other ROC measures and techniques.
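The equalities described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data set is a hypothetical toy example, all function names are illustrative, and the abstract does not spell out the definition of the concordant partial AUC; the sketch assumes it is the average of the vertical partial area (under the curve segment) and the horizontal partial area (to the right of the curve segment), with partitions cut at ROC vertices.

```python
from itertools import product


def c_statistic(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = 0.0
    for p, n in product(pos, neg):
        total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))


def roc_points(labels, scores):
    """Empirical ROC vertices (FPR, TPR), one per distinct score, from (0, 0) to (1, 1)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Instances with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def trapezoid(xs, ys):
    """Trapezoidal integral of y with respect to x along a polyline."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def auc(points):
    fpr, tpr = zip(*points)
    return trapezoid(fpr, tpr)


def concordant_partial_auc(points, start, stop):
    """ASSUMED definition: mean of the vertical and horizontal partial areas
    for the curve segment between vertices start and stop (inclusive)."""
    fpr, tpr = zip(*points[start:stop + 1])
    vertical = trapezoid(fpr, tpr)                      # under the segment
    horizontal = trapezoid(tpr, [1 - x for x in fpr])   # right of the segment
    return 0.5 * (vertical + horizontal)


# Hypothetical toy data: 5 positives, 5 negatives, no tied scores.
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

points = roc_points(labels, scores)
c = c_statistic(labels, scores)
whole_auc = auc(points)

# Three contiguous partitions of the curve, cut at vertex indices 4 and 8.
bounds = [(0, 4), (4, 8), (8, len(points) - 1)]
parts = [concordant_partial_auc(points, a, b) for a, b in bounds]

print("c statistic:", c)
print("AUC:", whole_auc)
print("partial areas:", parts, "sum:", sum(parts))
```

Under these assumptions, the c statistic matches the trapezoidal AUC, and the partial areas over contiguous partitions add up to the whole AUC, which is the kind of equality "in summation to whole measures" that the abstract reports. The sketch does not implement the partial c statistic itself, whose definition the abstract does not give.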

Bibliographic details

Source
BMC Medical Informatics and Decision Making | 2020, Issue 1 | 12 pages
Authors
André M. Carrington; Paul W. Fieguth; Hammad Qazi; Andreas Holzinger; Helen H. Chen; Franz Mayr; Douglas G. Manuel
Original format
PDF
Keywords
Area under the ROC curve; Receiver operating characteristic; C statistic; Concordance; Partial area index; Imbalanced data; Prevalence; Classification; Diagnostic testing; Explainable artificial intelligence
