Journal of Classification

ROC and AUC with a Binary Predictor: a Potentially Misleading Metric

Abstract

In the analysis of binary outcomes, the receiver operating characteristic (ROC) curve is heavily used to show the performance of a model or algorithm. The ROC curve is informative about performance over a series of thresholds and can be summarized by a single number, the area under the curve (AUC). When a predictor is categorical, the ROC curve has one fewer potential threshold than the number of categories; when the predictor is binary, there is only one threshold. As the AUC may be used in decision-making processes to determine the best model, it is important to discuss how it agrees with the intuition from the ROC curve. We discuss how the interpolation of the curve between thresholds with binary predictors can largely change the AUC. Overall, we show that the estimated AUC for a binary predictor corresponds to a linear interpolation of the ROC curve, which is what software most commonly does and which we believe can lead to misleading results. We compare R, Python, Stata, and SAS software implementations. We recommend reporting the interpolation used and discuss the merit of using the step function interpolator, also referred to as the "pessimistic" approach by Fawcett (2006).
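To make the interpolation issue concrete, the following is a minimal sketch in plain Python, not the authors' code. It assumes a 0/1 predictor whose single threshold gives one ROC point (FPR, TPR) between (0, 0) and (1, 1), and it takes "step function" to mean that the curve stays at the lower TPR until the next FPR is reached. The function names and the toy data are hypothetical and chosen only for illustration.

```python
def roc_point(y_true, x_binary):
    """Return (FPR, TPR) for a 0/1 predictor, treating x == 1 as a positive call."""
    tp = sum(1 for y, x in zip(y_true, x_binary) if y == 1 and x == 1)
    fp = sum(1 for y, x in zip(y_true, x_binary) if y == 0 and x == 1)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


def auc_linear(fpr, tpr):
    """Area under straight lines (0,0) -> (fpr,tpr) -> (1,1), i.e. the trapezoidal rule."""
    left_triangle = 0.5 * fpr * tpr
    right_trapezoid = 0.5 * (1 - fpr) * (tpr + 1)
    return left_triangle + right_trapezoid


def auc_step(fpr, tpr):
    """Area under a step curve: height 0 on [0, fpr), height tpr on [fpr, 1]."""
    return (1 - fpr) * tpr


# Hypothetical toy data: 4 positives and 4 negatives.
y = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]

fpr, tpr = roc_point(y, x)
print(fpr, tpr)              # 0.25 0.75
print(auc_linear(fpr, tpr))  # 0.75
print(auc_step(fpr, tpr))    # 0.5625
```

With a single ROC point, the linear version simplifies to (sensitivity + specificity) / 2 and the step version to sensitivity × specificity, so the two summaries of the same predictor differ here by almost 0.19.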
