Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

The impact of class imbalance in classification performance metrics based on the binary confusion matrix



Abstract

A major issue in the classification of class-imbalanced datasets is determining the most suitable performance metrics to use. Previous work has shown, through several examples, that imbalance can exert a major impact on the value and meaning of accuracy and of certain other well-known performance metrics. In this paper, our approach goes beyond individual case studies and develops a systematic analysis of this impact by simulating the results obtained using binary classifiers. A set of functions and numerical indicators is obtained that enables comparison of the behaviour of several performance metrics based on the binary confusion matrix when they are faced with imbalanced datasets. The paper also defines a new way to measure imbalance which surpasses the Imbalance Ratio used in previous studies. From the simulation results, several clusters of performance metrics are identified; Geometric Mean and Bookmaker Informedness emerge as the best null-biased metrics, provided that their focus on classification successes (dismissing the errors) presents no limitation for the specific application where they are used. However, if classification errors must also be considered, then the Matthews Correlation Coefficient arises as the best choice. Finally, a set of null-biased multi-perspective Class Balance Metrics is proposed which extends the concept of Class Balance Accuracy to other performance metrics. (C) 2019 The Authors. Published by Elsevier Ltd.
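
To make the metrics named in the abstract concrete, the following is a minimal sketch that computes accuracy, Geometric Mean, Bookmaker Informedness and the Matthews Correlation Coefficient from a binary confusion matrix, using their standard textbook definitions. It is not the paper's simulation procedure, and it does not implement the paper's new imbalance measure or its Class Balance Metrics, which the abstract does not define. The function name `binary_metrics`, the dictionary keys and the example counts are illustrative assumptions; the imbalance ratio is taken here as majority count over minority count, which is one common convention.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Textbook metrics from a binary confusion matrix (hypothetical helper).

    tp, fn: positives classified correctly / incorrectly
    fp, tn: negatives classified incorrectly / correctly
    Assumes both classes contain at least one example.
    """
    pos, neg = tp + fn, fp + tn
    tpr = tp / pos  # sensitivity: success rate on the positive class
    tnr = tn / neg  # specificity: success rate on the negative class
    # MCC denominator: product of the four marginal totals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Assumed convention: majority count divided by minority count.
        "imbalance_ratio": max(pos, neg) / min(pos, neg),
        "accuracy": (tp + tn) / (pos + neg),
        "geometric_mean": math.sqrt(tpr * tnr),
        "bookmaker_informedness": tpr + tnr - 1,
        # MCC is conventionally set to 0 when a marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Same per-class success rates (TPR = 0.60, TNR = 0.95), different class sizes.
balanced = binary_metrics(tp=300, fn=200, fp=25, tn=475)   # 500 vs 500
imbalanced = binary_metrics(tp=60, fn=40, fp=45, tn=855)   # 100 vs 900

for name in balanced:
    print(f"{name:24s} {balanced[name]:8.4f} {imbalanced[name]:8.4f}")
```

In this example both confusion matrices have identical per-class success rates, so Geometric Mean and Bookmaker Informedness come out the same for both, because they depend only on those two rates. Accuracy and the Matthews Correlation Coefficient also depend on the class sizes, so their values differ between the two cases.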
