Hellinger Distance Trees for Imbalanced Streams

International Conference on Pattern Recognition

Abstract

Classifiers trained on data sets with an imbalanced class distribution are known to exhibit poor generalisation performance. This is known as the imbalanced learning problem. The problem becomes particularly acute for incremental classifiers operating on imbalanced data streams, especially when the learning objective is rare class identification. Because accuracy can give a misleading impression of performance on imbalanced data, existing stream classifiers based on accuracy can perform poorly on the minority class of an imbalanced stream, resulting in low minority class recall. In this paper we address this deficiency by proposing the Hellinger distance measure as a split criterion for the Very Fast Decision Tree (VFDT). We demonstrate that using Hellinger distance can achieve a statistically significant improvement in recall on imbalanced data streams, with an acceptable increase in the false positive rate.
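The abstract does not state the split formula. As a minimal sketch, assuming the usual two-class Hellinger split criterion (compare the positive-class and negative-class distributions over the branches of a candidate split) combined with a VFDT-style (Very Fast Decision Tree) Hoeffding-bound test, the Python below shows why the criterion ignores class skew: it uses only within-class proportions, never the class priors. The function names `hellinger_split`, `hoeffding_bound` and `should_split`, and the defaults `delta=1e-7` and `tie_threshold=0.05`, are hypothetical choices for illustration, not taken from the paper.

```python
import math
from typing import Sequence


def hellinger_split(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Hellinger distance between the positive- and negative-class
    distributions over the branches of one candidate split.

    pos_counts[j] / neg_counts[j] = number of positive / negative
    examples that the split routes to branch j.
    """
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # a class has not been seen yet: no separation to measure
    # Only within-class proportions are used, never the class priors,
    # so scaling one class's counts leaves the score unchanged.
    squared = sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    )
    return math.sqrt(squared)  # ranges from 0 (no separation) to sqrt(2)


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon: with probability 1 - delta, the observed mean
    of n samples lies within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(scores: Sequence[float], n: int,
                 delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """Hypothetical VFDT-style test: split if the best attribute's score beats
    the runner-up by more than epsilon, or if epsilon has shrunk below the
    tie threshold (the top attributes are practically equal)."""
    if not scores:
        return False
    ranked = sorted(scores, reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0  # 0.0 stands for "do not split"
    eps = hoeffding_bound(math.sqrt(2.0), delta, n)  # Hellinger range is sqrt(2)
    return best - second > eps or eps < tie_threshold


if __name__ == "__main__":
    # Same positive counts; the negative class is 10x larger in the second call.
    print(hellinger_split([9, 1], [10, 90]))    # ≈ 0.894
    print(hellinger_split([9, 1], [100, 900]))  # same value as above
    print(should_split([0.89, 0.20], n=200))    # True
```

Because the score depends only on how each class is spread across the branches, a minority class that is heavily outnumbered still contributes as much to the split decision as the majority class, which is the property the abstract relies on for improving minority class recall.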