Learning Decision Trees for Unbalanced Data


Abstract

Learning from unbalanced datasets presents a challenging problem in which traditional learning algorithms may perform poorly. In such problems, the objective functions used to learn the classifiers typically tend to favor the larger, less important classes. This paper compares the performance of several popular decision tree splitting criteria - information gain, the Gini measure, and DKM - and identifies a new skew-insensitive measure in Hellinger distance. We outline the strengths of Hellinger distance under class imbalance, propose its application to building decision trees, and perform a comprehensive comparative analysis of each decision tree construction method. In addition, we consider the performance of each tree within a powerful sampling wrapper framework to capture the interaction of the splitting metric and sampling. We evaluate over a wide range of datasets and determine which methods operate best under class imbalance.
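To make the abstract's central idea concrete, the following is a minimal sketch (not the authors' implementation) of how a Hellinger-distance split criterion can be scored for a binary split in a two-class problem. The function name and counts-based interface are illustrative; the key property shown is that the score depends only on the *fraction* of each class routed to each branch, not on the class priors, which is what makes the measure skew-insensitive.

```python
import math

def hellinger_split(pos_left, neg_left, pos_right, neg_right):
    """Hellinger distance between the per-class distributions induced by a
    binary split. Larger values indicate better class separation; the
    maximum, sqrt(2), is reached when the split isolates the classes."""
    pos = pos_left + pos_right  # total positives (minority class)
    neg = neg_left + neg_right  # total negatives (majority class)
    # Fraction of each class falling into each branch; the criterion
    # compares these conditional distributions, not the raw counts.
    d = ((math.sqrt(pos_left / pos) - math.sqrt(neg_left / neg)) ** 2
         + (math.sqrt(pos_right / pos) - math.sqrt(neg_right / neg)) ** 2)
    return math.sqrt(d)

# A split that concentrates positives on the left scores highly even
# when positives are rare (10 positives vs 90 negatives):
score = hellinger_split(9, 1, 1, 89)
# Multiplying the majority class by 10 leaves the score unchanged,
# illustrating skew insensitivity:
score_skewed = hellinger_split(9, 10, 1, 890)
```

Because only within-class branch fractions enter the formula, rebalancing the class priors (e.g. by sampling) does not change which split is preferred, in contrast to information gain or the Gini measure.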
