Published in: 《大观周刊》 (Chinese-language journal)

Decision Tree Algorithm Based on Imbalanced Data Sets

         

Abstract

In order to make decision trees robust, we begin by expressing Information Gain, the metric used in C4.5, in terms of the confidence of a rule. This allows us to immediately explain why Information Gain, like confidence, results in rules that are biased towards the majority class. To overcome this bias, we introduce a new measure, Class Confidence Proportion (CCP), which forms the basis of CCPDT (Class Confidence Proportion Decision Tree). Together these two changes yield a classifier that performs statistically better than not only traditional decision trees but also trees learned from data that has been balanced by well-known sampling techniques.
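The abstract does not give a formula for CCP, so the following is only a minimal sketch under a stated assumption: that CCP for a rule X -> y normalises the rule's coverage by class size, i.e. CCP = (n(X,y)/n(y)) / (n(X,y)/n(y) + n(X,not y)/n(not y)). The function names and the counts in the example are hypothetical and chosen for illustration; they are not taken from the paper. The sketch contrasts this quantity with ordinary rule confidence on a skewed class distribution.

```python
def confidence(n_xy: int, n_x_not_y: int) -> float:
    """Conf(X -> y): fraction of instances covered by X that belong to class y."""
    return n_xy / (n_xy + n_x_not_y)


def class_confidence_proportion(
    n_xy: int, n_y: int, n_x_not_y: int, n_not_y: int
) -> float:
    """Assumed CCP definition (not stated in the abstract).

    Each count is divided by the size of its own class before the
    proportion is taken, so a large class cannot dominate by size alone.
    Assumes both classes are non-empty and X covers at least one instance.
    """
    share_of_y = n_xy / n_y                  # part of class y covered by X
    share_of_not_y = n_x_not_y / n_not_y     # part of the other class covered by X
    return share_of_y / (share_of_y + share_of_not_y)


# Hypothetical data: 50 minority and 950 majority instances.
# Rule X covers 40 minority instances and 95 majority instances.
conf_minority = confidence(n_xy=40, n_x_not_y=95)
ccp_minority = class_confidence_proportion(
    n_xy=40, n_y=50, n_x_not_y=95, n_not_y=950
)

print(f"confidence(X -> minority) = {conf_minority:.3f}")
print(f"CCP(X -> minority)        = {ccp_minority:.3f}")
```

With these counts, confidence for the minority class is about 0.30 even though X covers 80% of the minority class and only 10% of the majority class, while the assumed CCP is about 0.89. This is one way to see the majority-class bias the abstract attributes to confidence-based measures.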
