
Improved C4.5 algorithm based on k-means

Abstract

When the traditional C4.5 algorithm handles big data containing a large number of multidimensional continuous attribute values, its discretization method can lead to low classification accuracy. This paper proposes a novel method for discretizing continuous data based on the k-means algorithm. The method generates data clusters by combining continuous, unfeatured data with the corresponding class labels, and then takes the approximate boundary points of the clusters as the candidate splitting points of the continuous attribute. The information gain ratio is then calculated for these candidates. Experimental results show that the proposed K-C4.5 algorithm improves the classification accuracy of the decision tree compared with the traditional C4.5 algorithm.
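The abstract only outlines the discretization step, so the following Python sketch illustrates one plausible reading of it; it is not the authors' implementation. Several choices are assumptions not stated in the abstract: the attribute values are clustered with a plain one-dimensional k-means whose `k` equals the number of distinct class labels, and the "approximate boundary point" between two adjacent clusters is taken as the midpoint between the largest value of one cluster and the smallest value of the next. Each midpoint is a candidate threshold, scored with the usual C4.5 information gain ratio for a binary split (`x <= t` vs. `x > t`). The names `kmeans_1d`, `entropy`, `gain_ratio` and `k_c45_best_split` are hypothetical.

```python
import math
from collections import Counter


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values, with a deterministic spread-out init."""
    xs = sorted(values)
    # Hypothetical init: pick k evenly spaced points from the sorted data.
    centroids = [xs[(i * (len(xs) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Recompute centroids; keep the old one if a cluster went empty.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return [c for c in clusters if c]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, t):
    """C4.5 gain ratio of the binary split x <= t vs. x > t."""
    left = [y for x, y in zip(values, labels) if x <= t]
    right = [y for x, y in zip(values, labels) if x > t]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    n = len(labels)
    weights = (len(left) / n, len(right) / n)
    gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info


def k_c45_best_split(values, labels, k=None):
    """Pick the best threshold among k-means cluster boundaries (assumed scheme)."""
    k = k or len(set(labels))  # assumption: one cluster per class label
    clusters = sorted(kmeans_1d(values, k), key=min)
    # Boundary between adjacent clusters: midpoint of the gap between them.
    candidates = [(max(a) + min(b)) / 2 for a, b in zip(clusters, clusters[1:])]
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda t: gain_ratio(values, labels, t))
    return best, gain_ratio(values, labels, best)


if __name__ == "__main__":
    values = [1.0, 1.2, 1.1, 5.0, 5.3, 5.1]
    labels = ["a", "a", "a", "b", "b", "b"]
    # prints a threshold of about 3.1 and a gain ratio of 1.0
    print(k_c45_best_split(values, labels))
```

The point of the design is that standard C4.5 evaluates a threshold between each pair of adjacent distinct sorted values, whereas this sketch evaluates only `k - 1` cluster boundaries per attribute; how the paper actually chooses `k` and locates the boundaries is not specified in the abstract.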
