Journal: Computer & Digital Engineering (计算机与数字工程)

Research on the Application of a Map/Reduce-Based Decision Tree Classification Mining Method


Abstract

Traditional data mining approaches suffer from weak computing power, low efficiency, and poor scalability when processing data that is massive, high-dimensional, and complex. This paper proposes a Map/Reduce-based decision tree classification mining method, the C4.5BH algorithm. The algorithm discretizes continuous attributes with K-means clustering, and uses the Map/Reduce programming model together with an attribute-table structure to compute attributes in parallel and to split nodes in parallel during decision tree construction. Experiments show that, compared with the traditional C4.5 algorithm, C4.5BH achieves higher execution efficiency and good speedup on large-scale data sets.
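The abstract names three ingredients: K-means discretization of continuous attributes, per-attribute statistics computed with Map/Reduce, and choosing the split attribute. A minimal single-process sketch of how these could fit together is given here, assuming C4.5's gain ratio as the split criterion and an (attribute, value, class) triple as the map output key. The helper names (kmeans_1d, nearest, map_record, reduce_counts, gain_ratio), the centroid initialisation, and the toy data are all hypothetical. This is not the authors' C4.5BH implementation and does not reproduce their attribute-table structure.

```python
import math
from collections import Counter, defaultdict

def kmeans_1d(values, k, iters=100):
    """Lloyd's K-means on one continuous attribute; returns sorted centroids."""
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    # Hypothetical initialisation: k evenly spaced distinct values.
    centroids = [distinct[i * (len(distinct) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

def nearest(v, centroids):
    """Index of the centroid closest to v; used as the discrete bin label."""
    return min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))

def map_record(record, label):
    """Map: emit ((attribute, value, class), 1) for every attribute of one record."""
    return [((attr, val, label), 1) for attr, val in record.items()]

def reduce_counts(pairs):
    """Reduce: sum the counts emitted under each (attribute, value, class) key."""
    counts = Counter()
    for key, cnt in pairs:
        counts[key] += cnt
    return counts

def entropy(counts):
    """Shannon entropy (bits) of a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(counts, attr, class_totals):
    """C4.5 gain ratio of one attribute, computed only from the reduced counts."""
    by_value = defaultdict(Counter)
    for (a, v, y), cnt in counts.items():
        if a == attr:
            by_value[v][y] += cnt
    n = sum(class_totals.values())
    cond = sum(sum(cc.values()) / n * entropy(list(cc.values())) for cc in by_value.values())
    gain = entropy(list(class_totals.values())) - cond
    split_info = entropy([sum(cc.values()) for cc in by_value.values()])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: one continuous attribute, one categorical attribute.
ages = [22, 25, 27, 48, 52, 55]
smoker = ["y", "n", "y", "n", "y", "n"]
labels = ["no", "no", "no", "yes", "yes", "yes"]

centroids = kmeans_1d(ages, k=2)
bins = [nearest(a, centroids) for a in ages]
records = [{"age": b, "smoker": s} for b, s in zip(bins, smoker)]

# In a real job each mapper would process one input split; here we just concatenate.
pairs = [p for rec, y in zip(records, labels) for p in map_record(rec, y)]
counts = reduce_counts(pairs)
class_totals = Counter(labels)
ratios = {a: gain_ratio(counts, a, class_totals) for a in ("age", "smoker")}
best = max(ratios, key=ratios.get)
print(bins, best)  # [0, 0, 0, 1, 1, 1] age
```

The map step needs only one record at a time and the reduce step needs only the counts for one key, which is what makes this counting pattern easy to distribute; once the counts exist, the gain ratio of each attribute can also be evaluated independently.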
