Published in: 《计算机工程与设计》 (Computer Engineering and Design)

An Improved Decision Tree Pruning Algorithm Based on Hadoop

         

Abstract

Current decision tree pruning algorithms seldom account for how noise in the training set affects the model, and traditional memory-resident classification algorithms have difficulty processing massive data. To address both problems, this paper proposes an imprecise probability error pruning algorithm (IEP) based on the Hadoop platform and applies it to the C4.5 algorithm. During pruning, IEP treats the training set used to build the tree as noisy and uses the number of misclassifications estimated under imprecise probability as the pruning criterion, which reduces the influence of unreliable training data on the model. On the Hadoop platform, the C4.5-IEP algorithm is implemented as a MapReduce program based on file splitting, which enhances its ability to process large-scale data and gives it good scalability.
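The abstract does not give the IEP formula, so the following is only an illustrative sketch of what pruning on an imprecise-probability error count could look like. It assumes the imprecise Dirichlet model, under which a node with n samples and e misclassified samples has an upper error probability of (e + s) / (n + s). The names `Node`, `upper_error_count`, `subtree_error`, `prune`, and the parameter `s` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical tree node holding the class counts of its training samples."""
    class_counts: Dict[str, int]
    children: List["Node"] = field(default_factory=list)

    @property
    def n(self) -> int:
        return sum(self.class_counts.values())

    @property
    def errors(self) -> int:
        # Samples that a majority-class leaf at this node would misclassify.
        return self.n - max(self.class_counts.values(), default=0)


def upper_error_count(n: int, errors: int, s: float = 1.0) -> float:
    """Upper expected error count under the imprecise Dirichlet model (assumed)."""
    if n == 0:
        return 0.0
    return n * (errors + s) / (n + s)


def subtree_error(node: Node, s: float = 1.0) -> float:
    """Sum of the upper error counts over the leaves below `node`."""
    if not node.children:
        return upper_error_count(node.n, node.errors, s)
    return sum(subtree_error(child, s) for child in node.children)


def prune(node: Node, s: float = 1.0) -> Node:
    """Bottom-up pruning: collapse a subtree when a single leaf is no worse."""
    for child in node.children:
        prune(child, s)
    if node.children:
        as_leaf = upper_error_count(node.n, node.errors, s)
        if as_leaf <= subtree_error(node, s):
            node.children = []  # replace the subtree with a leaf
    return node
```

In this sketch a larger `s` expresses less trust in the training set: small leaves are penalised more heavily, so subtrees are collapsed more readily. How the actual IEP criterion is defined, and how it is distributed across MapReduce tasks, would need to be taken from the full paper.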
