Journal: Concurrency and Computation: Practice and Experience

A parallel C4.5 decision tree algorithm based on MapReduce


Abstract

In supervised classification, large training data sets are very common, and decision trees are widely used. However, because of bottlenecks such as memory restrictions, time complexity, or data complexity, many supervised classifiers, including the classical C4.5 tree, cannot directly handle big data. One solution to this problem is to design a highly parallelized learning algorithm. Motivated by this, we propose a parallelized C4.5 decision tree algorithm based on MapReduce (MR-C4.5-Tree) with 2 parallelized methods to build the tree nodes. First, an information entropy-based parallelized attribute selection method (MR-A-S) on several subsets for MR-C4.5-Tree is proposed to determine the best splitting attribute and the cut points. Then, a parallel data splitting method (MR-D-S) is presented to partition the training data into subsets. Finally, we introduce the MR-C4.5-Tree learning algorithm, which grows the tree in a top-down recursive way. In addition, the depth of the constructed decision tree, the number of samples, and the maximal class probability in each tree node are used as termination conditions to avoid the over-partitioning problem. Experimental studies show the feasibility and the good performance of the proposed parallelized MR-C4.5-Tree algorithm.
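
The abstract gives no implementation details, so the following is only a minimal sketch of how entropy-based attribute selection and the three termination conditions could be arranged in a map/reduce style. It makes several assumptions that are not from the paper: the map and reduce phases are emulated in a single Python process rather than run on a MapReduce cluster, all attributes are categorical (so no cut points are searched), the selection criterion is the standard C4.5 gain ratio, and the function names and threshold defaults (`max_depth`, `min_samples`, `max_class_prob`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def map_counts(partition):
    """Map phase (emulated): emit ((attr_index, attr_value, label), 1) per record."""
    for *features, label in partition:
        for i, value in enumerate(features):
            yield (i, value, label), 1


def reduce_counts(pairs):
    """Reduce phase (emulated): sum the emitted counts for each key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals


def entropy(counts):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def best_attribute(partitions):
    """Return (attribute index, gain ratio) of the best categorical split."""
    pairs = (kv for part in partitions for kv in map_counts(part))
    totals = reduce_counts(pairs)

    # Regroup the reduced counts as attr -> value -> label -> count.
    table = defaultdict(lambda: defaultdict(Counter))
    for (attr, value, label), n in totals.items():
        table[attr][value][label] += n

    best, best_ratio = None, -1.0
    for attr, by_value in table.items():
        class_totals = Counter()
        for labels in by_value.values():
            class_totals.update(labels)
        n = sum(class_totals.values())
        sizes = [sum(labels.values()) for labels in by_value.values()]
        # Information gain = class entropy minus weighted entropy after the split.
        conditional = sum(
            s / n * entropy(labels.values())
            for s, labels in zip(sizes, by_value.values())
        )
        gain = entropy(class_totals.values()) - conditional
        split_info = entropy(sizes)
        ratio = gain / split_info if split_info > 0 else 0.0
        if ratio > best_ratio:
            best, best_ratio = attr, ratio
    return best, best_ratio


def should_stop(labels, depth, max_depth=10, min_samples=2, max_class_prob=0.95):
    """Termination test on depth, sample count, and maximal class probability.

    The default thresholds are illustrative assumptions, not values from the paper.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0 or depth >= max_depth or n < min_samples:
        return True
    return max(counts.values()) / n >= max_class_prob


# Two "partitions" stand in for data blocks held by different mappers.
part_a = [("sunny", "hot", "no"), ("sunny", "mild", "no"), ("rain", "hot", "yes")]
part_b = [("rain", "mild", "yes"), ("sunny", "hot", "no"), ("rain", "mild", "yes")]
attr, ratio = best_attribute([part_a, part_b])
```

In this toy data the first attribute separates the two classes completely, so it is the one selected. In an actual MapReduce job the counts would be aggregated across workers by the framework's shuffle step, and only the small count table would reach the driver that picks the split.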

Bibliographic details

  • Author affiliations

    School of Control Science and Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology

  • Original format: PDF
  • Language: English
  • Keywords

    C4.5; decision trees; MapReduce; parallel computing;

