Data Cloud for Distributed Data Mining via Pipelined MapReduce

Abstract

Distributed data mining (DDM), which often relies on autonomous agents, extracts globally interesting associations, classifiers, clusters, and other patterns from distributed data. As datasets double in size every year, repeatedly moving data to distant CPUs incurs high communication costs. In this paper, a data cloud is used to implement DDM so that computation moves to the data rather than the data moving to the computation. MapReduce is a popular programming model for data-centric distributed computing. First, a cloud system architecture for DDM is proposed. Second, a modified MapReduce framework, called pipelined MapReduce, is presented. We select Apriori as a case study and discuss its implementation within the MapReduce framework. Experimental results show that, with a moderate number of map tasks, the execution time of DDM algorithms (e.g., Apriori) is reduced remarkably. A performance comparison between traditional MapReduce and our pipelined MapReduce shows that map and reduce tasks in the pipelined version can run in parallel, which greatly decreases the execution time of the DDM algorithm. The data cloud is suitable for a multitude of DDM algorithms and can provide significant speedups.
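
As a rough illustration of how one Apriori counting pass can be phrased as map and reduce steps, the sketch below runs in memory in plain Python. It is not the paper's implementation: the function names (`map_count`, `reduce_count`, `frequent_itemsets`), the sample transactions, and the support threshold are all assumptions made for this example. It also enumerates every k-item subset instead of generating candidates from frequent (k-1)-itemsets, as a full Apriori would. The reducer reads map output from a generator as it is produced, which only loosely echoes the pipelining idea and does not involve any actual parallelism.

```python
from collections import defaultdict
from itertools import combinations


def map_count(partition, k):
    """Map step: emit (itemset, 1) for every k-item subset of each transaction."""
    for transaction in partition:
        # Sort so the same itemset always yields the same tuple key.
        for itemset in combinations(sorted(set(transaction)), k):
            yield itemset, 1


def reduce_count(pairs, min_support):
    """Reduce step: sum counts per itemset, keep those meeting min_support."""
    totals = defaultdict(int)
    for itemset, count in pairs:
        totals[itemset] += count
    return {s: c for s, c in totals.items() if c >= min_support}


def frequent_itemsets(partitions, k, min_support):
    """Feed the map output of every partition lazily into a single reducer."""
    pairs = (pair for part in partitions for pair in map_count(part, k))
    return reduce_count(pairs, min_support)


# Hypothetical data: two partitions standing in for two distributed sites.
partitions = [
    [["bread", "milk"], ["bread", "beer", "milk"]],
    [["beer", "milk"], ["bread", "milk", "eggs"]],
]
result = frequent_itemsets(partitions, k=2, min_support=2)
print(result)
```

With these made-up transactions, only the pairs that occur in at least two transactions survive the reduce step.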

Bibliographic details

  • Source
    Agents and Data Mining Interaction | 2011 | pp. 316-330 | 15 pages
  • Conference location: Taipei(CT)
  • Author affiliations

    Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, P.R. China;

    Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, P.R. China;

    Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, P.R. China;

  • Conference organizer
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: TP311.13
  • Keywords

    distributed data mining (DDM); cloud computing; MapReduce; Apriori; Hadoop
