Conference: International Conference on Algorithms and Architectures for Parallel Processing

A MapReduce Reinforced Distributed Sequential Pattern Mining Algorithm

Abstract

Redesign and reimplementation of traditional sequential pattern mining algorithms on distributed computing frameworks are essential for dealing with big data. Along the way, the critical issue is how to minimize the communication overhead of the distributed sequential pattern mining algorithm and maximize its execution efficiency by balancing the workload of distributed computing resources. To address such an issue, this paper proposes a MapReduce reinforced distributed sequential pattern mining algorithm DGSP (Distributed GSP algorithm based on MapReduce), which consists of two MapReduce jobs. The "two-jobs" structure of DGSP can effectively reduce the communication overhead of the distributed sequential pattern mining algorithm. DGSP also enables optimizing the workload balance and the execution efficiency of distributed sequential pattern mining by evenly partitioning the database and assigning the fragments to Map workers. Experimental results indicate that DGSP can significantly improve the overall performance, scalability and fault tolerance of sequential pattern mining on big data.
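
The abstract does not spell out what each of the two MapReduce jobs computes, so the following is only a rough, single-process sketch of the general idea it names: split the sequence database evenly across Map workers, count candidate support per fragment, and merge the counts in a Reduce step. All function names (`partition_evenly`, `map_count`, `reduce_counts`, `mine`), the round-robin split, and the simplification of each sequence to a tuple of single items are assumptions made for illustration, not details taken from DGSP.

```python
from collections import Counter


def partition_evenly(database, n_workers):
    """Round-robin split: fragment sizes differ by at most one (assumed strategy)."""
    return [database[i::n_workers] for i in range(n_workers)]


def is_subsequence(candidate, sequence):
    """True if the candidate's items occur in the sequence in order, gaps allowed."""
    remaining = iter(sequence)
    return all(item in remaining for item in candidate)


def map_count(fragment, candidates):
    """Map step: local support of each candidate within one fragment."""
    counts = Counter()
    for sequence in fragment:
        for candidate in candidates:
            if is_subsequence(candidate, sequence):
                counts[candidate] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: sum local supports and keep candidates meeting min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {cand: n for cand, n in total.items() if n >= min_support}


def mine(database, candidates, n_workers, min_support):
    """Run the map step on every fragment, then reduce (sequentially, for clarity)."""
    fragments = partition_evenly(database, n_workers)
    partials = [map_count(fragment, candidates) for fragment in fragments]
    return reduce_counts(partials, min_support)


if __name__ == "__main__":
    # Toy database: each sequence is a tuple of single items (a simplification).
    database = [
        ("a", "b", "c"),
        ("a", "c"),
        ("b", "c"),
        ("a", "b"),
        ("a", "b", "c", "d"),
    ]
    candidates = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
    print(mine(database, candidates, n_workers=2, min_support=3))
```

Because each worker only emits one count per candidate rather than raw sequences, the data shuffled between Map and Reduce stays small, which is the kind of communication saving the abstract refers to. In a real deployment the map calls would run on separate workers under a framework such as Hadoop.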
