Conference: IEEE International Conference on Data Mining Workshops

Accelerating Frequent Itemsets Mining on the Cloud: A MapReduce-Based Approach

Abstract

Frequent pattern mining plays a critical role in mining associations, sequential patterns, correlations, causality, episodes, multidimensional patterns, emerging patterns, and many other significant data mining tasks. With the exponential growth of available data, most traditional frequent pattern mining algorithms become ineffective because of either huge resource requirements or large communication overhead. Cloud computing has shown that very large datasets can be processed on commodity clusters, provided the right programming model is available. MapReduce, a parallel programming model and one of the most important techniques in cloud computing, has emerged as a way to mine datasets of terabyte scale or larger on clusters of computers. Converting a serial mining algorithm into a distributed algorithm on the MapReduce framework is not necessarily difficult, but the resulting mining performance can be unsatisfactory. In this paper, we propose a method that finds all frequent itemsets using just two MapReduce phases, in a time- and communication-efficient manner. We present experimental results that corroborate our theoretical claims.
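The abstract does not spell out the proposed method, so the following is only an illustration of how two MapReduce phases can be enough to find all frequent itemsets. It sketches the well-known SON partition scheme (Savasere, Omiecinski, and Navathe), which is a stand-in and not the authors' algorithm. The phases are simulated in plain Python rather than on a real cluster, and every function name here is hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(chunk, local_min_count):
    """Apriori on one chunk: itemsets whose count is >= local_min_count."""
    transactions = [frozenset(t) for t in chunk]
    # Level 1: frequent single items.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s for s, c in counts.items() if c >= local_min_count}
    frequent = set(current)
    k = 2
    while current:
        # Join step: size-k candidates built from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {s for s, c in counts.items() if c >= local_min_count}
        frequent |= current
        k += 1
    return frequent


def phase1_map(chunk, min_support):
    # Scale the support threshold to the chunk size, so that any globally
    # frequent itemset is locally frequent in at least one chunk.
    return local_frequent_itemsets(chunk, min_support * len(chunk))


def phase1_reduce(candidate_sets):
    # The union of all local results is the global candidate set.
    return set().union(*candidate_sets)


def phase2_map(chunk, candidates):
    # Count each candidate's occurrences within this chunk only.
    return Counter(c for t in map(frozenset, chunk) for c in candidates if c <= t)


def phase2_reduce(partial_counts, min_count):
    # Sum the per-chunk counts and drop candidates below the global threshold.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return {s: c for s, c in total.items() if c >= min_count}


def son(chunks, min_support):
    """Run both phases; min_support is a fraction of all transactions."""
    n = sum(len(chunk) for chunk in chunks)
    candidates = phase1_reduce(phase1_map(chunk, min_support) for chunk in chunks)
    partial_counts = [phase2_map(chunk, candidates) for chunk in chunks]
    return phase2_reduce(partial_counts, min_support * n)
```

For example, calling `son([[["a", "b", "c"], ["a", "b"]], [["a", "c"], ["b", "d"]]], 0.5)` treats each inner list of transactions as one mapper's input split. The first phase may produce false-positive candidates but never misses a frequent itemset, and the second phase removes the false positives with exact global counts.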