Indian Journal of Science and Technology

An efficient FP-Growth based association rule mining algorithm using Hadoop MapReduce

Abstract

Objectives: To improve the performance of an FP-Growth based association rule mining algorithm on massive data through effective use of storage and execution capability and an improved partitioning technique within the Hadoop MapReduce framework.

Methodology: The proposed methodology has four main phases. In the first phase, the item sets used to find frequent patterns are encoded, which reduces the cost of expensive operations on large data sets. In the second phase, improved hash partitioning reduces network overhead and improves communication speed within the MapReduce phase for each item set. In the third phase, compression makes effective use of network bandwidth and storage. In the final phase, a combiner for frequent item set mining reduces the overhead of the reduce phase by finding the patterns in each partition, which lowers the overall execution time of the FP-Growth algorithm.

Findings: The FP-Growth based association rule mining algorithm is designed for parallel execution on a distributed cluster of servers. The changes to the MapReduce implementation of FP-Growth (encoding, improved hash partitioning, compression, and configuration) result in a significant performance gain, with improved execution time.

Novelty/Improvements: According to the experimental results, the changes at the storage and processing levels within the MapReduce framework improve the overall performance of parallel frequent item set mining on a Hadoop cluster.
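The four phases named in the Methodology can be illustrated with a small single-process Python sketch. It is not the authors' implementation: the abstract does not specify the encoding scheme or the improved hash partitioner, so the sketch assumes frequency-ordered integer ids and plain modulo hashing, uses `zlib` to stand in for intermediate-data compression, and covers only the frequent-item counting pass that precedes FP-tree construction. All function names are hypothetical.

```python
import zlib
from collections import Counter


def build_encoding(transactions):
    """Phase 1 (assumed scheme): give each item a small integer id, most frequent first."""
    counts = Counter(item for t in transactions for item in set(t))
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: idx for idx, item in enumerate(ordered)}


def map_with_combiner(split, encoding):
    """Map one input split and combine locally: one (item_id, count) pair per item."""
    local = Counter()
    for transaction in split:
        for item in set(transaction):
            local[encoding[item]] += 1
    return local


def partition(item_id, num_reducers):
    """Phase 2: plain modulo hashing, a stand-in for the paper's improved partitioner."""
    return item_id % num_reducers


def shuffle(local_counts, num_reducers):
    """Phase 3: group pairs per reducer and compress each payload before it is sent."""
    buckets = [[] for _ in range(num_reducers)]
    for item_id, count in sorted(local_counts.items()):
        buckets[partition(item_id, num_reducers)].append(f"{item_id}:{count}")
    return [zlib.compress(",".join(b).encode("utf-8")) for b in buckets]


def reduce_partition(payloads, min_support):
    """Decompress the payloads for one reducer, sum the counts, keep frequent items."""
    total = Counter()
    for payload in payloads:
        text = zlib.decompress(payload).decode("utf-8")
        if not text:
            continue  # this mapper sent nothing to this reducer
        for pair in text.split(","):
            item_id, count = pair.split(":")
            total[int(item_id)] += int(count)
    return {i: c for i, c in total.items() if c >= min_support}


def frequent_items(splits, min_support, num_reducers=2):
    """Run the simulated pipeline over all splits and decode ids back to item names."""
    encoding = build_encoding([t for split in splits for t in split])
    decoding = {idx: item for item, idx in encoding.items()}
    shuffled = [shuffle(map_with_combiner(s, encoding), num_reducers) for s in splits]
    result = {}
    for r in range(num_reducers):
        payloads = [mapper_out[r] for mapper_out in shuffled]
        for item_id, count in reduce_partition(payloads, min_support).items():
            result[decoding[item_id]] = count
    return result
```

Because each mapper combines locally, it ships at most one pair per distinct item instead of one pair per occurrence, which is the kind of reduce-side saving the abstract attributes to the combiner. In a real Hadoop job these roles would be filled by a Combiner class, a custom Partitioner, and map-output compression settings rather than by in-process function calls.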