Conference: International Conference on Science in Information Technology
A Scalable Approach for Improving Implementation of a Frequent Pattern Mining Algorithm using MapReduce Programming

Abstract

A frequent pattern is a pattern (a set of items, subsequences, sub-graphs, etc.) that occurs frequently in a transactional database. Frequent pattern mining is valuable in applications such as extracting knowledge from transactional data for market basket analysis, cross-marketing, and selling. Since the inception of frequent itemset mining (FIM), a number of important algorithms have been developed to speed up mining performance. Unfortunately, when the dataset is massive, mining can still be prohibitively expensive in terms of communication cost, memory usage, balanced data distribution, and I/O utilization. One existing frequent pattern mining algorithm, CATS Tree (Compressed and Arranged Sequences tree), can perform interactive mining with a single scan of the database. In this work, we propose to parallelize part of the CATS-Tree algorithm across distributed machines, which will improve the overall performance of CATS-Tree on large transaction data. The algorithm partitions the computation so that each machine executes an independent group of mining tasks. We present a comparison based on time complexity, algorithm complexity, and performance on different types of datasets. The results show that the proposed parallel implementation of CATS-Tree provides better performance on massive datasets.
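The idea of partitioning the computation into independent mining tasks can be illustrated with a minimal Python sketch. This is not the paper's parallel CATS-Tree algorithm, which the abstract does not specify; it is a plain map/reduce-style support count, run sequentially, under the assumption that each partition stands in for one machine. The names `map_partition`, `reduce_counts`, `frequent_itemsets`, `min_support`, and `max_size` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def map_partition(transactions, max_size=2):
    """Map step (illustrative): count candidate itemsets in one partition."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step (illustrative): sum the per-partition counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def frequent_itemsets(transactions, num_partitions, min_support, max_size=2):
    """Split the data, count each partition independently, then merge."""
    # Round-robin split; each slice stands in for one machine's share.
    partitions = [transactions[i::num_partitions] for i in range(num_partitions)]
    partials = [map_partition(part, max_size) for part in partitions]
    totals = reduce_counts(partials)
    # Keep only itemsets whose global count reaches the support threshold.
    return {itemset: n for itemset, n in totals.items() if n >= min_support}


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for itemset, n in sorted(frequent_itemsets(baskets, 2, 2).items()):
        print(itemset, n)
```

Because each partition is counted without reference to the others, the map step could in principle run on separate machines, with only the compact count tables sent to the reducer. A tree-based method such as CATS-Tree would replace the brute-force enumeration in `map_partition`.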