Journal: Cluster Computing

BIGMiner: a fast and scalable distributed frequent pattern miner for big data

Abstract

Frequent itemset mining is a fundamental and widely used data mining technique. Recently, a number of MapReduce-based frequent itemset mining methods have been proposed to overcome the limits on data size and mining speed that sequential methods face. However, the existing MapReduce-based methods still do not scale well because of high workload skewness, large intermediate data, and large network communication overhead. In this paper, we propose BIGMiner, a fast and scalable MapReduce-based frequent itemset mining method. BIGMiner generates equal-sized sub-databases called transaction chunks and performs support counting using only transaction chunks and bitwise operations, without generating or shuffling intermediate data. As a result, BIGMiner achieves very high scalability, with no workload skewness, no intermediate data, and small network communication overhead. Through extensive experiments on large-scale datasets of up to 6.5 billion transactions, we show that BIGMiner consistently and significantly outperforms the state-of-the-art methods without any memory problems.
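
The abstract does not spell out how support counting over transaction chunks works, so the following is only a minimal single-process sketch of the general idea, not BIGMiner's actual algorithm. It assumes each chunk stores one bitmap per item (bit i set when the i-th transaction of the chunk contains that item), so that the support of an itemset within a chunk is the popcount of the bitwise AND of its items' bitmaps. The function names `build_chunk_bitmaps` and `support`, the bitmap layout, and the sample data are all hypothetical, and the MapReduce distribution is omitted entirely.

```python
def build_chunk_bitmaps(transactions, chunk_size):
    """Split transactions into chunks of chunk_size (the last may be shorter).

    Returns one dict per chunk mapping item -> int bitmap, where bit i is set
    if the i-th transaction of that chunk contains the item.
    (Assumed layout for illustration; not taken from the paper.)
    """
    chunks = []
    for start in range(0, len(transactions), chunk_size):
        bitmaps = {}
        for pos, tx in enumerate(transactions[start:start + chunk_size]):
            for item in tx:
                bitmaps[item] = bitmaps.get(item, 0) | (1 << pos)
        chunks.append(bitmaps)
    return chunks


def support(itemset, chunks):
    """Count transactions containing every item of a non-empty itemset."""
    if not itemset:
        raise ValueError("itemset must be non-empty")
    total = 0
    for bitmaps in chunks:
        acc = -1  # all bits set; the first AND replaces it with a real bitmap
        for item in itemset:
            acc &= bitmaps.get(item, 0)
            if acc == 0:
                break  # no transaction in this chunk can match
        total += bin(acc).count("1")  # popcount of the intersection
    return total


if __name__ == "__main__":
    # Hypothetical toy database of five transactions.
    db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}, {"a", "b"}]
    chunks = build_chunk_bitmaps(db, chunk_size=2)
    print(support(("a", "b"), chunks))
```

Because each chunk contributes only a single integer count per candidate itemset, the per-chunk results can be summed independently, which is consistent with the abstract's claim that no intermediate data needs to be generated or shuffled.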