BIGMiner: a fast and scalable distributed frequent pattern miner for big data

Kang-Wook Chon; Min-Soo Kim

Abstract

Frequent itemset mining is a fundamental and widely used data mining technique. Recently, a number of MapReduce-based frequent itemset mining methods have been proposed to overcome the limits on data size and mining speed of sequential mining methods. However, the existing MapReduce-based methods still do not scale well due to high workload skewness, large intermediate data, and large network communication overhead. In this paper, we propose BIGMiner, a fast and scalable MapReduce-based frequent itemset mining method. BIGMiner generates equal-sized sub-databases called transaction chunks and performs support counting using only transaction chunks and bitwise operations, without generating and shuffling intermediate data. As a result, BIGMiner achieves very high scalability due to no workload skewness, no intermediate data, and small network communication overhead. Through extensive experiments using large-scale datasets of up to 6.5 billion transactions, we show that BIGMiner consistently and significantly outperforms the state-of-the-art methods without any memory problems.
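
The idea of counting support over equal-sized transaction chunks with bitwise operations can be illustrated with a minimal single-machine sketch. This is not the authors' implementation and involves no MapReduce: the names build_chunks, chunk_support, and support, the per-item bit-vector layout, and the sample data are all assumptions made for illustration. Each chunk stores one bit vector per item, and the support of an itemset is the sum, over chunks, of the number of set bits in the AND of its items' vectors.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical layout: one chunk maps each item to a bit vector (a Python int);
# bit i is set when the i-th transaction of that chunk contains the item.
Chunk = Dict[str, int]


def build_chunks(transactions: List[FrozenSet[str]], chunk_size: int) -> List[Chunk]:
    """Split transactions into chunks of chunk_size (the last may be shorter)."""
    chunks: List[Chunk] = []
    for start in range(0, len(transactions), chunk_size):
        chunk: Chunk = {}
        for offset, transaction in enumerate(transactions[start:start + chunk_size]):
            for item in transaction:
                chunk[item] = chunk.get(item, 0) | (1 << offset)
        chunks.append(chunk)
    return chunks


def chunk_support(chunk: Chunk, itemset: Iterable[str]) -> int:
    """Count the transactions of one chunk that contain every item of itemset."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    bits = chunk.get(items[0], 0)
    for item in items[1:]:
        bits &= chunk.get(item, 0)  # bitwise AND intersects the transaction sets
    return bin(bits).count("1")    # population count = matching transactions


def support(chunks: List[Chunk], itemset: Iterable[str]) -> int:
    """Total support: each chunk is counted independently, then summed."""
    return sum(chunk_support(chunk, itemset) for chunk in chunks)


# Illustrative data only, not taken from the paper.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a"}),
]
chunks = build_chunks(transactions, chunk_size=2)
print(support(chunks, {"a", "c"}))
print(support(chunks, {"a", "b", "c"}))
```

Because each chunk is counted independently and only per-chunk integer counts need to be combined, a layout of this kind suggests how per-chunk work can stay uniform and how little data needs to be exchanged between workers. How BIGMiner actually distributes chunks and candidate itemsets is not described in the abstract.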

Bibliographic details

Source
Cluster Computing | 2018, Issue 3 | 14 pages
Authors
Kang-Wook Chon; Min-Soo Kim
Author affiliations

Department of Information and Communication Engineering, Daegu Gyeongbuk Institute of Science & Technology (DGIST);
Department of Information and Communication Engineering, Daegu Gyeongbuk Institute of Science & Technology (DGIST)

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Molecular biology
Keywords
Frequent pattern mining; Big data; Scalable algorithm; Distributed algorithm; MapReduce

Similar literature

1. BIGMiner: a fast and scalable distributed frequent pattern miner for big data [J]. Kang-Wook Chon, Min-Soo Kim. Cluster Computing. 2018, Issue 3.
2. A fast and distributed algorithm for mining frequent patterns in congested networks [J]. Lin Kawuu W., Chung Sheng-Hao, Lin Chun-Cheng. Computing. 2016, Issue 3.
3. Determining the appropriate number of nodes for fast mining of frequent patterns in distributed computing environments [J]. Wei-Tee Lin, Chih-Ping Chu. Parallel Algorithms and Applications. 2015, Issue 5a6.
4. SSDMiner: A Scalable and Fast Disk-Based Frequent Pattern Miner [C]. Kang-Wook Chon, Min-Soo Kim. International Conference on Emerging Databases. 2018.
5. Generalized distributed hardware architecture for fast pattern search in large databases [D]. Goswami, Kuldeep. 2010.
6. Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns [O]. Shaoming Pan, Yongkai Li, Zhengquan Xu.
7. A Distributed Algorithm for Fast Mining Frequent Patterns in Limited and Varying Network Bandwidth Environments [O]. Chun-Cheng Lin, Wei-Ching Li, Ju-Chin Chen. 2019.
