Mining diversified association rules in big datasets: A cluster/GPU/genetic approach

Djenouri Youcef; Belhadi Asma; Fournier-Viger Philippe; Fujita Hamido

Abstract

Association rule mining is a popular data mining task that is important in many domains. Because association rule mining is very time-consuming, evolutionary and swarm-based algorithms have been designed to find approximate solutions. However, these approaches still have long execution times, especially when applied to dense and big databases, or when low minsup and minconf threshold values are used. Moreover, these approaches suffer from a lack of diversity in the rules presented to the user. To address these drawbacks of previous algorithms, this paper proposes an efficient parallel algorithm named CGPUGA. It is a genetic algorithm that runs on clusters of CPUs to efficiently discover diversified association rules. It benefits from cluster computing to generate rules. Then, to evaluate rules, which is the most time-consuming task, the designed algorithm relies on massively parallel GPU threads. Furthermore, to deal with the issue of rule quality, the search space of rules is partitioned into several regions assigned to different workers, and the rules found by each worker are then merged to ensure diversification. The designed approach has been empirically compared with state-of-the-art algorithms using small, medium, large and big datasets. Results reveal that CGPUGA is 600 times faster than the sequential version of the algorithm for big datasets. Moreover, it outperforms state-of-the-art association rule mining algorithms based on high-performance computing for real big datasets such as Pokec, Webdocs and Wikilinks. In terms of rule quality, results show that the designed CGPUGA algorithm provides rules of higher quality compared to the state-of-the-art NIGGAR, MSP-MPSO and MPGA algorithms for diversified association rule mining. (C) 2018 Elsevier Inc. All rights reserved.
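The rule-evaluation and merging steps mentioned in the abstract can be illustrated with a minimal sequential sketch. It uses the standard definitions of support and confidence (the quantities that the minsup and minconf thresholds apply to). It is not the CGPUGA implementation: the function names, the set-based transaction format, and the merge by simple duplicate removal are all assumptions made for illustration, and the abstract does not say how regions are formed or how merging is actually done.

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not taken from the paper.

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def evaluate_rule(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    sup_rule = support(set(antecedent) | set(consequent), transactions)
    sup_ante = support(antecedent, transactions)
    confidence = sup_rule / sup_ante if sup_ante > 0 else 0.0
    return sup_rule, confidence


def is_valid(rule, transactions, minsup, minconf):
    """Keep a rule only if it reaches both user-defined thresholds."""
    sup, conf = evaluate_rule(rule[0], rule[1], transactions)
    return sup >= minsup and conf >= minconf


def merge_worker_rules(rules_per_worker):
    """Assumed merge step: union of the workers' rules without duplicates."""
    seen, merged = set(), []
    for rules in rules_per_worker:
        for antecedent, consequent in rules:
            key = (frozenset(antecedent), frozenset(consequent))
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Toy database of four transactions, each a set of items.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]
sup, conf = evaluate_rule({"a"}, {"b"}, transactions)
```

Each call to `evaluate_rule` scans the whole database, which is why evaluation dominates the running time; because the per-transaction containment checks are independent of one another, this is the kind of work that can be spread over many GPU threads.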


Bibliographic details

Source
Information Sciences: An International Journal | 2018, issue 2018 | 18 pages
Authors
Djenouri Youcef; Belhadi Asma; Fournier-Viger Philippe; Fujita Hamido
Author affiliations

Southern Denmark Univ IMADA Odense Denmark;

USTHB RIMA Algiers Algeria;

Harbin Inst Technol Shenzhen Sch Humanities & Social Sci Shenzhen Peoples R China;

Iwate Prefectural Univ 152-52 Sugo Takizawa Iwate 0200193 Japan;

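Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: automatic information theory; computer applications; information and knowledge dissemination; automation technology, computer technology
Keywords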
Association rule mining; GPU-based algorithm; Genetic algorithm; Cluster of GPUs;


