Information Sciences: An International Journal

Mining diversified association rules in big datasets: A cluster/GPU/genetic approach



Abstract

Association rule mining is a popular data mining task that is important in many domains. Because association rule mining is very time-consuming, evolutionary and swarm-based algorithms have been designed to find approximate solutions. However, these approaches still have long execution times, especially when applied to dense and big databases, or when low minsup and minconf threshold values are used. Moreover, these approaches suffer from a lack of diversity in the rules presented to the user. To address these drawbacks of previous algorithms, this paper proposes an efficient parallel algorithm named CGPUGA. It is a genetic algorithm that runs on clusters of CPUs to efficiently discover diversified association rules. It benefits from cluster computing to generate rules. Then, to evaluate rules, which is the most time-consuming task, the designed algorithm relies on massively parallel GPU threads. Furthermore, to deal with the issue of rule quality, the search space of rules is partitioned into several regions assigned to different workers, and the rules found by each worker are then merged to ensure diversification. The designed approach has been empirically compared with state-of-the-art algorithms using small, medium, large and big datasets. Results reveal that CGPUGA is 600 times faster than the sequential version of the algorithm for big datasets. Moreover, it outperforms state-of-the-art high-performance-computing-based association rule mining algorithms on real big datasets such as Pokec, Webdocs and Wikilinks. In terms of rule quality, results show that the designed CGPUGA algorithm provides rules of higher quality than the state-of-the-art NIGGAR, MSP-MPSO and MPGA algorithms for diversified association rule mining. (C) 2018 Elsevier Inc. All rights reserved.
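The abstract names three ingredients: rule evaluation against the minsup and minconf thresholds, a search space partitioned into regions assigned to workers, and a merge of the per-worker results. The following is a minimal Python sketch of those ideas, not the CGPUGA algorithm itself. It assumes the standard definitions of support and confidence, replaces the genetic search with exhaustive enumeration of small antecedents, runs the "workers" serially on the CPU rather than on a cluster or GPU, and partitions the search space by consequent item, which is only one possible choice. The toy transactions and all function names are hypothetical.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def evaluate(rule, transactions):
    """Return (support, confidence) of rule = (antecedent, consequent)."""
    antecedent, consequent = rule
    sup_rule = support(antecedent | consequent, transactions)
    sup_ante = support(antecedent, transactions)
    conf = sup_rule / sup_ante if sup_ante > 0 else 0.0
    return sup_rule, conf


def worker_search(region_item, items, transactions, minsup, minconf):
    """Hypothetical worker: explores only rules whose consequent is `region_item`.

    Assumption: exhaustive enumeration of antecedents of size 1 or 2 stands in
    for the genetic search described in the abstract.
    """
    others = sorted(items - {region_item})
    found = []
    for size in (1, 2):
        for ante in combinations(others, size):
            rule = (frozenset(ante), frozenset({region_item}))
            sup, conf = evaluate(rule, transactions)
            if sup >= minsup and conf >= minconf:
                found.append((rule, sup, conf))
    return found


def mine(transactions, minsup, minconf):
    """Partition the rule space by consequent, search each region, merge results."""
    items = set().union(*transactions)
    merged = []
    for region_item in sorted(items):  # one region per worker, run serially here
        merged.extend(
            worker_search(region_item, items, transactions, minsup, minconf)
        )
    return merged


if __name__ == "__main__":
    for (ante, cons), sup, conf in mine(TRANSACTIONS, minsup=0.5, minconf=0.7):
        print(sorted(ante), "->", sorted(cons), f"sup={sup:.2f} conf={conf:.2f}")
```

Because every region contributes its own rules to the merged set, the result covers each consequent item rather than concentrating on one part of the search space, which is the intuition behind the diversification claim. In a parallel setting the loop in `mine` would be distributed across workers and `evaluate` would be the step offloaded to the GPU.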
