Machine Learning and Data Mining in Pattern Recognition (MLDM 2007), July 18-20, 2007, Leipzig, Germany

Distributed and Shared Memory Algorithm for Parallel Mining of Association Rules

Abstract

The search for frequent patterns in transactional databases is considered one of the most important data mining problems. Several parallel and sequential algorithms have been proposed in the literature to solve it. Almost all of these algorithms make repeated passes over the dataset to determine the set of frequent itemsets, which implies high I/O overhead. In the parallel case, most algorithms also perform a sum-reduction at the end of each pass to construct the global counts, which implies high synchronization cost. We present a novel algorithm that efficiently exploits the trade-offs between computation, communication, memory usage, and synchronization. The algorithm was implemented on a cluster of SMP nodes, combining the distributed and shared memory paradigms. This paper reports the results of our algorithm for different data sizes and different numbers of processors, and studies the effect of these variations on overall performance.
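To make the cost pattern described in the abstract concrete, here is a minimal sketch of the conventional level-wise scheme that the abstract attributes to most parallel algorithms: one pass over the data per itemset size, followed by a sum-reduction of per-partition counts. It is not the authors' algorithm, which the abstract does not specify. The names `local_counts` and `frequent_itemsets` are hypothetical, and the partitions are processed sequentially here as a stand-in for separate processors.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count, within one partition, the transactions containing each candidate."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                counts[cand] += 1
    return counts


def frequent_itemsets(partitions, min_support):
    """Level-wise mining; each level costs one data pass plus one reduction."""
    items = sorted({i for p in partitions for t in p for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the data: every partition scans its transactions.
        partials = [local_counts(p, candidates) for p in partitions]
        # Sum-reduction of the partial counts; in a real parallel run this is
        # the synchronization point reached at the end of every pass.
        global_counts = sum(partials, Counter())
        level = {c: n for c, n in global_counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent
```

In this sketch the number of data passes and the number of reductions both grow with the size of the largest frequent itemset, which is the I/O and synchronization overhead the abstract refers to.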
