Conference: Research and development in knowledge discovery and data mining

Effect of Data Skewness in Parallel Mining of Association Rules



Abstract

An efficient parallel algorithm, FPM (Fast Parallel Mining), has been proposed for mining association rules on a shared-nothing parallel system. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its communication scheme is simple, performing only one round of message exchange in each iteration. We found that both pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm. Furthermore, FPM exhibits good parallelism in terms of speedup, scaleup and sizeup.
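The abstract names the count distribution approach and global pruning without spelling out their rules, so the following is only a minimal single-process sketch under stated assumptions, not the paper's implementation. It assumes that each site counts candidates on its own partition and that the local counts are summed in one exchange per iteration. It also assumes that global pruning bounds a candidate's support by adding up, over all partitions, the smallest local count among its (k-1)-item subsets. All function names (`local_counts`, `count_distribution_round`, `global_prune`) are hypothetical, and Python lists stand in for the sites of a shared-nothing machine.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count, for one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                counts[cand] += 1
    return counts


def count_distribution_round(partitions, candidates, min_count):
    """One iteration: count locally, combine once, keep globally large itemsets."""
    per_site = [local_counts(p, candidates) for p in partitions]
    global_counts = Counter()
    for counts in per_site:  # stands in for the single round of message exchange
        global_counts.update(counts)
    large = {c for c in candidates if global_counts[c] >= min_count}
    return large, per_site


def global_prune(candidates, prev_per_site, min_count):
    """Assumed rule: drop a candidate if its support upper bound is too small."""
    kept = set()
    for cand in candidates:
        subsets = [frozenset(s) for s in combinations(cand, len(cand) - 1)]
        # At each site the candidate cannot occur more often than its rarest subset.
        bound = sum(min(counts[s] for s in subsets) for counts in prev_per_site)
        if bound >= min_count:
            kept.add(cand)
    return kept


# Two skewed partitions: items a, b concentrate at site 0; c, d at site 1.
partitions = [
    [{"a", "b"}, {"a", "b"}, {"a", "c"}],
    [{"c", "d"}, {"c", "d"}, {"b", "c"}],
]
singletons = {frozenset(x) for x in "abcd"}
large1, site_counts1 = count_distribution_round(partitions, singletons, 2)
pairs = {frozenset(p) for p in combinations(sorted(set().union(*large1)), 2)}
pruned_pairs = global_prune(pairs, site_counts1, 2)
large2, _ = count_distribution_round(partitions, pruned_pairs, 2)
```

In this toy data the assumed bound removes three of the six candidate pairs before any counting, which illustrates why skewed partitions make such pruning pay off: when an item is rare at a site, every candidate containing it gets a small local bound there.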
