Effect of Data Skewness in Parallel Mining of Association Rules

Abstract

FPM (Fast Parallel Mining), an efficient parallel algorithm for mining association rules on a shared-nothing parallel system, has been proposed. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its communication scheme is simple, performing only one round of message exchange in each iteration. We found that both pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm. Furthermore, FPM exhibits good speedup, scaleup, and sizeup.
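
The count distribution approach mentioned in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' FPM implementation and omits both distributed and global pruning. The function names (`local_counts`, `generate_candidates`, `count_distribution`) are hypothetical, partitions are assumed to be in-memory lists of transactions, and summing the per-partition counters stands in for the one round of message exchange per iteration.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count how many transactions in one partition contain each candidate."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if items.issuperset(cand):
                counts[cand] += 1
    return counts


def generate_candidates(frequent, k):
    """Apriori-style step: keep k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def count_distribution(partitions, min_support):
    """Return {itemset: global count} for itemsets meeting min_support (a fraction)."""
    total = sum(len(p) for p in partitions)
    threshold = min_support * total
    all_items = sorted({i for p in partitions for t in p for i in t})
    candidates = [frozenset([i]) for i in all_items]
    result, k = {}, 1
    while candidates:
        # Each simulated node scans only its own partition.
        per_node = [local_counts(p, candidates) for p in partitions]
        # Assumption: this sum plays the role of the single count exchange.
        global_counts = sum(per_node, Counter())
        frequent = {c: n for c, n in global_counts.items() if n >= threshold}
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result
```

Calling `count_distribution([[["a", "b"], ["a", "c"]], [["a", "b"], ["b", "c"]]], 0.5)` treats the two inner lists as two database partitions and returns the itemsets contained in at least half of the four transactions. Every node counts the same candidate set in this sketch, which is the part FPM's pruning techniques are described as reducing.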

Bibliographic details

Source
Research and Development in Knowledge Discovery and Data Mining | 1998 | pp. 48-60 | 13 pages
Conference location: Melbourne (AU)
Authors
David W. Cheung; Yongqiao Xiao
Original format: PDF
Language: English
CLC classification: Automation technology, computer technology
Keywords
Association Rules; Data Mining; Data Skewness; Parallel Computing

Similar literature

1. An Optimized Distributed Association Rule Mining Algorithm in Parallel and Distributed Data Mining with XML Data for Improved Response Time [J]. Sujni Paul. International Journal of Computer Science & Information Technology (IJCSIT), 2010, No. 2
2. Mining Association Rules from No-SQL data bases using Map-Reduce Fuzzy Association Rule Mining Algorithm [J]. Chatakunta Praveen Kumar, Pole Anjaiah, Santosh Patil. International Journal of Applied Engineering Research, 2017, No. 21aPta1
3. A sparse memory allocation data structure for sequential and parallel association rule mining [J]. Soysal Oemer M., Gupta Eera, Donepudi Harisha. Journal of Supercomputing, 2016, No. 2
4. Effect of Data Skewness in Parallel Mining of Association Rules [C]. David W. Cheung, Yongqiao Xiao. Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining, 1998
5. Cfph-growth tree: A data structure for mining association rules with skewed support distribution [D]. Al-Ghamedy, Fatemah. 2013
6. Development and validation of data quality rules in administrative health data using association rule mining [O]. Mingkai Peng, Sangmin Lee, Adam G. D’Souza. 2020
7. Effect of Data Skewness in Parallel Mining of Association Rules [O]. David W. Cheung, Yongqiao Xiao. 1998
8. Parallel Data Mining for Association Rules on Shared-Memory Multi-Processors [R]. Zaki, M. J., Ogihara, M., Parthasarathy, S. 1996
