Performance Analysis of Frequent Itemset Mining Algorithms Based on Dataset Sparseness

Xiao Wen (肖文); Hu Juan (胡娟)

Abstract

Frequent itemset mining (FIM) is one of the most fundamental data mining tasks, and the characteristics of the dataset being mined have a significant effect on the performance of FIM algorithms. Dataset sparseness is one of the attributes that reflects the essential character of a dataset, and different types of FIM algorithms differ greatly in how well they scale with it. To address how dataset sparseness can be quantified and how it affects the performance of different types of FIM algorithms, the paper first reviews and discusses existing measures, then proposes two new quantitative measures of dataset sparseness: one based on transaction difference and one based on the FP-Tree. Both measures take into account the influence of the minimum support on dataset sparseness in the FIM setting, and both reflect the degree of difference between the frequent itemsets of transactions. Finally, experiments examine how different types of FIM algorithms scale with dataset sparseness. The results show that dataset sparseness is inversely proportional to the minimum support, and that FIM algorithms based on the vertical data format scale best with sparseness among the three typical classes of FIM algorithms.
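The abstract does not give the formula for either proposed measure, so the following is only an illustrative sketch of what a support-aware, transaction-difference style measure could look like. It is not the authors' definition. It assumes that infrequent items are dropped first and that "difference" is the mean pairwise Jaccard distance between the remaining item sets. The names `frequent_items`, `jaccard_distance`, and `sparseness` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_items(transactions, min_support):
    """Return the items whose relative support is at least min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    threshold = min_support * len(transactions)
    return {item for item, c in counts.items() if c >= threshold}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sparseness(transactions, min_support):
    """Mean pairwise Jaccard distance between transactions after
    infrequent items are removed. ASSUMPTION: this is an illustrative
    stand-in, not the measure defined in the paper."""
    keep = frequent_items(transactions, min_support)
    projected = [set(t) & keep for t in transactions]
    pairs = list(combinations(projected, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Two toy datasets: transactions in `dense` overlap heavily,
# transactions in `sparse` share few frequent items.
dense = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"]]
sparse = [["a", "b"], ["c", "d"], ["a", "e"], ["c", "f"]]

dense_score = sparseness(dense, 0.5)
sparse_score = sparseness(sparse, 0.5)
```

Under these assumptions the heavily overlapping dataset receives the lower score. Because the frequent-item filter depends on `min_support`, the same dataset can score differently at different thresholds, which is the kind of dependence the abstract attributes to the proposed measures. The pairwise comparison is quadratic in the number of transactions, so a sketch like this only suits small samples.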

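Bibliographic information

Source: Journal of Computer Applications (《计算机应用》), 2018, No. 4, pp. 995-1000 (6 pages)
Authors: Xiao Wen (肖文); Hu Juan (胡娟)
Author affiliation (both authors): Department of Electrical and Information Engineering, Wentian College of Hohai University, Ma'anshan, Anhui 243031, China
Original format: PDF
Language of text: Chinese
Chinese Library Classification: Software engineering
Keywords: data mining; frequent itemset mining; sparseness; scalability
Date added to database: 2023-07-24 18:51:01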
