Source: 计算机应用 (Journal of Computer Applications)

Performance Analysis of Frequent Itemset Mining Algorithms Based on Dataset Sparseness

Abstract

Frequent itemset mining (FIM) is one of the most fundamental data mining tasks, and the characteristics of the mined dataset have a significant effect on the performance of FIM algorithms. Dataset sparseness is one of the attributes that reflect the essential nature of a dataset, and different types of FIM algorithms differ greatly in how well they scale with it. To address how dataset sparseness can be measured quantitatively and how it affects the performance of different types of FIM algorithms, existing measurement methods are first reviewed and discussed, and two new quantitative measures are then proposed: one based on transaction difference and one based on the FP-Tree. Both measures take into account the influence of the minimum support on dataset sparseness in the context of the FIM task, and both reflect the degree of difference between the frequent itemsets of transactions. Finally, the scalability of different types of FIM algorithms with respect to dataset sparseness is verified experimentally. The experimental results show that dataset sparseness is inversely proportional to the minimum support, and that FIM algorithms based on the vertical format have the best sparseness scalability among the three typical classes of FIM algorithms.
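The abstract does not define its two proposed measures, so the sketch below does not reproduce them. As a rough illustration of why sparseness depends on the minimum support, it uses a simple stand-in: one minus the fill ratio of the transaction-by-item matrix, restricted to items that meet the minimum support. The function name `sparseness`, this definition, and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter


def sparseness(transactions, min_support):
    """Illustrative stand-in measure (an assumption, not the paper's method).

    Returns 1 - density, where density is the fraction of filled cells in the
    transaction-by-item matrix after dropping items whose relative support
    is below min_support.
    """
    n = len(transactions)
    if n == 0:
        return 0.0
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c / n >= min_support}
    if not frequent:
        return 0.0
    filled = sum(counts[item] for item in frequent)
    return 1.0 - filled / (n * len(frequent))


# Hypothetical toy dataset: four transactions over items a..d.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"a"},
]

for s in (0.25, 0.5, 1.0):
    print(s, sparseness(transactions, s))
```

On this toy data the value falls as the minimum support rises, since infrequent items are filtered out and the remaining matrix is denser. That matches the direction of the trend reported in the abstract, although the paper's own measures may behave differently in detail.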
