Journal: Journal of Computational and Theoretical Nanoscience
An Algorithm of Association Rules Mining for Massive Data Based on Partly Disk Resident FP-TREE Exploiting B+ Tree Index

Abstract

As mined data sets keep growing, available memory has become a bottleneck for FP-GROWTH and similar association rule mining algorithms. New algorithms are therefore needed to scale to massive data sets mined with a low user-specified support threshold, and disk-resident algorithms have become a main research target. This paper presents DRBFP-MINE (disk resident B+ tree FP-TREE mine), a novel algorithm based on a LIFO storage strategy and on the idea of storing part of the FP-TREE on disk behind a B+ tree index. The algorithm reduces memory usage by keeping the FP-TREE partly disk resident, using an optimized storage strategy and an efficient index. Where FP-GROWTH and similar algorithms fail because they use too much memory, DRBFP-MINE can still discover association rules from massive data. In addition, DRBFP-MINE overcomes the low efficiency of DISK-MINE, which projects the data set too early merely because some FP-TREE cannot be accommodated in the available memory. Verification experiments and performance analyses are presented on a synthetic data set and a real data set. The experimental results show that, when memory is limited, DRBFP-MINE is an effective disk-resident association rule mining algorithm for massive data.
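
The idea of a partly disk-resident FP-tree can be illustrated with a minimal Python sketch. It is not the DRBFP-MINE algorithm: the abstract gives no implementation details, so the class `PartlyDiskFPTree`, the function `build_tree`, and the parameter `mem_capacity` are all hypothetical. The sketch assumes that the first `mem_capacity` nodes stay in memory and every later node is written to a SQLite table, whose B-tree-backed primary-key index stands in for the paper's B+ tree index. It does not model the LIFO storage strategy, and it only builds the tree and reads counts; it does not mine rules.

```python
import sqlite3
from collections import Counter


class PartlyDiskFPTree:
    """Prefix tree of item counts; overflow nodes live in SQLite (assumed design)."""

    ROOT = 0  # id of the implicit root node

    def __init__(self, mem_capacity, db_path=":memory:"):
        self.mem_capacity = mem_capacity
        self.mem = {}  # (parent_id, item) -> [node_id, count]
        self.next_id = 1
        # Pass a file path as db_path to put the overflow nodes on disk.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE node (parent INTEGER, item TEXT, id INTEGER, "
            "count INTEGER, PRIMARY KEY (parent, item))"
        )

    def _step(self, parent, item):
        """Increment the child of `parent` labelled `item`, creating it if needed."""
        key = (parent, item)
        node = self.mem.get(key)
        if node is not None:  # node is memory resident
            node[1] += 1
            return node[0]
        row = self.db.execute(
            "SELECT id FROM node WHERE parent = ? AND item = ?", key
        ).fetchone()
        if row is not None:  # node is disk resident: update it in place
            self.db.execute(
                "UPDATE node SET count = count + 1 WHERE parent = ? AND item = ?", key
            )
            return row[0]
        node_id = self.next_id  # node does not exist yet
        self.next_id += 1
        if len(self.mem) < self.mem_capacity:
            self.mem[key] = [node_id, 1]
        else:
            self.db.execute(
                "INSERT INTO node VALUES (?, ?, ?, 1)", (parent, item, node_id)
            )
        return node_id

    def insert(self, sorted_items):
        """Insert one transaction whose items are already in tree order."""
        parent = self.ROOT
        for item in sorted_items:
            parent = self._step(parent, item)

    def count(self, path):
        """Return the count of the node reached by `path` from the root, or 0."""
        parent, count = self.ROOT, 0
        for item in path:
            key = (parent, item)
            if key in self.mem:
                parent, count = self.mem[key]
            else:
                row = self.db.execute(
                    "SELECT id, count FROM node WHERE parent = ? AND item = ?", key
                ).fetchone()
                if row is None:
                    return 0
                parent, count = row
        return count


def build_tree(transactions, min_support, mem_capacity):
    """Two passes: count item frequencies, then insert the filtered transactions."""
    freq = Counter(item for t in transactions for item in set(t))
    tree = PartlyDiskFPTree(mem_capacity)
    for t in transactions:
        items = [i for i in set(t) if freq[i] >= min_support]
        items.sort(key=lambda i: (-freq[i], i))  # descending frequency, ties by name
        tree.insert(items)
    return tree


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
tree = build_tree(transactions, min_support=2, mem_capacity=3)
print(tree.count(["a", "b"]))  # prints 3
```

With `mem_capacity=3`, the nodes for the paths a, a-b and a-b-c stay in memory, and the three nodes created later are read and updated through the SQLite index. Because nodes near the root tend to be created first, this simple overflow rule keeps the most frequently touched nodes in memory; that is a property of the sketch, not a claim about the paper's storage strategy.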
