International Congress on Ultra Modern Telecommunications and Control Systems and Workshops

Explore maximal frequent itemsets for big data pre-processing based on small sample in cloud computing

Abstract

Data mining for big data processing on cloud computing platforms has become an active research topic. Most previous work applies existing mining approaches directly to the full data, which can cause redundant computation, high time complexity, and large storage requirements. To address this, a novel heuristic approach called PASS (Pre-processing based on Small Sample) is proposed, which finds, during big data pre-processing, a small sample composed of the most frequent transactions. Taking advantage of cloud computing to relieve the bottleneck of data mining in distributed environments, PASS operates directly on the transaction database and groups all transactions by dimension. Using Bitmap-Sort, it screens the most frequent transactions from each transaction set, and it then obtains the best-transaction-set by aggregating the transaction-elects of every transaction set. Experimental results show that PASS avoids generating the large number of candidate sets that join operations produce, accelerates maximal frequent itemset mining, reduces storage requirements, and improves resource utilization at the same time.
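The abstract does not spell out PASS's internals, so what follows is only a minimal, hypothetical Python illustration of the pipeline it describes, built on stated assumptions rather than on the paper's actual algorithm. The names `to_bitmap`, `select_small_sample`, and `top_k` are illustrative. A transaction's "dimension" is assumed to be its number of distinct items. Bitmap-Sort is read as encoding each transaction as an integer bitmask and sorting identical bitmasks within a group by how often they occur. The per-group "elects" are simply concatenated to form the best-transaction-set. For contrast, `maximal_frequent_itemsets` is a brute-force miner (a swapped-in textbook technique, not the paper's method) that makes the definition concrete: a frequent itemset is maximal when no proper superset of it is also frequent.

```python
from collections import Counter, defaultdict
from itertools import combinations


def to_bitmap(transaction, item_index):
    """Encode a transaction (an iterable of items) as an integer bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def select_small_sample(transactions, top_k=1):
    """Hypothetical PASS-style sample selection (not the paper's exact algorithm).

    Assumptions: a transaction's "dimension" is its number of distinct items,
    and a group's "most frequent transactions" are the top_k distinct
    transactions that occur most often in that group.
    """
    items = sorted({item for t in transactions for item in t})
    item_index = {item: i for i, item in enumerate(items)}

    # Group transactions by dimension, counting identical bitmaps in each group.
    groups = defaultdict(Counter)
    for t in transactions:
        groups[len(set(t))][to_bitmap(t, item_index)] += 1

    # Sort each group's bitmaps by count (descending, ties broken by bitmap value),
    # keep the top_k as that group's elects, and aggregate elects across groups.
    sample = []
    for dim in sorted(groups):
        ranked = sorted(groups[dim].items(), key=lambda kv: (-kv[1], kv[0]))
        for bitmap, _count in ranked[:top_k]:
            sample.append(frozenset(x for x in items if (bitmap >> item_index[x]) & 1))
    return sample


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force maximal frequent itemsets, suitable only for tiny inputs.

    An itemset is frequent if at least min_support transactions contain it,
    and maximal if no proper superset of it is also frequent.
    """
    tx = [frozenset(t) for t in transactions]
    items = sorted(set().union(*tx))
    frequent = []
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            candidate = frozenset(combo)
            if sum(1 for t in tx if candidate <= t) >= min_support:
                frequent.append(candidate)
    return [s for s in frequent if not any(s < other for other in frequent)]


if __name__ == "__main__":
    transactions = [
        ["a", "b", "c"], ["a", "b", "c"], ["a", "b"],
        ["a", "c"], ["a", "b"], ["b", "d"],
    ]
    print([sorted(s) for s in select_small_sample(transactions)])
    # prints [['a', 'b'], ['a', 'b', 'c']]
    print([sorted(s) for s in maximal_frequent_itemsets(transactions, 3)])
    # prints [['a', 'b'], ['a', 'c']]
```

The brute-force miner enumerates every itemset, so its cost grows exponentially with the number of distinct items; it is included only to illustrate the target of the mining step. Note also that an absolute support threshold chosen for the full database would need adjusting before it could be applied meaningfully to a small sample.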
