Journal: 《计算机工程》 (Computer Engineering)

A Sequential Pattern Mining Algorithm Based on MapReduce

Abstract

Traditional data mining algorithms have limited computing capacity when processing massive datasets. To address this problem, a distributed sequential pattern mining algorithm based on the MapReduce programming model, named MR-PrefixSpan, is proposed. Building on the PrefixSpan algorithm, the mining task is partitioned into subtasks: the Map function processes the sequential patterns derived from different prefixes and constructs the projected databases in parallel, which improves mining efficiency and simplifies the search space. The Reduce function then merges the intermediate results to obtain the global sequential patterns. Experimental results on a Hadoop cluster show that MR-PrefixSpan reduces database scanning time and achieves a high parallel speedup ratio and good scalability.
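
The abstract gives only an outline of the approach, so the following is a minimal sketch of the general idea rather than the paper's implementation. It assumes every sequence element is a single item (no itemsets), uses one Map task per frequent length-1 prefix, and lets a local thread pool stand in for a Hadoop cluster. All function names (`project`, `map_task`, `reduce_task`, `mr_prefixspan`) are hypothetical and chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def frequent_items(db, min_support):
    """Count each item once per sequence and keep those meeting min_support."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    return {item: c for item, c in counts.items() if c >= min_support}


def project(db, item):
    """Build the projected database: the suffix after the first
    occurrence of `item` in each sequence (empty suffixes are dropped)."""
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


def prefixspan(prefix, projected, min_support):
    """Recursively grow `prefix` inside its own projected database."""
    for item, support in sorted(frequent_items(projected, min_support).items()):
        pattern = prefix + (item,)
        yield pattern, support
        yield from prefixspan(pattern, project(projected, item), min_support)


def map_task(task):
    """Assumed Map step: mine all patterns that start with one prefix."""
    item, support, db, min_support = task
    prefix = (item,)
    results = [(prefix, support)]
    results.extend(prefixspan(prefix, project(db, item), min_support))
    return results


def reduce_task(mapped):
    """Assumed Reduce step: merge per-prefix results into one global set."""
    merged = {}
    for results in mapped:
        for pattern, support in results:
            merged[pattern] = support
    return merged


def mr_prefixspan(db, min_support):
    """Split the mining task by frequent length-1 prefix, then merge."""
    items = frequent_items(db, min_support)
    tasks = [(item, s, db, min_support) for item, s in items.items()]
    with ThreadPoolExecutor() as pool:  # stand-in for cluster workers
        mapped = list(pool.map(map_task, tasks))
    return reduce_task(mapped)
```

Because patterns with different first items never overlap, each prefix can be mined independently, and the merge step in this sketch only has to collect the per-prefix results. Calling `mr_prefixspan(db, min_support)` on a list of tuples returns a dictionary mapping each frequent pattern to its support count.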
