Journal: 计算机应用研究 (Application Research of Computers)

A Sequential Pattern Mining Algorithm Based on MapReduce

Abstract

To address the drawbacks of the traditional GSP algorithm, which must scan the database repeatedly and therefore incurs heavy I/O overhead, this paper proposes MR-GSP (GSP algorithm based on MapReduce), a sequential pattern mining algorithm built on the MapReduce programming framework. MR-GSP partitions the original sequence database into several sub-sequence databases and distributes them to Map workers. The Map function scans the sub-sequence database held in each worker's memory to generate local sequential patterns. The Reduce function merges all local sequential patterns, scans the original sequence database to compute their support, and outputs the final sequential patterns. Unlike the traditional GSP algorithm, MR-GSP obtains all sequential patterns with only two scans of the original database. Experimental results show that, when mining sequential patterns from large data sets, MR-GSP can exploit cloud computing technology to improve mining efficiency.
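The two-scan flow described in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation and does not use Hadoop: the Map and Reduce phases are simulated with ordinary function calls, each sequence element is assumed to be a single item rather than an itemset, and the local threshold (the same support ratio applied to each partition) is an assumption the abstract does not state. The names `is_subsequence`, `support`, `local_gsp`, and `mr_gsp` are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of sequences in database that contain pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def local_gsp(database, min_ratio):
    """Map side: GSP-style level-wise mining of one in-memory partition."""
    # Assumption: the global support ratio is reused as the local threshold.
    min_count = max(1, min_ratio * len(database))
    items = sorted({item for seq in database for item in seq})
    frequent = [(i,) for i in items if support((i,), database) >= min_count]
    result = set(frequent)
    while frequent:
        # Join step: s1 and s2 combine when s1 without its first item
        # equals s2 without its last item.
        candidates = {
            s1 + (s2[-1],)
            for s1 in frequent
            for s2 in frequent
            if s1[1:] == s2[:-1]
        }
        frequent = [c for c in candidates if support(c, database) >= min_count]
        result.update(frequent)
    return result


def mr_gsp(database, min_ratio, num_partitions):
    """Simulate the two phases: local mining, then a global support count."""
    # Split the original database into sub-sequence databases.
    partitions = [database[i::num_partitions] for i in range(num_partitions)]
    # Map phase (first scan): each partition yields its local patterns.
    local_patterns = [local_gsp(p, min_ratio) for p in partitions]
    # Reduce phase (second scan): merge candidates, count global support.
    candidates = set().union(*local_patterns)
    min_count = max(1, min_ratio * len(database))
    counts = {c: support(c, database) for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b"), ("b", "c")]
    for pattern, count in sorted(mr_gsp(db, 0.5, 2).items()):
        print(pattern, count)
```

Reusing the global ratio inside each partition means that any pattern frequent in the whole database is frequent in at least one partition, so the merged candidate set misses no global pattern under this assumption. The second scan then discards patterns that are only locally frequent.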
