Journal: Expert Systems with Applications

A MapReduce solution for incremental mining of sequential patterns from big data

Abstract

Sequential Pattern Mining (SPM) is a popular data mining task with broad applications. With the advent of big data, traditional SPM algorithms no longer scale. Hence, many researchers have migrated to big data frameworks such as MapReduce and proposed distributed algorithms. However, the existing MapReduce algorithms assume that the data is static and do not handle incremental database updates; instead, they re-mine the updated database whenever new sequences are inserted. In this paper, we propose an efficient distributed algorithm for incremental sequential pattern mining (MR-INCSPM) using the MapReduce framework that can handle big data. The proposed algorithm incorporates a backward mining approach that efficiently reuses the knowledge obtained during the previous mining process. Also, based on a study of item co-occurrences, we propose the Co-occurrence Reverse Map (CRMAP) data structure, which is used to address the combinatorial explosion of candidate sequences. In addition, novel candidate generation and early pruning mechanisms are designed using CRMAP to speed up the mining process. The proposed algorithm is evaluated on both real and synthetic datasets. The experimental results demonstrate the efficacy of MR-INCSPM with respect to processing time, memory, and pruning efficiency. (C) 2019 Elsevier Ltd. All rights reserved.
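
The abstract does not describe CRMAP's layout, so the following is not the paper's data structure. As a rough illustration of the general idea of co-occurrence-based early pruning, this minimal Python sketch assumes every sequence element is a single item and runs on one machine rather than on MapReduce. The names `build_cooccurrence_map` and `can_extend` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_map(sequences, min_support):
    """Map each item a to the items b that follow a in >= min_support sequences.

    Generic co-occurrence map for illustration only (assumption: this is NOT
    the CRMAP structure from the paper, whose layout the abstract omits).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Collect each ordered pair at most once per sequence.
        seen_pairs = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                seen_pairs.add((a, b))
        for a, b in seen_pairs:
            counts[a][b] += 1
    return {
        a: {b for b, c in followers.items() if c >= min_support}
        for a, followers in counts.items()
    }


def can_extend(pattern, item, comap):
    """Early prune: keep the candidate pattern + [item] only if item
    frequently follows every item already in the pattern."""
    return all(item in comap.get(a, set()) for a in pattern)


# Hypothetical toy database of four sequences.
sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]
comap = build_cooccurrence_map(sequences, min_support=2)

print(can_extend(["a", "b"], "c", comap))  # candidate <a, b, c> survives the check
print(can_extend(["b"], "a", comap))       # candidate <b, a> is pruned
```

The check is only a necessary condition: if a candidate is frequent, each of its items must be followed by the new item in at least `min_support` sequences, so a failed check rules the candidate out without counting its support. Candidates that pass still need a support count.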