Journal: Expert Systems with Applications

A MapReduce solution for incremental mining of sequential patterns from big data

Abstract

Sequential Pattern Mining (SPM) is a popular data mining task with broad applications. With the advent of big data, traditional SPM algorithms do not scale. Hence, many researchers have migrated to big data frameworks such as MapReduce and proposed distributed algorithms. However, the existing MapReduce algorithms assume that the data is static and do not handle incremental database updates; instead, they re-mine the updated database whenever new sequences are inserted. In this paper, we propose an efficient distributed algorithm for incremental sequential pattern mining (MR-INCSPM) using the MapReduce framework that can handle big data. The proposed algorithm incorporates a backward mining approach that efficiently reuses the knowledge obtained during the previous mining process. Also, based on a study of item co-occurrences, we propose the Co-occurrence Reverse Map (CRMAP) data structure, which is used to address the combinatorial explosion of candidate sequences. In addition, novel candidate generation and early pruning mechanisms are designed using CRMAP to speed up the mining process. The proposed algorithm is evaluated on both real and synthetic datasets. The experimental results demonstrate the efficacy of MR-INCSPM with respect to processing time, memory, and pruning efficiency. (C) 2019 Elsevier Ltd. All rights reserved.
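
The abstract does not define CRMAP, so the following is only a rough, single-machine sketch of how a reverse co-occurrence map might support early pruning during backward candidate generation. It assumes sequences of single items (no itemsets) and a hypothetical rule: an item may be prepended to a pattern only if it precedes every item of that pattern in at least `min_support` sequences. The function names `build_reverse_cooccurrence_map` and `can_prepend` are illustrative and are not taken from the paper, and the MapReduce distribution and incremental update steps are not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_reverse_cooccurrence_map(sequences, min_support):
    """Map each item b to the set of items a that occur before b
    in at least min_support sequences (an assumed CRMAP-like structure)."""
    pair_support = defaultdict(int)
    for seq in sequences:
        # Collect each ordered pair (earlier, later) once per sequence.
        seen_pairs = set()
        for i, j in combinations(range(len(seq)), 2):
            seen_pairs.add((seq[i], seq[j]))
        for pair in seen_pairs:
            pair_support[pair] += 1

    reverse_map = defaultdict(set)
    for (earlier, later), support in pair_support.items():
        if support >= min_support:
            reverse_map[later].add(earlier)
    return reverse_map


def can_prepend(item, pattern, reverse_map):
    """Early prune: keep the candidate [item] + pattern only if item
    frequently precedes every item already in the pattern."""
    return all(item in reverse_map.get(later, set()) for later in pattern)


# Toy database of four sequences (illustrative data, not from the paper).
sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]
reverse_map = build_reverse_cooccurrence_map(sequences, min_support=2)

keep_abc = can_prepend("a", ["b", "c"], reverse_map)  # True: candidate survives
keep_bac = can_prepend("b", ["a", "c"], reverse_map)  # False: pruned early
```

The check is a necessary condition only: if the extended pattern is frequent, every (prepended item, pattern item) pair must itself be frequent, so a failed check rules the candidate out without counting its support. Candidates that pass still need their support verified.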
