Conference: NAFOSTED Conference on Information and Computer Science

An Efficient Parallel Algorithm for Mining Both Frequent Closed and Generator Sequences on Multi-core Processors

Abstract

Frequent sequence mining is computationally demanding because it generates many intermediate subsequences. Mining frequent closed and generator sequences instead offers several benefits: it is more efficient and yields concise representations from which all traditional frequent patterns, with all their information, can still be recovered. Moreover, frequent closed sequences can be combined with generators to produce non-redundant sequential rules and to quickly recover all sequential patterns together with their frequencies. However, most proposed algorithms discover either closed sequences or generators, not both at once, and on large databases containing many long sequences they still take too long to finish or run out of memory. This paper therefore exploits multi-core processor architectures and proposes a novel parallel algorithm, Par-GenCloSM, that mines both frequent closed and generator sequences in the same process. Par-GenCloSM relies on efficient techniques for quickly eliminating unpromising candidate branches and on two novel strategies, EPUCloGen and GPPCloGen, that reduce the global synchronization cost of the parallel model and speed up mining. Par-GenCloSM is the first parallel algorithm for mining frequent closed sequences and generators concurrently. Experimental results on many real-life and synthetic databases show that Par-GenCloSM outperforms state-of-the-art algorithms in runtime and memory consumption, especially on long-sequence databases with low minimum support thresholds.
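To make the two pattern classes concrete, the following is a minimal brute-force sketch and not Par-GenCloSM itself. It assumes the usual definitions: a frequent sequence is closed if no proper super-sequence has the same support, and a generator if no proper sub-sequence has the same support. For simplicity it also assumes that every element of a sequence is a single item and that the empty sequence is ignored. All function names and the toy database are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if pattern can be obtained from sequence by deleting items."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_support):
    """Brute force: enumerate every non-empty subsequence of every
    database sequence and keep those meeting min_support."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    frequent = {}
    for pattern in candidates:
        sup = support(pattern, database)
        if sup >= min_support:
            frequent[pattern] = sup
    return frequent


def closed_and_generators(frequent):
    """Split frequent patterns into closed patterns and generators."""
    closed, generators = set(), set()
    for p, sup in frequent.items():
        same_support = [q for q, s in frequent.items() if s == sup and q != p]
        # Closed: no proper super-sequence with equal support.
        if not any(is_subsequence(p, q) for q in same_support):
            closed.add(p)
        # Generator: no proper (non-empty) sub-sequence with equal support.
        if not any(is_subsequence(q, p) for q in same_support):
            generators.add(p)
    return closed, generators


# Hypothetical toy database of four sequences.
database = [("a", "b", "c"), ("a", "b", "c"), ("a", "c"), ("b",)]
frequent = frequent_patterns(database, min_support=2)
closed, generators = closed_and_generators(frequent)
print("closed:", sorted(closed))
print("generators:", sorted(generators))
```

The enumeration is exponential in sequence length, which is exactly the cost that pruning unpromising branches and parallelizing across cores are meant to avoid. The sketch only illustrates what the output sets look like, including that a pattern can be both closed and a generator.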