Engineering Applications of Artificial Intelligence

Efficient algorithms for simultaneously mining concise representations of sequential patterns based on extended pruning conditions



Abstract

The concise representations of sequential patterns, including maximal sequential patterns, closed sequential patterns and sequential generator patterns, play an important role in data mining since they provide several benefits over the full set of sequential patterns. One of the most important benefits is that their cardinalities are generally much smaller than the cardinality of the set of sequential patterns. Therefore, they can be mined more efficiently and stored in less space, and the information they provide is easier for users to analyze. In addition, the set of all maximal sequential patterns can be used to recover the complete set of sequential patterns, while closed sequential patterns and sequential generators can be used together to generate non-redundant sequential rules and to quickly recover all sequential patterns and their frequencies. Several algorithms have been proposed to mine the concise representations separately, i.e., each of them is designed to discover only one type of concise representation. However, these remain time-consuming and memory-intensive tasks. To address this problem, we propose three novel efficient algorithms named FMaxSM, FGenCloSM and MaxGenCloSM, which respectively mine only maximal sequential patterns, simultaneously mine both the sets of closed sequential patterns and generators, and discover all three concise representations during the same process. To our knowledge, MaxGenCloSM is the first algorithm for concurrently mining the three concise representations of sequential patterns. The proposed algorithms are based on two novel local pruning strategies called LPMAX and LPMaxGenClo that are designed to prune non-maximal, non-closed and non-generator patterns earlier and more efficiently at two and three successive levels of the prefix tree without subsequence relation checking. Extensive experiments on real-life and synthetic databases show that FMaxSM, FGenCloSM and MaxGenCloSM are up to two orders of magnitude faster than the state-of-the-art algorithms and that the proposed algorithms consume much less memory, especially for low minimum support thresholds and for dense databases.
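
The abstract does not define the three concise representations, so the following brute-force sketch may help make them concrete. It is not FMaxSM, FGenCloSM or MaxGenCloSM and uses none of the pruning strategies named above; it only enumerates patterns exhaustively on a toy database. It assumes the definitions commonly used in the literature (maximal: no frequent proper super-sequence; closed: no proper super-sequence with the same support; generator: no proper sub-sequence with the same support), restricts sequences to single items rather than itemsets, and ignores the empty pattern when testing generators. All function names and the toy database are hypothetical.

```python
from itertools import combinations


def is_subsequence(a, b):
    """True if tuple a can be obtained from tuple b by deleting items."""
    it = iter(b)
    return all(x in it for x in a)


def support(pattern, db):
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in db)


def frequent_patterns(db, minsup):
    """Exhaustively enumerate every non-empty subsequence and keep the frequent ones."""
    candidates = set()
    for s in db:
        for k in range(1, len(s) + 1):
            for idx in combinations(range(len(s)), k):
                candidates.add(tuple(s[i] for i in idx))
    freq = {}
    for p in candidates:
        sup = support(p, db)
        if sup >= minsup:
            freq[p] = sup
    return freq


def concise_representations(db, minsup):
    """Return (frequent, maximal, closed, generators) under the assumed definitions."""
    freq = frequent_patterns(db, minsup)

    def proper_sub(a, b):
        return a != b and is_subsequence(a, b)

    # Maximal: no frequent proper super-sequence.
    maximal = {p for p in freq if not any(proper_sub(p, q) for q in freq)}
    # Closed: no proper super-sequence with the same support.
    closed = {p for p in freq
              if not any(proper_sub(p, q) and freq[q] == freq[p] for q in freq)}
    # Generator: no non-empty proper sub-sequence with the same support (assumption).
    generators = {p for p in freq
                  if not any(proper_sub(q, p) and freq[q] == freq[p] for q in freq)}
    return freq, maximal, closed, generators


def recover_from_maximal(maximal):
    """All non-empty subsequences of the maximal patterns (supports are not recovered)."""
    recovered = set()
    for m in maximal:
        for k in range(1, len(m) + 1):
            for idx in combinations(range(len(m)), k):
                recovered.add(tuple(m[i] for i in idx))
    return recovered


# Hypothetical toy database of four sequences, minimum support of 2.
db = [("a", "b", "c"), ("a", "b", "c"), ("a", "b"), ("b",)]
freq, maximal, closed, generators = concise_representations(db, 2)
recovered = recover_from_maximal(maximal)
```

On this toy database there are seven frequent patterns but only one maximal pattern, three closed patterns and three generators, which illustrates the cardinality reduction the abstract refers to. The subsequences of the single maximal pattern also reproduce the full frequent set, though without support counts.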
