Journal: Data & Knowledge Engineering

Mining sequential patterns across multiple sequence databases

Abstract

Given a set of sequence databases across multiple domains, this paper addresses the mining of multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence times fall within a pre-defined time window. We first propose algorithm Naive, in which the multiple sequence databases are joined into one sequence database so that traditional sequential pattern mining algorithms (e.g., PrefixSpan) can be applied. Because join operations are expensive, algorithm Naive is costly and serves only as a baseline for comparison. We therefore propose two algorithms that mine multi-domain sequential patterns without any join operations. Specifically, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines the sequential patterns from the sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in each domain's sequence database can form multi-domain sequential patterns, so part of that mining effort is wasted. To avoid this mining cost, algorithm PropagatedMine is developed. Algorithm PropagatedMine first mines sequential patterns from a single sequence database and then propagates the mined patterns to the other sequence databases. Furthermore, the mined sequential patterns are represented as a lattice structure to further reduce the number of sequential patterns to be propagated. In addition, we develop mechanisms that allow empty sets in multi-domain sequential patterns. The performance of the proposed algorithms is comparatively analyzed, and a sensitivity analysis is conducted. Experimental results show that, by exploiting propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., execution time).
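
The join step behind algorithm Naive can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no data layout, so the sketch assumes that every domain database maps a shared sequence id to one itemset per time slot, and that events "within the time window" are simply events in the same slot. All names (`join_databases`, `contains`, `support`) and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Sequence = List[Itemset]              # assumed: one itemset per time slot
Database = Dict[str, Sequence]        # assumed: sequence id -> sequence
MultiElement = Tuple[Itemset, ...]    # one itemset per domain, same time slot


def join_databases(databases: List[Database]) -> Dict[str, List[MultiElement]]:
    """Join domain databases on sequence id and time slot (assumed alignment)."""
    shared_ids = set(databases[0]).intersection(*databases[1:])
    joined: Dict[str, List[MultiElement]] = {}
    for sid in sorted(shared_ids):
        # Only time slots present in every domain can be joined.
        length = min(len(db[sid]) for db in databases)
        joined[sid] = [tuple(db[sid][t] for db in databases) for t in range(length)]
    return joined


def contains(sequence: List[MultiElement], pattern: List[MultiElement]) -> bool:
    """Greedy subsequence test: every pattern element must be covered,
    domain by domain, by a strictly later slot than the previous element."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not all(
            p <= s for p, s in zip(element, sequence[pos])
        ):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(joined: Dict[str, List[MultiElement]], pattern: List[MultiElement]) -> float:
    """Fraction of joined sequences that contain the pattern."""
    if not joined:
        return 0.0
    return sum(contains(seq, pattern) for seq in joined.values()) / len(joined)


def fs(*items: str) -> Itemset:
    return frozenset(items)


# Hypothetical toy data: two domains, three sequences, three time slots each.
domain_a: Database = {
    "u1": [fs("a"), fs("b"), fs("c")],
    "u2": [fs("a"), fs("c"), fs("b")],
    "u3": [fs("b"), fs("a"), fs("c")],
}
domain_b: Database = {
    "u1": [fs("x"), fs("y"), fs("x")],
    "u2": [fs("x"), fs("x"), fs("y")],
    "u3": [fs("y"), fs("x"), fs("y")],
}

joined = join_databases([domain_a, domain_b])
# Candidate pattern: (a with x) followed later by (b with y).
pattern = [(fs("a"), fs("x")), (fs("b"), fs("y"))]
print(support(joined, pattern))
```

On this toy data the pattern is contained in sequences u1 and u2 but not in u3. A real Naive run would pass the joined database to a standard miner such as PrefixSpan rather than test a single hand-written candidate; the sketch only shows why the join makes that possible and why it grows costly as the number of domains increases.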