
Mining Sequential Patterns from Probabilistic Databases



Abstract

We consider sequential pattern mining in situations where there is uncertainty about which source an event is associated with. We model this in the probabilistic database framework and consider the problem of enumerating all sequences whose expected support is sufficiently large. Unlike frequent itemset mining in probabilistic databases [C. Aggarwal et al., KDD'09; Chui et al., PAKDD'07; Chui and Kao, PAKDD'08], we use dynamic programming (DP) to compute the probability that a source supports a sequence, and show that this suffices to compute the expected support of a sequential pattern. Next, we embed this DP algorithm into candidate generate-and-test approaches and explore the pattern lattice in both a breadth-first (similar to GSP) and a depth-first (similar to SPAM) manner. We propose optimizations for efficiently computing the frequent 1-sequences, for reusing previously computed results through incremental support computation, and for eliminating candidate sequences via probabilistic pruning without computing their support. Preliminary experiments show that our optimizations are effective in reducing the CPU cost.
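The claim that a per-source probability suffices for computing expected support can be illustrated with a minimal sketch. This is not the authors' code.

The sketch assumes something the abstract does not state: events are assigned to sources independently. Under that assumption, each source is described by a time-ordered list of events, and each event belongs to the source with its own probability. The function names support_probability and expected_support are hypothetical.

Let A[i][j] be the probability that the first j pattern elements occur, in order, within the first i events. Then:
- If event i (present with probability p_i) can match element j, A[i][j] = A[i-1][j] + p_i * (A[i-1][j-1] - A[i-1][j]).
- Otherwise, A[i][j] = A[i-1][j].

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumed input format (hypothetical): one "p-sequence" per source, i.e. the
# time-ordered events that may belong to it, each given as
# (itemset, probability that the event belongs to this source).
PSequence = List[Tuple[FrozenSet[str], float]]


def support_probability(pseq: PSequence, pattern: Sequence[FrozenSet[str]]) -> float:
    """Pr[the source's random sequence contains `pattern` as a subsequence].

    A[j] = Pr[pattern[:j] is contained in the events processed so far];
    the DP table is filled one event (row) at a time.
    """
    m = len(pattern)
    A = [1.0] + [0.0] * m  # the empty prefix is always contained
    for items, p in pseq:
        # Right-to-left so that A[j - 1] still holds the previous row's value.
        for j in range(m, 0, -1):
            if pattern[j - 1] <= items:  # this event can match element j
                # Either pattern[:j] was already contained, or only
                # pattern[:j-1] was and this event is present.
                A[j] += p * (A[j - 1] - A[j])
    return A[m]


def expected_support(db: List[PSequence], pattern: Sequence[FrozenSet[str]]) -> float:
    # Support counts the sources containing the pattern, so by linearity of
    # expectation the expected support is the sum of per-source probabilities.
    return sum(support_probability(pseq, pattern) for pseq in db)


db = [
    [(frozenset("a"), 0.9), (frozenset("b"), 0.5)],                         # source 1
    [(frozenset("b"), 1.0), (frozenset("a"), 0.6), (frozenset("b"), 0.3)],  # source 2
]
pattern = [frozenset("a"), frozenset("b")]
print(round(expected_support(db, pattern), 2))  # 0.63 (= 0.45 + 0.18)
```

No enumeration of possible worlds is needed. Each source costs O(n * m) time for n events and a pattern of length m.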
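The depth-first (SPAM-like) exploration and incremental support computation described in the abstract can be illustrated with a sketch. It rests on several simplifying assumptions, none of them taken from the paper:
- each event carries a single item;
- only sequence extensions are generated (no itemset extensions);
- events are assigned to sources independently;
- the names extend_column and mine_depth_first are hypothetical.

Reuse works as follows. For each source, we keep the last DP column for the current pattern, where col[i] is the probability that the pattern occurs within the first i events. Extending the pattern by one element needs only that column, so the parent's work is never redone. Pruning here is plain anti-monotone (Apriori-style) pruning, not the probabilistic pruning the authors propose.

```python
from typing import Dict, List, Tuple

# Assumed input format (hypothetical): one p-sequence per source, each a
# time-ordered list of (item, probability the event belongs to this source).
PSequence = List[Tuple[str, float]]


def extend_column(pseq: PSequence, col: List[float], item: str) -> List[float]:
    """Given col[i] = Pr[s is contained in the first i events], return the
    same column for s extended by `item`, reusing col instead of redoing s."""
    new = [0.0] * len(col)
    for i, (event_item, p) in enumerate(pseq, start=1):
        new[i] = new[i - 1]
        if event_item == item:
            new[i] += p * (col[i - 1] - new[i - 1])
    return new


def mine_depth_first(db: List[PSequence], min_esup: float) -> Dict[Tuple[str, ...], float]:
    """Return every sequence (of single-item elements) whose expected support
    is at least min_esup, mapped to that expected support."""
    if min_esup <= 0:
        raise ValueError("min_esup must be positive")
    items = sorted({it for pseq in db for it, _ in pseq})
    results: Dict[Tuple[str, ...], float] = {}

    def visit(pattern: Tuple[str, ...], cols: List[List[float]]) -> None:
        for item in items:
            new_cols = [extend_column(ps, c, item) for ps, c in zip(db, cols)]
            esup = sum(c[-1] for c in new_cols)  # sum over sources
            # Containing a longer pattern implies containing its prefix, so
            # expected support never grows: an infrequent node's subtree is pruned.
            if esup >= min_esup:
                ext = pattern + (item,)
                results[ext] = esup
                visit(ext, new_cols)

    # The empty pattern is contained in every prefix with probability 1.
    visit((), [[1.0] * (len(ps) + 1) for ps in db])
    return results


db = [
    [("a", 0.9), ("b", 0.5)],
    [("b", 1.0), ("a", 0.6), ("b", 0.3)],
]
print(sorted(mine_depth_first(db, 0.5)))  # [('a',), ('a', 'b'), ('b',), ('b', 'a')]
```

Here ('a', 'b') has expected support 0.45 + 0.18 = 0.63. The sequence ('b', 'b') reaches only 0.3, so it and all of its extensions are pruned.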
