Conference: International Conference on Data Mining

Sequence Mining Automata: A New Technique for Mining Frequent Sequences under Regular Expressions

Abstract

In this paper we study the problem of mining frequent sequences that satisfy a given regular expression. Previous approaches to this problem focused on the search space, pushing the given regular expression (in some way) to prune unpromising candidate patterns. In contrast, we focus entirely on the given input data and regular expression. We introduce Sequence Mining Automata (SMA), a specialized kind of Petri net that, while reading an input sequence, produces all and only the patterns that are contained in that sequence and satisfy the given regular expression. Based on this automaton, we develop a family of algorithms. Our thorough experimentation on different datasets and application domains confirms that in many cases our methods outperform the current state-of-the-art frequent sequence mining algorithms that use regular expressions, in some cases by orders of magnitude.
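
To make the data-driven idea concrete, here is a minimal Python sketch. It is not the paper's SMA construction (which is a Petri net); it is a simplified stand-in that assumes the regular expression has already been compiled by hand into a deterministic automaton, that "contained in" means subsequence containment, and that support is the number of input sequences containing a pattern. The names `accepted_subsequences`, `mine_frequent`, `DELTA`, `START`, and `ACCEPTING` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def accepted_subsequences(seq, delta, start, accepting):
    """Return the distinct subsequences of seq that the automaton accepts."""
    # reached[state] holds every subsequence (as a tuple) read so far
    # that drives the automaton from the start state to that state.
    reached = {start: {()}}
    for item in seq:
        additions = {}
        for state, prefixes in reached.items():
            nxt = delta.get((state, item))
            if nxt is not None:
                extended = {p + (item,) for p in prefixes}
                additions.setdefault(nxt, set()).update(extended)
        # Merge only after the scan, so one item is never used twice.
        for state, new in additions.items():
            reached.setdefault(state, set()).update(new)
    result = set()
    for state in accepting:
        result |= reached.get(state, set())
    return result

def mine_frequent(db, delta, start, accepting, min_support):
    """Count, per pattern, how many input sequences produce it."""
    support = Counter()
    for seq in db:
        support.update(accepted_subsequences(seq, delta, start, accepting))
    return {p: c for p, c in support.items() if c >= min_support}

# Hand-built automaton for the regular expression a b* c (an assumption:
# a real implementation would compile the expression automatically).
DELTA = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
START, ACCEPTING = 0, {2}

DB = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "b", "b", "c"],
]
FREQUENT = mine_frequent(DB, DELTA, START, ACCEPTING, min_support=2)
```

Each input sequence is read once, and only patterns that both occur in the data and match the expression are ever generated, which is the contrast with candidate-generation approaches that the abstract draws. The number of patterns per sequence can still grow exponentially in this naive version.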