The Long and the Short of It: Summarising Event Sequences with Serial Episodes

Journal: SIGKDD Explorations

Abstract

An ideal outcome of pattern mining is a small set of informative patterns, free of redundancy and noise, that identifies the key structure of the data at hand. Standard frequent pattern miners do not achieve this goal: because of the pattern explosion, they typically return very large numbers of highly redundant patterns. We pursue the ideal for sequential data by employing a pattern set mining approach, in which results are considered as a whole instead of ranking patterns individually. Pattern set mining has been successfully applied to transactional data, but has been surprisingly understudied for sequential data. In this paper, we employ the MDL (Minimum Description Length) principle to identify the set of sequential patterns that summarises the data best. In particular, we formalise how to encode sequential data using sets of serial episodes, and use the encoded length as a quality score. As search strategies, we propose two approaches: the first algorithm selects a good pattern set from a large candidate set, while the second is a parameter-free any-time algorithm that mines pattern sets directly from the data. Experiments on synthetic and real data demonstrate that we efficiently discover small sets of informative patterns.
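
To make the idea of using encoded length as a quality score concrete, the following is a minimal toy sketch in Python. It is not the paper's encoding: it assumes patterns match only contiguously (gaps inside an episode occurrence are not handled), uses a simple left-to-right greedy cover, and charges a deliberately crude model cost. The function names `cover`, `encoded_length`, and `select_patterns` are hypothetical and chosen for illustration only. The selection loop loosely mirrors the first search strategy mentioned in the abstract (picking a pattern set from a given candidate set) by keeping a candidate only if it shortens the total description.

```python
import math
from collections import Counter


def cover(sequence, patterns):
    """Cover the sequence left to right, preferring longer patterns.

    Simplifying assumption: a pattern matches only as a contiguous run
    of events; anything not matched is emitted as a singleton code.
    """
    ordered = sorted(patterns, key=len, reverse=True)
    codes = []
    i = 0
    while i < len(sequence):
        for p in ordered:
            if tuple(sequence[i:i + len(p)]) == p:
                codes.append(p)
                i += len(p)
                break
        else:
            # No pattern starts here: fall back to a single event.
            codes.append((sequence[i],))
            i += 1
    return codes


def encoded_length(sequence, patterns):
    """Toy two-part MDL score in bits: L(model) + L(data | model)."""
    alphabet = set(sequence)
    usage = Counter(cover(sequence, patterns))
    total = sum(usage.values())
    # Data cost: Shannon-optimal code lengths from code usage.
    data_bits = -sum(n * math.log2(n / total) for n in usage.values())
    # Model cost (assumed): spell out each pattern event by event, plus a
    # stop symbol, with a uniform code over the alphabet and the stop symbol.
    symbol_bits = math.log2(len(alphabet) + 1)
    model_bits = sum((len(p) + 1) * symbol_bits for p in patterns)
    return model_bits + data_bits


def select_patterns(sequence, candidates):
    """Keep a candidate only if adding it shortens the total encoding."""
    chosen = []
    best = encoded_length(sequence, chosen)
    for cand in candidates:
        score = encoded_length(sequence, chosen + [cand])
        if score < best:
            chosen.append(cand)
            best = score
    return chosen


sequence = list("abcd" * 10)
candidates = [("a", "b", "c"), ("d", "a")]
print(select_patterns(sequence, candidates))  # prints [('a', 'b', 'c')]
```

In this toy run the pattern `('a', 'b', 'c')` is kept because it shortens the description considerably, while `('d', 'a')` is rejected: once the first pattern is in the set, the second only competes with it for the same events and adds model cost, which is the kind of redundancy a set-level score is meant to penalise.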
