
Flexible Constrained Sampling with Guarantees for Pattern Mining


Abstract

Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to (1) flexibility in terms of quality measures and constraints that can be used, and/or (2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: (1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and (2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration.
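To make the sampling task concrete, the following is a minimal sketch of drawing itemsets with probability proportional to a quality measure, here taken to be support (frequency) under a minimum-support constraint. It is not the Flexics algorithm: it enumerates every frequent itemset first, which is exactly the cost that pattern samplers are designed to avoid, and is meant only as a reference baseline on toy data. The function names, the toy transactions, and the choice of support as the quality measure are assumptions made for illustration.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support.

    Exponential in the number of items; suitable for toy data only.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


def sample_patterns(transactions, min_support, k, seed=None):
    """Draw k itemsets (with replacement), each with probability
    proportional to its support, among those meeting the constraint."""
    rng = random.Random(seed)
    patterns = frequent_itemsets(transactions, min_support)
    population = list(patterns)
    weights = [patterns[p] for p in population]
    return rng.choices(population, weights=weights, k=k)


# Hypothetical toy dataset: four transactions over the items a, b, c.
TRANSACTIONS = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]

if __name__ == "__main__":
    for pattern in sample_patterns(TRANSACTIONS, min_support=2, k=5, seed=0):
        print(sorted(pattern))
```

On this toy dataset, five itemsets satisfy the minimum-support constraint, so an exact sampler should return, for example, {a} (support 3) one and a half times as often as {a, b} (support 2). A sampler with accuracy guarantees aims to stay provably close to this target distribution without materializing the full pattern set.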
