【24h】

Adequate Condensed Representations of Patterns

机译:模式的充分压缩表示

获取原文

摘要

Patterns are at the core of the discovery of a lot of knowledge from data but their uses are limited due to their huge number and their mining cost. During the last decade, many works addressed the concept of condensed representation w.r.t. frequency queries. Such representations are several orders of magnitude smaller than the size of the whole collections of patterns, and also enable us to regenerate the frequency information of any pattern. Equivalence classes, based on the Galois closure, are at the core of the pattern condensed representations. However, in real-world applications, interestingness of patterns is evaluated by various many other user-defined measures (e.g., confidence, lift, minimum). To the best of our knowledge, these measures have received very little attention. The Galois closure is appropriate to frequency based measures but unfortunately not to other measures. This paper extends the concept of pattern condensed representations. We propose a framework for condensed representations w.r.t. a large set of new and various queries named condensable functions. These queries encompass not only the frequency (conjunctive, disjunctive or negative) and frequency-based measures, but also address many other interestingness measures (e.g., minimum) and constraints having no suitable property of monotonicity. Condensed representations are achieved thanks to new closure operators automatically derived from each condensable function to get adequate condensed representations. We propose a sound and correct generic algorithm MicMac to efficiently mine the adequate condensed representations. Experiments show the conciseness of the adequate condensed representations, especially in dense and/or correlated data. They also demonstrate the scalability of our algorithm for measures or constraints which are intractible with naive methods. We think that generalizing closure-based condensed representations will offer new tools for higher KDD tasks (e.g., non-redundant rules w.r.t. any measures), similarly there are many uses stemming from the frequency.
机译:模式是从数据发现许多知识的核心,但由于它们的数量庞大及其采矿成本,它们的用途受到限制。在过去十年中,许多作品解决了浓缩表示的概念w.r.t.频率查询。这些表示是比整个图案集合的大小小的数量级,并且还使我们能够再生任何模式的频率信息。基于Galois Closure的等价类位于图案浓缩表示的核心。然而,在真实的应用中,通过各种其他用户定义的措施(例如,信心,提升,最小)评估模式的有趣性。据我们所知,这些措施得到了很少的关注。 Galois Closure适用于基于频率的措施,但遗憾的是不对其他措施。本文扩展了图案浓缩表示的概念。我们提出了一个融合表示W.R.T.的框架。一大集新的和各种查询名为可透明的功能。这些查询不仅包括频率(结合,分离或负)和基于频率的措施,还包括许多其他有趣的措施(例如,最小)和没有合适的单调性的限制。由于新的封闭算子自动衍生自每个可冷凝功能来实现浓缩的表示,以获得足够的浓缩表示。我们提出了一种声音和正确的通用算法MicMac,以有效地挖掘足够的凝聚表示。实验表明了足够的浓缩表示的简明,特别是在密集和/或相关数据中。它们还展示了我们算法的算法或限制性,这些测量或限制性是难以理解的方法。我们认为概括基于闭合的浓缩表示将为更高的KDD任务提供新工具(例如,非冗余规则W.R.T.任何措施),类似地有许多频率源的用途。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号