Journal: Data Mining and Knowledge Discovery

Scalable pattern mining with Bayesian networks as background knowledge

Abstract

We study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model. Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. We focus on the central step of this process: given a graphical model and a database, we address the problem of finding the most interesting attribute sets. We formalize the concept of interestingness of attribute sets as the divergence between their behavior as observed in the data and the behavior that can be explained given the current model. We derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. We then consider the case of a very large network that renders exact inference infeasible, and a very large database or data stream. We devise an algorithm that efficiently finds the most interesting attribute sets with a prescribed approximation bound and confidence probability, even for very large networks and infinite streams. We study the scalability of the methods in controlled experiments; a case study sheds light on the practical usefulness of the approach.
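The abstract does not spell out the interestingness measure or the search procedure, so the following Python code is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes binary attributes and a tiny hypothetical network (`NETWORK`) whose marginals are computed by brute-force enumeration in place of a real exact-inference engine. It also assumes that interestingness is the largest absolute gap between the empirical and the model probability over all value combinations of an attribute set. The size cap `max_size` is a hypothetical parameter added for illustration. For the large-network and streaming setting, `hoeffding_sample_size` shows one standard way (Hoeffding's inequality plus a union bound) to choose a sample size that keeps every estimated probability within `eps` of its true value with probability at least `1 - delta`. Whether the paper uses exactly this bound is an assumption.

```python
import math
from itertools import combinations, product

# Hypothetical toy network over binary attributes (illustration only).
# Each node maps to (parents, cpt); cpt maps a tuple of parent values
# to P(node = 1 | parents).
NETWORK = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.2, (1,): 0.7}),
    "C": (("B",), {(0,): 0.1, (1,): 0.5}),
}
VARS = list(NETWORK)  # insertion order is also a topological order here


def joint_prob(assignment):
    """P(full assignment) under the network, via the chain rule."""
    p = 1.0
    for var, (parents, cpt) in NETWORK.items():
        p1 = cpt[tuple(assignment[q] for q in parents)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p


def bn_marginal(attrs):
    """Exact marginal of attrs by brute-force enumeration (tiny networks only)."""
    dist = {vals: 0.0 for vals in product((0, 1), repeat=len(attrs))}
    for vals in product((0, 1), repeat=len(VARS)):
        full = dict(zip(VARS, vals))
        dist[tuple(full[x] for x in attrs)] += joint_prob(full)
    return dist


def data_marginal(rows, attrs):
    """Empirical marginal of attrs over a list of dict-shaped records."""
    dist = {vals: 0.0 for vals in product((0, 1), repeat=len(attrs))}
    for row in rows:
        dist[tuple(row[x] for x in attrs)] += 1.0 / len(rows)
    return dist


def interestingness(rows, attrs):
    """Assumed measure: max absolute gap between data and model marginals."""
    p_data, p_model = data_marginal(rows, attrs), bn_marginal(attrs)
    return max(abs(p_data[v] - p_model[v]) for v in p_data)


def exact_mine(rows, eps, max_size=2):
    """All attribute sets of size <= max_size whose interestingness exceeds eps."""
    found = []
    for k in range(1, max_size + 1):
        for attrs in combinations(VARS, k):
            score = interestingness(rows, attrs)
            if score > eps:
                found.append((attrs, score))
    return found


def hoeffding_sample_size(eps, delta, num_estimates):
    """Samples needed so that num_estimates probability estimates are all
    within eps of their true values with probability >= 1 - delta
    (Hoeffding's inequality combined with a union bound)."""
    return math.ceil(math.log(2 * num_estimates / delta) / (2 * eps ** 2))


if __name__ == "__main__":
    # Toy records: A and B uniform, C always equal to A.
    rows = [{"A": a, "B": b, "C": a} for a in (0, 1) for b in (0, 1)]
    for attrs, score in exact_mine(rows, eps=0.15):
        print(attrs, round(score, 3))
    print(hoeffding_sample_size(eps=0.05, delta=0.05, num_estimates=10))
```

In this toy usage, the records make `C` a copy of `A`, a dependency the hypothetical network does not encode (it makes `C` depend only on `B`), and `A` is uniform rather than having probability 0.3 of being 1; both `("A",)` and `("C",)` are therefore reported. Brute-force enumeration grows exponentially with the number of network variables, which is why the very-large-network case calls for sampling-based estimates instead.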