Journal: ACM Transactions on Knowledge Discovery from Data

TIPTAP: Approximate Mining of Frequent k-Subgraph Patterns in Evolving Graphs

Abstract

Given a labeled graph, the collection of k-vertex induced connected subgraph patterns that appear in the graph more frequently than a user-specified minimum threshold provides a compact summary of the characteristics of the graph, and finds applications ranging from biology to network science. However, finding these patterns is challenging, even more so for dynamic graphs that evolve over time, due to the streaming nature of the input and the exponential time complexity of the problem. We study this task in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added to or removed from the graph. We present TIPTAP, a suite of algorithms to compute high-quality approximations of the frequent k-vertex subgraphs w.r.t. a given threshold, at any time (i.e., point of the stream), with high probability. In contrast to existing state-of-the-art solutions that require iterating over the entire set of subgraphs in the vicinity of the updated edge, TIPTAP operates by efficiently maintaining a uniform sample of connected k-vertex subgraphs, thanks to an optimized neighborhood-exploration procedure. We provide a theoretical analysis of the proposed algorithms in terms of their unbiasedness and of the sample size needed to obtain a desired approximation quality. Our analysis relies on sample-complexity bounds that use the Vapnik-Chervonenkis dimension, a key concept from statistical learning theory, which allows us to derive a sufficient sample size that is independent of the size of the graph. The results of our empirical evaluation demonstrate that TIPTAP returns high-quality results more efficiently and accurately than existing baselines.
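
The idea of estimating pattern frequencies from a fixed-size uniform sample can be illustrated with a minimal sketch. This is not the TIPTAP algorithm: it uses plain reservoir sampling (Algorithm R) over an insert-only stream of items that are assumed to be already enumerated, whereas the abstract describes sampling connected k-vertex subgraphs under edge insertions and deletions. The names `Reservoir`, `estimate_frequent`, and `pattern_of` are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter


class Reservoir:
    """Fixed-capacity uniform sample over an insert-only stream (Algorithm R).

    Illustrative assumption: items arrive one at a time and are never deleted.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []          # current sample
        self.seen = 0            # number of stream items observed so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Reservoir not full yet: keep every item.
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen,
            # evicting a uniformly chosen resident item.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def estimate_frequent(reservoir, pattern_of, threshold):
    """Return {pattern: estimated relative frequency} for patterns whose
    frequency within the sample is at least `threshold`.

    `pattern_of` is a hypothetical function mapping a sampled item to its
    pattern (for subgraphs, this would be a canonical form of the labeled
    subgraph).
    """
    n = len(reservoir.items)
    if n == 0:
        return {}
    counts = Counter(pattern_of(x) for x in reservoir.items)
    return {p: c / n for p, c in counts.items() if c / n >= threshold}
```

When the stream is shorter than the reservoir capacity, the sample holds every item and the estimates are exact; for longer streams the estimates are approximations whose quality depends on the sample size, which is the quantity the abstract's sample-complexity analysis addresses.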