TIPTAP: Approximate Mining of Frequent k-Subgraph Patterns in Evolving Graphs

Nasir Muhammad Anis Uddin; Aslay Cigdem; Morales Gianmarco De Francisci; Riondato Matteo

Abstract

Given a labeled graph, the collection of k-vertex induced connected subgraph patterns that appear in the graph more frequently than a user-specified minimum threshold provides a compact summary of the characteristics of the graph, and finds applications ranging from biology to network science. However, finding these patterns is challenging, even more so for dynamic graphs that evolve over time, due to the streaming nature of the input and the exponential time complexity of the problem. We study this task in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added to or removed from the graph. We present TIPTAP, a suite of algorithms that compute high-quality approximations of the frequent k-vertex subgraphs w.r.t. a given threshold, at any time (i.e., at any point of the stream), with high probability. In contrast to existing state-of-the-art solutions, which require iterating over the entire set of subgraphs in the vicinity of the updated edge, TIPTAP operates by efficiently maintaining a uniform sample of connected k-vertex subgraphs, thanks to an optimized neighborhood-exploration procedure. We provide a theoretical analysis of the proposed algorithms in terms of their unbiasedness and of the sample size needed to obtain a desired approximation quality. Our analysis relies on sample-complexity bounds that use the Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory, which allows us to derive a sufficient sample size that is independent of the size of the graph. The results of our empirical evaluation demonstrate that TIPTAP returns high-quality results more efficiently and accurately than existing baselines.
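The abstract says that TIPTAP keeps a uniform sample of connected k-vertex subgraphs up to date as edges are added and removed, and the keywords name reservoir sampling and random pairing. As a rough illustration of how a fixed-size uniform sample can survive deletions, here is a minimal Python sketch of generic random pairing (Gemulla, Lehner, and Haas) over arbitrary hashable items. It is not TIPTAP's implementation: the class `RandomPairingSample` and its methods are hypothetical, items are assumed to be distinct, and deletions are assumed to target only live items. In TIPTAP a single edge update can create or destroy many k-vertex subgraphs at once, and the neighborhood exploration that enumerates them is not shown here.

```python
import random
from typing import Hashable, List, Set


class RandomPairingSample:
    """Hypothetical fixed-capacity uniform sample under inserts and deletes.

    Generic random pairing: each deletion is recorded in a counter and later
    "paired" with an insertion that compensates for it, so the sample stays
    uniform without rescanning the population.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.items: List[Hashable] = []      # sampled items (list allows O(1) random pick)
        self.members: Set[Hashable] = set()  # same items, for O(1) membership tests
        self.population = 0  # number of currently live items
        self.c_in = 0        # uncompensated deletions of sampled items
        self.c_out = 0       # uncompensated deletions of unsampled items

    def insert(self, item: Hashable) -> None:
        self.population += 1
        pending = self.c_in + self.c_out
        if pending == 0:
            # No deletions to compensate: plain reservoir sampling.
            if len(self.items) < self.capacity:
                self.items.append(item)
                self.members.add(item)
            elif self.rng.random() < self.capacity / self.population:
                # Replace a uniformly random sampled item.
                idx = self.rng.randrange(len(self.items))
                self.members.discard(self.items[idx])
                self.items[idx] = item
                self.members.add(item)
        elif self.rng.random() < self.c_in / pending:
            # Pair with a deletion that emptied a sample slot: refill it.
            self.items.append(item)
            self.members.add(item)
            self.c_in -= 1
        else:
            # Pair with a deletion that did not touch the sample.
            self.c_out -= 1

    def delete(self, item: Hashable) -> None:
        self.population -= 1
        if item in self.members:
            self.members.discard(item)
            self.items.remove(item)  # O(capacity); acceptable for a sketch
            self.c_in += 1
        else:
            self.c_out += 1


# Usage: fill, delete half of the population, then refill.
sample = RandomPairingSample(capacity=50, seed=42)
for x in range(1000):
    sample.insert(x)
for x in range(500):
    sample.delete(x)
for x in range(1000, 1500):
    sample.insert(x)
print(len(sample.items), sample.c_in, sample.c_out)  # 50 0 0
```

Keeping two counters instead of rescanning the population is what makes deletions cheap: each later insertion either refills a slot freed by a sampled deletion or absorbs an unsampled one, and once both counters return to zero the procedure falls back to plain reservoir sampling.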

Bibliographic details

Source
ACM Transactions on Knowledge Discovery from Data | 2021, No. 3 | pp. 48:1-48:35 | 35 pages
Affiliations

King Digital Entertainment Ltd, Sveavägen 44, S-11134 Stockholm, Sweden

Aarhus University, Department of Computer Science, Åbogade 34, DK-8200 Aarhus, Denmark

ISI Foundation, Via Chisola 5, I-10126 Turin, Italy

Amherst College, 25 East Drive, Amherst, MA 01002, USA

Record information
Original format: PDF
Language: English
Keywords
Edge streams; graph streams; reservoir sampling; random pairing; VC-dimension;

机译：边缘流;图形流;储液器采样;随机配对;VC维度;

Similar documents


1. Complete Mining of Frequent Patterns from Graphs: Mining Graph Data [J]. Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda. Machine Learning, 2003, No. 3.
2. Mining Frequent Approximate Patterns in Large Networks [J]. Driss Kaouthar, Boulila Wadii, Leborgne Aurelie. International Journal of Imaging Systems and Technology, 2021, No. 3.
3. Mining Approximate Patterns with Frequent Locally Optimal Occurrences [J]. Nakamura Atsuyoshi, Takigawa Ichigaku, Tosaka Hisashi. Discrete Applied Mathematics, 2016.
4. A Change Detector for Mining Frequent Patterns over Evolving Data Streams [C]. Ng, Willie; Dash. 2008 IEEE International Conference on Systems, Man and Cybernetics (SMC), 2008.
5. Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs [D]. Meng, Jinghan. 2017.
6. MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance [O]. Jingbo Shang, Jian Peng, Jiawei Han.
7. gApprox: Mining Frequent Approximate Patterns from a Massive Network [O]. Chen Chen, Xifeng Yan, Feida Zhu. 2008.