The VLDB Journal

Temporal locality-aware sampling for accurate triangle counting in real graph streams

Abstract

If we cannot store all edges in a dynamic graph, which edges should we store to estimate the triangle count accurately? Counting triangles (i.e., cliques of size three) is a fundamental graph problem with many applications in social network analysis, web mining, anomaly detection, etc. Recently, much effort has been made to accurately estimate the counts of global triangles (i.e., all triangles) and local triangles (i.e., all triangles incident to each node) in large dynamic graphs, especially with limited space. While existing algorithms use sampling techniques without considering temporal dependencies in edges, we observe temporal locality in the formation of triangles in real dynamic graphs. That is, future edges are more likely to form triangles with recent edges than with older edges. In this work, we propose a family of single-pass streaming algorithms called Waiting-Room Sampling (WRS) for estimating the counts of global and local triangles in a fully dynamic graph, where edges are inserted and deleted over time, within a fixed memory budget. WRS exploits the temporal locality by always storing the most recent edges, which future edges are more likely to form triangles with, in the waiting room, while it uses reservoir sampling and its variant for the remaining edges. Our theoretical and empirical analyses show that WRS is: (a) Fast and 'any time': it runs in linear time, always maintaining and updating estimates while the input graph evolves; (b) Effective: it yields up to 47% smaller estimation error than its best competitors; and (c) Theoretically sound: it gives unbiased estimates with small variances under the temporal locality.
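
The abstract describes WRS only at a high level. As a rough illustration, the following Python code is a minimal sketch of the insertion-only idea, not the authors' implementation. The class `WaitingRoomSampler` and its `wait_frac` parameter are hypothetical names. The sketch splits the memory budget into a FIFO waiting room that always holds the newest edges and a reservoir (plain reservoir sampling) for edges that leave it. When an edge arrives, each triangle it closes with stored edges is weighted by the inverse of the probability that the triangle's other two edges are still stored, which keeps the estimate unbiased. Those probabilities are the standard reservoir-sampling inclusion probabilities, my reconstruction rather than formulas quoted from the paper. Edge deletions, which the abstract says WRS handles with a variant of reservoir sampling, are not supported here.

```python
import random
from collections import defaultdict, deque
from itertools import combinations


class WaitingRoomSampler:
    """Hypothetical sketch of waiting-room sampling for triangle counting.

    Assumptions: insertion-only stream, undirected edges, no self-loops and
    no duplicate edges. Deletions are not supported in this sketch.
    """

    def __init__(self, budget, wait_frac=0.1, seed=None):
        self.k_w = max(1, int(budget * wait_frac))  # waiting-room capacity
        self.k_r = budget - self.k_w                # reservoir capacity
        if self.k_r < 2:
            raise ValueError("budget too small for this sketch")
        self.waiting = deque()       # FIFO holding the most recent edges
        self.in_waiting = set()      # fast membership test for the waiting room
        self.reservoir = []          # uniform sample of edges that left the room
        self.n_out = 0               # number of edges that have left the room
        self.adj = defaultdict(set)  # adjacency over all currently stored edges
        self.rng = random.Random(seed)
        self.global_count = 0.0
        self.local_count = defaultdict(float)

    def _link(self, e, add):
        u, v = tuple(e)
        if add:
            self.adj[u].add(v)
            self.adj[v].add(u)
        else:
            self.adj[u].discard(v)
            self.adj[v].discard(u)

    def _both_stored_prob(self, e1, e2):
        """Probability that two previously seen edges are both still stored."""
        outside = sum(e not in self.in_waiting for e in (e1, e2))
        n, k = self.n_out, self.k_r
        if outside == 0 or n <= k:
            return 1.0                        # nothing has been discarded yet
        if outside == 1:
            return k / n                      # one edge must survive reservoir sampling
        return (k / n) * ((k - 1) / (n - 1))  # both edges must survive

    def process(self, u, v):
        e = frozenset((u, v))
        # 1) Count triangles the new edge closes with stored edges, weighted
        #    by the inverse sampling probability so the estimate stays unbiased.
        for w in self.adj[u] & self.adj[v]:
            weight = 1.0 / self._both_stored_prob(frozenset((u, w)), frozenset((v, w)))
            self.global_count += weight
            for x in (u, v, w):
                self.local_count[x] += weight
        # 2) The new edge always enters the waiting room.
        self.waiting.append(e)
        self.in_waiting.add(e)
        self._link(e, add=True)
        # 3) On overflow, the oldest waiting edge goes through reservoir sampling.
        if len(self.waiting) > self.k_w:
            old = self.waiting.popleft()
            self.in_waiting.discard(old)
            self.n_out += 1
            if len(self.reservoir) < self.k_r:
                self.reservoir.append(old)
            elif self.rng.random() < self.k_r / self.n_out:
                i = self.rng.randrange(self.k_r)
                self._link(self.reservoir[i], add=False)  # evict a random sampled edge
                self.reservoir[i] = old
            else:
                self._link(old, add=False)                # discard the old edge


# Usage: stream the 6 edges of the complete graph K4 (4 triangles).
sampler = WaitingRoomSampler(budget=100, wait_frac=0.1)
for a, b in combinations(range(4), 2):
    sampler.process(a, b)
print(sampler.global_count)  # 4.0: with this budget every edge stays stored
```

With a budget large enough to hold the whole stream, nothing is discarded and the count is exact. With a smaller budget the estimate is random, and triangles that close on edges still in the waiting room contribute with weight 1, which is where the temporal locality described above would reduce variance.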