首页> 外文会议>International conference on very large data bases >A General Framework for Estimating Graphlet Statistics via Random Walk
【24h】

A General Framework for Estimating Graphlet Statistics via Random Walk

机译:通过随机游走估计小图统计量的通用框架

获取原文

摘要

Graphlets are induced subgraph patterns and have been frequently applied to characterize the local topology structures of graphs across various domains, e.g., online social networks (OSNs) and biological networks. Discovering and computing graphlet statistics arc highly challenging. First, the massive size of real-world graphs makes the exact computation of graphlets extremely expensive. Secondly, the graph topology may not be readily available so one has to resort to web crawling using the available application programming interfaces (APIs). In this work, we propose a general and novel framework to estimate graphlet statistics of "any size". Our framework is based on collecting samples through consecutive steps of random walks. We derive an analytical bound on the sample size (via the Chernoff-Hoeffding technique) to guarantee the convergence of our unbiased estimator. To further improve the accuracy, we introduce two novel optimization techniques to reduce the lower bound on the sample size. Experimental evaluations demonstrate that our methods outperform the state-of-the-art method up to an order of magnitude both in terms of accuracy and time cost.
机译:小图是诱导的子图模式,并经常用于表征跨各种域(例如,在线社交网络(OSN)和生物网络)的图的局部拓扑结构。发现和计算小图统计数据极具挑战性。首先,现实世界图的庞大规模使得对图集的精确计算极其昂贵。其次,图拓扑可能不容易获得,因此人们不得不诉诸使用可用的应用程序编程接口(API)进行Web爬网。在这项工作中,我们提出了一个通用且新颖的框架来估计“任何大小”的graphlet统计量。我们的框架基于通过随机游走的连续步骤收集样本。我们得出了样本量的分析界限(通过Chernoff-Hoeffding技术),以确保我们的无偏估计量的收敛。为了进一步提高准确性,我们引入了两种新颖的优化技术来减小样本大小的下限。实验评估表明,我们的方法在准确性和时间成本方面都比最先进的方法高出一个数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号