【24h】

Estimating Descriptors for Large Graphs

机译:估计大图的描述符

获取原文

摘要

Embedding networks into a fixed dimensional feature space, while preserving its essential structural properties is a fundamental task in graph analytics. These feature vectors (graph descriptors) are used to measure the pairwise similarity between graphs. This enables applying data mining algorithms (e.g classification, clustering, or anomaly detection) on graph-structured data which have numerous applications in multiple domains. State-of-the-art algorithms for computing descriptors require the entire graph to be in memory, entailing a huge memory footprint, and thus do not scale well to increasing sizes of real-world networks. In this work, we propose streaming algorithms to efficiently approximate descriptors by estimating counts of sub-graphs of order k ≤ 4, and thereby devise extensions of two existing graph comparison paradigms: the Graphlet Kernel and NetSimile. Our algorithms require a single scan over the edge stream, have space complexity that is a fraction of the input size, and approximate embeddings via a simple sampling scheme. Our design exploits the trade-off between available memory and estimation accuracy to provide a method that works well for limited memory requirements. We perform extensive experiments on real-world networks and demonstrate that our algorithms scale well to massive graphs.
机译:将网络嵌入到固定尺寸的特征空间中,同时保留其基本的结构属性是图形分析的基本任务。这些特征向量(图形描述符)用于测量图形之间的成对相似性。这使得可以对图结构化数据应用数据挖掘算法(例如分类,聚类或异常检测),而图结构化数据在多个领域中都有大量应用。用于计算描述符的最新算法要求整个图形都在内存中,这会占用巨大的内存空间,因此无法很好地扩展以适应实际网络的规模。在这项工作中,我们提出了流算法,通过估计k≤4的子图计数来有效地近似描述符,从而设计了两个现有图比较范式的扩展:Graphlet Kernel和NetSimile。我们的算法需要对边缘流进行一次扫描,其空间复杂度仅为输入大小的一小部分,并通过简单的采样方案进行近似嵌入。我们的设计利用了可用内存和估计精度之间的权衡,以提供一种适用于有限内存需求的方法。我们在现实世界的网络上进行了广泛的实验,并证明了我们的算法可以很好地扩展到大量图形。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号