...
首页> 外文期刊>SIGKDD explorations >Streaming Graph Partitioning for Large Distributed Graphs
【24h】

Streaming Graph Partitioning for Large Distributed Graphs

机译:大型分布图的流图分区

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Extracting knowledge by performing computations on graphs is becoming increasingly challenging as graphs grow in size. A standard approach distributes the graph over a cluster of nodes, but performing computations on a distributed graph is expensive if large amount of data have to be moved. Without partitioning the graph, communication quickly becomes a limiting factor in scaling the system up. Existing graph partitioning heuristics incur high computation and communication cost on large graphs, sometimes as high as the future computation itself. Observing that the graph has to be loaded into the cluster, we ask if the partitioning can be done at the same time with a lightweight streaming algorithm. We propose natural, simple heuristics and compare their performance to hashing and METIS, a fast, offline heuristic. We show on a large collection of graph datasets that our heuristics are a significant improvement, with the best obtaining an average gain of 76%. The heuristics are scalable in the size of the graphs and the number of partitions. Using our streaming partitioning methods, we are able to speed up PageRank computations on Spark [32], a distributed computation system, by 18% to 39% for large social networks.
机译:随着图形尺寸的增加,通过对图形执行计算来提取知识变得越来越具有挑战性。一种标准方法是将图形分布在节点群集上,但是如果必须移动大量数据,则对分布式图形执行计算是昂贵的。如果不对图形进行分区,通信将迅速成为扩大系统规模的限制因素。现有的图分区启发法在大型图上会导致较高的计算和通信成本,有时甚至会与将来的计算本身一样高。观察到必须将图加载到群集中后,我们询问是否可以使用轻量级流算法同时完成分区。我们提出自然,简单的启发式方法,并将其性能与哈希和METIS(一种快速,离线的启发式方法)进行比较。我们在大量的图形数据集上表明,我们的启发式方法有了显着的改进,最好的方法获得了76%的平均收益。试探法在图的大小和分区数量上是可扩展的。使用我们的流分区方法,对于大型社交网络,我们可以将Spark [32](一种分布式计算系统)上的PageRank计算速度提高18%到39%。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号