首页> 外文会议>International conference on very large data bases >From 'Think Like a Vertex' to 'Think Like a Graph'
【24h】

From 'Think Like a Vertex' to 'Think Like a Graph'

机译:从“像顶点一样思考”到“像图一样思考”

获取原文

摘要

To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions, and employ a "think like a vertex" programming model to support iterative graph computation. This vertex-centric model is easy to program and has been proved useful for many graph algorithms. However, this model hides the partitioning information from the users, thus prevents many algorithm-specific optimizations. This often results in longer execution time due to excessive network messages (e.g. in Pregel) or heavy scheduling overhead to ensure data consistency (e.g. in GraphLab). To address this limitation, we propose a new "think like a graph" programming paradigm. Under this graph-centric model, the partition structure is opened up to the users, and can be utilized so that communication within a partition can bypass the heavy message passing or scheduling machinery. We implemented this model in a new system, called Giraph-H-, based on Apache Giraph, an open source implementation of Pregel. We explore the applicability of the graph-centric model to three categories of graph algorithms, and demonstrate its flexibility and superior performance, especially on well-partitioned data. For example, on a web graph with 118 million vertices and 855 million edges, the graph-centric version of connected component detection algorithm runs 63X faster and uses 204X fewer network messages than its vertex-centric counterpart.
机译:为了应对由现代应用程序创建的处理快速增长的图形和网络数据的挑战,出现了许多分布式图形处理系统,例如Pregel和GraphLab。所有这些系统将输入图划分为多个分区,并采用“像顶点一样的思维”编程模型来支持迭代图计算。这种以顶点为中心的模型易于编程,并且已被证明对许多图形算法有用。但是,此模型向用户隐藏了分区信息,因此阻止了许多特定于算法的优化。由于过多的网络消息(例如在Pregel中)或沉重的调度开销(以确保数据一致性)(例如在GraphLab中),这通常会导致执行时间更长。为了解决此限制,我们提出了一种新的“像图一样思考”的编程范例。在这种以图形为中心的模型下,分区结构向用户开放,并且可以被利用,以便分区内的通信可以绕开繁重的消息传递或调度机制。我们在基于Apache Giraph(Pregel的开源实现)的名为Giraph-H-的新系统中实现了该模型。我们探索了以图为中心的模型对三类图算法的适用性,并展示了它的灵活性和优越的性能,尤其是在划分良好的数据上。例如,在具有1.18亿个顶点和8.55亿条边的网络图上,以图为中心的版本的连接组件检测算法比以顶点为中心的版本运行速度快63倍,使用的网络消息少204倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号