首页> 外文会议>Simulation Multi-Conference >Fast Parallel Conversion of Edge List to Adjacency List for Large-Scale Graphs
【24h】

Fast Parallel Conversion of Edge List to Adjacency List for Large-Scale Graphs

机译:边缘列表的快速并行转换为大规模图形的邻接列表

获取原文

摘要

In the era of Bigdata, we are deluged with massive graph data emerged from numerous social and scientific applications. In most cases, graph data are generated as lists of edges (edge list), where an edge denotes a link between a pair of entities. However, most of the graph algorithms work efficiently when information of the adjacent nodes (adjacency list) for each node is readily available. Although the conversion from edge list to adjacency list can be trivially done on the fly for small graphs, such conversion becomes challenging for the emerging large-scale graphs consisting billions of nodes and edges. These graphs do not fit into the main memory of a single computing machine and thus require distributed-memory parallel or external-memory algorithms. In this paper, we present efficient MPI-based distributed memory parallel algorithms for converting edge lists to adjacency lists. To the best of our knowledge, this is the first work on this problem. To address the critical load balancing issue, we present a parallel load balancing scheme which improves both time and space efficiency significantly. Our fast parallel algorithm works on massive graphs, achieves very good speedups, and scales to large number of processors. The algorithm can convert an edge list of a graph with 20 billion edges to the adjacency list in less than 2 minutes using 1024 processors. Denoting the number of nodes, edges and processors by n, m, and P, respectively, the time complexity of our algorithm is O(m/P + n + P) which provides a speedup factor of at least Ω(min{P, d_(avg)}), where d_(avg) is the average degree of the nodes. The algorithm has a space complexity of O(m/P), which is optimal.
机译:在Bigdata的时代,我们淹没了来自许多社会和科学应用的大规模图表数据。在大多数情况下,图形数据被生成为边缘(边缘列表)列表,其中边缘表示一对实体之间的链路。然而,大多数图形算法有效地工作,当每个节点的相邻节点(邻接列表)的信息很容易获得。尽管从边缘列表到邻接列表的转换可以在飞行中进行阶段,但是这种转换对于组成数十亿节点和边缘的新出现的大规模图来挑战。这些图形不适合单个计算机的主存储器,因此需要分布式存储器并行或外部存储器算法。在本文中,我们提出了高效的MPI的分布式存储器并行算法,用于将边缘列表转换为邻接列表。据我们所知,这是第一个关于这个问题的工作。为了解决关键负载均衡问题,我们提出了一种并行负载平衡方案,可显着提高时间和空间效率。我们的快速并行算法适用于大规模图形,实现了非常好的加速,并缩放到大量处理器。该算法可以使用1024处理器在不到2分钟的时间内将具有20亿边缘的图形的边缘列表转换为邻接列表。表示N,M和P的节点,边缘和处理器的数量,我们的算法的时间复杂度是O(m / p + n + p),其提供至少ω的加速因子(min {p, d_(avg)}),其中d_(avg)是节点的平均程度。该算法具有O(m / p)的空间复杂度,其是最佳的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号