Conference: IEEE High Performance Extreme Computing Conference

TriC: Distributed-memory Triangle Counting by Exploiting the Graph Structure

Abstract

Graph analytics has emerged as an important tool for analyzing large-scale data from diverse application domains such as social networks, cyber security, and bioinformatics. Counting the number of triangles in a graph is a fundamental kernel with several applications, such as detecting the community structure of a graph or identifying important vertices in it. The ubiquity of massive datasets is driving the need to scale graph analytics on parallel systems. However, parallelizing graph algorithms efficiently poses numerous challenges, especially on distributed-memory systems. Leading challenges include irregular memory accesses and communication patterns, low computation-to-communication ratios, and the need for frequent synchronization. In this paper, we present TriC, our distributed-memory implementation of triangle counting in graphs using the Message Passing Interface (MPI), as a submission to the 2020 Graph Challenge competition. Using a set of synthetic and real-world inputs from the challenge, we demonstrate a speedup of up to 90× relative to previous work on 32 processor cores of a NERSC Cori node. We also provide details from distributed runs with up to 8192 processes, along with strong scaling results. The observations presented in this work explain the system-level bottlenecks at scale that specifically affect sparse, irregular workloads, and should therefore benefit other efforts to parallelize graph algorithms.
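The following minimal sequential sketch in Python illustrates the triangle-counting kernel itself. It assumes an undirected simple graph given as an edge list with integer vertex ids. It uses the common degree-ordering and adjacency-intersection technique. It is not TriC's distributed MPI implementation, and the function name `count_triangles` is hypothetical. In a distributed-memory setting, the set `out[v]` used in the intersection may be owned by another process. That remote access is one plausible source of the irregular communication mentioned above.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs with integer ids."""
    # Build undirected adjacency sets, dropping self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, id) and orient each edge from lower to higher
    # rank, so every triangle is discovered exactly once.
    def rank(x):
        return (len(adj[x]), x)

    out = {x: {y for y in nbrs if rank(y) > rank(x)} for x, nbrs in adj.items()}

    # For each oriented edge (u, v), every common out-neighbor w closes a triangle.
    total = 0
    for u, nbrs in out.items():
        for v in nbrs:
            total += len(nbrs & out[v])
    return total


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))  # the complete graph on 4 vertices has 4 triangles
```

Orienting edges by degree keeps the out-neighbor sets of high-degree vertices small. This bounds the cost of each intersection on skewed, real-world degree distributions.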
