Venue: IEEE International Symposium on High Performance Computer Architecture

GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition

Abstract

Processing-In-Memory (PIM) is an effective technique that reduces data movement by integrating processing units within memory. Recent advances in “big data” and 3D stacking technology make PIM a practical and viable solution for modern data processing workloads, as exemplified by recent research interest in PIM-based acceleration. Among these efforts, TESSERACT is a PIM-enabled parallel graph processing architecture based on Micron's Hybrid Memory Cube (HMC), one of the most prominent 3D-stacked memory technologies. It implements a Pregel-like vertex-centric programming model, so that users can develop programs in a familiar interface while taking advantage of PIM. Despite its orders-of-magnitude speedup over DRAM-based systems, TESSERACT generates excessive cross-cube communication through SerDes links, whose bandwidth is much lower than the aggregate local bandwidth of the HMCs. Our investigation indicates that this is caused by the restricted data organization required by the vertex programming model. In this paper, we argue that a PIM-based graph processing system should take data organization as a first-order design consideration. Following this principle, we propose GraphP, a novel HMC-based software/hardware co-designed graph processing system that drastically reduces communication and energy consumption compared to TESSERACT. GraphP features three key techniques. 1) “Source-cut” partitioning, which fundamentally changes the cross-cube communication from one remote put per cross-cube edge to one update per replica. 2) “Two-phase Vertex Program”, a programming model designed for “source-cut” partitioning with two operations: GenUpdate and ApplyUpdate. 3) Hierarchical communication and overlapping, which further improve performance by exploiting unique opportunities offered by the proposed partitioning and programming model. We evaluate GraphP using a cycle-accurate simulator with 5 real-world graphs and 4 algorithms. The results show that it provides on average a 1.7x speedup and 89% energy saving compared to TESSERACT.
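
The difference between "one remote put per cross-cube edge" and "one update per replica" can be illustrated with a small counting sketch. It is not the authors' implementation: the function name `count_cross_cube_messages`, the toy graph, and the vertex-to-cube mapping are all hypothetical, and the sketch assumes that a replica of a source vertex is created once in every remote cube that holds at least one of its destination vertices, which the abstract does not spell out.

```python
def count_cross_cube_messages(edges, cube_of):
    """Count per-iteration cross-cube messages under two schemes.

    edges   -- iterable of (src, dst) vertex pairs
    cube_of -- dict mapping each vertex to the cube that owns it

    Returns (per_edge_puts, per_replica_updates):
      per_edge_puts       -- one message for every edge whose endpoints
                             live in different cubes
      per_replica_updates -- one message for every distinct
                             (source vertex, remote cube) pair, i.e. per
                             replica (assumed replica placement)
    """
    per_edge_puts = 0
    replicas = set()
    for src, dst in edges:
        if cube_of[src] != cube_of[dst]:
            per_edge_puts += 1
            # Several edges from src into the same remote cube
            # share a single replica of src in that cube.
            replicas.add((src, cube_of[dst]))
    return per_edge_puts, len(replicas)


# Hypothetical toy graph: vertices 0 and 1 in cube 0, vertices 2 and 3 in cube 1.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1)]
cube_of = {0: 0, 1: 0, 2: 1, 3: 1}

puts, updates = count_cross_cube_messages(edges, cube_of)
print(puts, updates)  # 4 2
```

In this toy graph four edges cross the cube boundary, but only two source vertices need a replica in the remote cube, so the per-replica scheme sends half as many messages. The gap widens as more edges from the same source vertex land in the same remote cube.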
