Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing

Abstract

Large-scale graph processing requires high-bandwidth data access. However, as graph computing continues to scale, achieving high bandwidth on generic computing architectures becomes increasingly challenging. The primary reasons include: the random access pattern causing local bandwidth degradation, poor locality leading to unpredictable global data access, heavy conflicts on updating the same vertex, and unbalanced workloads across processing units. Processing-in-memory (PIM) has been explored as a promising solution for providing high bandwidth, yet graph processing on PIM devices leaves open questions: 1) how to design hardware specializations and the interconnection scheme to fully utilize the bandwidth of PIM devices and ensure locality and 2) how to allocate data and schedule the processing flow to avoid conflicts and balance workloads. In this paper, we propose GraphH, a PIM architecture for graph processing on a hybrid memory cube array, to tackle all four problems mentioned above. From the architecture perspective, we integrate SRAM-based on-chip vertex buffers to eliminate local bandwidth degradation. We also introduce a reconfigurable double-mesh connection to provide high global bandwidth. From the algorithm perspective, partitioning and scheduling methods such as index-mapping interval-block and round interval pair are introduced to GraphH, so that workloads are balanced and conflicts are avoided. Two optimization methods are further introduced to reduce synchronization overhead and reuse on-chip data. Experimental results on graphs with billions of edges demonstrate that GraphH outperforms DDR-based graph processing systems by up to two orders of magnitude and achieves a 5.12x speedup over the previous PIM design.
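The partitioning and scheduling ideas named in the abstract (interval-block partitioning and round interval pairs) can be illustrated with a small sketch. The function names, the equal-size interval split, and the round-robin schedule below are assumptions for illustration only, not GraphH's actual implementation as described in the paper.

```python
# Hypothetical sketch: split vertices into p intervals, bucket edges into
# p x p blocks keyed by (source interval, destination interval), then
# schedule p rounds so that no two units write the same destination
# interval in the same round (avoiding update conflicts).

def partition(num_vertices, edges, p):
    """Bucket directed edges into p x p interval blocks."""
    size = (num_vertices + p - 1) // p  # vertices per interval
    blocks = {(i, j): [] for i in range(p) for j in range(p)}
    for src, dst in edges:
        blocks[(src // size, dst // size)].append((src, dst))
    return blocks

def round_interval_pairs(p):
    """In round r, unit u processes block (u, (u + r) % p).

    Within each round the destination intervals are a permutation of
    0..p-1, so no two units update the same vertex interval at once.
    """
    for r in range(p):
        yield [(u, (u + r) % p) for u in range(p)]
```

The round-robin schedule is one simple way to realize conflict-free rounds; any schedule whose destination intervals are distinct within a round would serve the same purpose.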

Bibliographic record

  • Source
  • Author affiliations

    Tsinghua Univ Dept Elect Engn Beijing 100084 Peoples R China|Tsinghua Univ Beijing Natl Res Ctr Informat Sci & Technol Beijing 100084 Peoples R China;

    Tsinghua Univ Dept Elect Engn Beijing 100084 Peoples R China|Tsinghua Univ Beijing Natl Res Ctr Informat Sci & Technol Beijing 100084 Peoples R China;

    Univ Calif Los Angeles Dept Comp Sci Los Angeles CA 90095 USA;

    Univ Calif San Diego Jacobs Sch Engn Comp Sci & Engn Dept La Jolla CA 92093 USA;

    Peking Univ Sch EECS Ctr Energy Efficient Comp & Applicat Beijing 100871 Peoples R China;

    Tsinghua Univ Dept Elect Engn Beijing 100084 Peoples R China|Tsinghua Univ Beijing Natl Res Ctr Informat Sci & Technol Beijing 100084 Peoples R China;

    Tsinghua Univ Dept Elect Engn Beijing 100084 Peoples R China|Tsinghua Univ Beijing Natl Res Ctr Informat Sci & Technol Beijing 100084 Peoples R China;

    Univ Calif Santa Barbara Dept Elect & Comp Engn Santa Barbara CA 93106 USA;

    Tsinghua Univ Dept Elect Engn Beijing 100084 Peoples R China|Tsinghua Univ Beijing Natl Res Ctr Informat Sci & Technol Beijing 100084 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English (eng)
  • CLC classification
  • Keywords

    Hybrid memory cube (HMC); large-scale graph processing; memory hierarchy; on-chip networks;


