Processing Grid-format Real-world Graphs on DRAM-based FPGA Accelerators with Application-specific Caching Mechanisms

Shao Zhiyuan; Liu Chenhao; Li Ruoshi; Liao Xiaofei; Jin Hai


Abstract

Graph processing is an important research topic in the big-data era. One reasonable way to build a general graph-processing framework on a DRAM-based FPGA board with a deep memory hierarchy is to partition a given big graph into multiple small subgraphs, represent the graph as a two-dimensional grid, and then process the subgraphs one after another, dividing and conquering the whole problem. This method (grid-graph processing) stores the graph data in off-chip memory devices (e.g., on-board or host DRAM), which have large capacity but relatively low bandwidth, and processes the individual subgraphs in on-chip memory devices (e.g., FFs, BRAM, and URAM), which have small capacity but superior random-access performance. However, exchanging graph (vertex and edge) data directly between the processing units in the FPGA chip and the slow off-chip DRAM during grid-graph processing limits performance and causes excessive data transfer between the FPGA chip and the off-chip memory devices.

In this article, we show that the performance of grid-graph processing on DRAM-based FPGA hardware accelerators can be improved effectively by leveraging the flexibility and programmability of FPGAs to build application-specific caching mechanisms, which bridge the performance gap between on-chip and off-chip memory devices and reduce the amount of data transferred by exploiting locality in data accesses. We design two application-specific caching mechanisms (vertex caching and edge caching) that exploit two types of locality in grid-graph processing (vertex locality and subgraph locality, respectively). Experimental results show that with the vertex caching mechanism, our system (named FabGraph) achieves up to 3.1x and 2.5x speedups for BFS and PageRank, respectively, over ForeGraph when processing medium graphs stored in the on-board DRAM. With the edge caching mechanism, the extension of FabGraph (named FabGraph+) achieves up to a 9.96x speedup for BFS over FPGP when processing large graphs stored in the host DRAM.
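The grid-graph processing scheme described in the abstract can be illustrated with a small software sketch. The code below is not FabGraph's hardware design. It is a minimal Python model, under stated assumptions, of partitioning an edge list into a p x p grid of blocks and running a level-synchronous BFS one block at a time. The names `partition_grid`, `grid_bfs`, and `interval_loads` are hypothetical, and the Python slices `src_buf` and `dst_buf` merely stand in for on-chip buffers holding one vertex interval each.

```python
from collections import defaultdict

INF = float("inf")


def partition_grid(edges, num_vertices, p):
    """Split an edge list into a p x p grid of blocks.

    Block (i, j) holds the edges whose source falls in vertex interval i
    and whose destination falls in vertex interval j.
    """
    interval = (num_vertices + p - 1) // p  # vertices per interval (ceiling)
    grid = defaultdict(list)
    for src, dst in edges:
        grid[(src // interval, dst // interval)].append((src, dst))
    return grid, interval


def grid_bfs(edges, num_vertices, root, p):
    """Level-synchronous BFS that processes one grid block at a time.

    Returns (levels, interval_loads). interval_loads counts how many vertex
    intervals were copied into the simulated on-chip buffers; it is only an
    illustrative proxy for off-chip traffic, not a hardware measurement.
    """
    grid, interval = partition_grid(edges, num_vertices, p)
    levels = [INF] * num_vertices  # stands in for vertex data in off-chip DRAM
    levels[root] = 0
    interval_loads = 0
    current = 0
    changed = True
    while changed:
        changed = False
        for j in range(p):
            lo_d = j * interval
            # The destination interval is loaded once and reused for the
            # whole column of blocks.
            dst_buf = levels[lo_d:lo_d + interval]
            interval_loads += 1
            for i in range(p):
                block = grid.get((i, j))
                if not block:
                    continue  # empty subgraph: nothing to load or process
                lo_s = i * interval
                src_buf = levels[lo_s:lo_s + interval]
                interval_loads += 1
                for src, dst in block:
                    # Only vertices on the current frontier propagate.
                    if src_buf[src - lo_s] == current and dst_buf[dst - lo_d] == INF:
                        dst_buf[dst - lo_d] = current + 1
                        changed = True
            levels[lo_d:lo_d + interval] = dst_buf  # write the interval back
        current += 1
    return levels, interval_loads


if __name__ == "__main__":
    example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    bfs_levels, loads = grid_bfs(example_edges, num_vertices=6, root=0, p=2)
    print(bfs_levels, loads)
```

In this model, keeping the destination interval resident across a column of blocks is one simple way to reuse vertex data. The caching mechanisms in the article are hardware designs whose details are not given in the abstract, so the sketch should be read only as an illustration of the block-by-block access pattern.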

Bibliographic information

Source
ACM Transactions on Reconfigurable Technology and Systems | 2020, Issue 3 | pp. 11.1-11.33 | 33 pages
Authors
Shao Zhiyuan; Liu Chenhao; Li Ruoshi; Liao Xiaofei; Jin Hai
Author affiliation (listed identically for each of the five authors)

Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, 1037 Luoyu Rd, Wuhan 430074, Peoples R China

Original format: PDF
Language: English
Keywords
Hardware accelerators; graph analytics; large graph processing

Similar literature

1. Hardware reuse in modern application-specific processors and accelerators [J]. Alexandre S. Nery, Lech Jozwiak, Menno Lindwer. Microprocessors and Microsystems. 2013, Issue 6a7
2. Energy optimization of Application-Specific Instruction-Set Processors by using hardware accelerators in semicustom ICs technology [J]. Uwe Meyer-Baese, Guillermo Botella, Soumak Mookherjee. Microprocessors and Microsystems. 2012, Issue 2
3. Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms [J]. HajiRassouliha Amir, Taberner Andrew J., Nash Martyn P. Signal Processing. Image Communication: A Publication of the European Association for Signal Processing. 2018
4. Automatic generation of application-specific accelerators for FPGAs from python loop nests [C]. Sheffield David, Anderson Michael, Keutzer Kurt. 2012 22nd International Conference on Field Programmable Logic and Applications. 2012
5. LAMP: Tools for creating application-specific FPGA coprocessors [D]. VanCourt, Thomas David. 2006
6. 7. RETINAL FUNCTIONS EXPRESSED IN RETINAL IMAGING CONTRAST PROCESSING AND ELECTRORETINOGRAPHY MAY DECRYPT EARLY RISK MECHANISMS AND PATHOPHYSIOLOGY OF SCHIZOPHRENIA AND MOOD DISORDERS AND ACCELERATE TRANSLATION TO THE CLINIC [O]. Michel Maziade. 2018
7. Impact of Cache Architecture and Interface on Performance and Area of FPGA-Based Processor/Parallel-Accelerator Systems [O]. Stephen Brown, Tomasz Czajkowski. 2015
