Journal: ACM Transactions on Reconfigurable Technology and Systems

Processing Grid-format Real-world Graphs on DRAM-based FPGA Accelerators with Application-specific Caching Mechanisms

Abstract

Graph processing is an important research topic in the big-data era. One reasonable way to build a general graph-processing framework on a DRAM-based FPGA board with a deep memory hierarchy is to partition a large graph into many small subgraphs, represent the graph as a two-dimensional grid, and process the subgraphs one after another in a divide-and-conquer fashion. This method (grid-graph processing) stores the graph data in off-chip memory (e.g., on-board or host DRAM), which has large capacity but relatively low bandwidth, and processes each small subgraph in on-chip memory (e.g., FFs, BRAM, and URAM), which has small capacity but superior random-access performance. However, exchanging graph (vertex and edge) data directly between the processing units in the FPGA chip and the slow off-chip DRAM during grid-graph processing limits performance and causes excessive data transfer between the FPGA chip and off-chip memory.

In this article, we show that the performance of grid-graph processing on DRAM-based FPGA hardware accelerators can be improved effectively by leveraging the flexibility and programmability of FPGAs to build application-specific caching mechanisms, which bridge the performance gap between on-chip and off-chip memory and reduce the amount of data transferred by exploiting locality in data accesses. We design two application-specific caching mechanisms (vertex caching and edge caching) to exploit two types of locality that exist in grid-graph processing (vertex locality and subgraph locality, respectively). Experimental results show that with the vertex caching mechanism, our system (named FabGraph) achieves speedups of up to 3.1x for BFS and 2.5x for PageRank over ForeGraph when processing medium graphs stored in the on-board DRAM. With the edge caching mechanism, the extension of FabGraph (named FabGraph+) achieves speedups of up to 9.96x for BFS over FPGP when processing large graphs stored in the host DRAM.
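
The grid-graph idea described in the abstract can be illustrated with a small software sketch. This is not FabGraph's hardware design: the function names (`partition_grid`, `bfs_grid`, `count_interval_loads`), the row-by-row block order, and the LRU replacement policy are all assumptions made for illustration. The sketch partitions an edge list into a p x p grid, runs BFS by streaming one block at a time, and counts how many vertex intervals would have to be fetched from off-chip memory for a given number of cached intervals.

```python
from collections import OrderedDict


def partition_grid(edges, num_vertices, p):
    """Split edges into a p x p grid: block (i, j) holds the edges whose source
    lies in vertex interval i and whose destination lies in interval j."""
    interval = (num_vertices + p - 1) // p  # vertices per interval (ceiling)
    grid = [[[] for _ in range(p)] for _ in range(p)]
    for src, dst in edges:
        grid[src // interval][dst // interval].append((src, dst))
    return grid


def bfs_grid(grid, num_vertices, root):
    """Level-synchronous BFS that visits the grid one block at a time."""
    inf = float("inf")
    level = [inf] * num_vertices
    level[root] = 0
    depth = 0
    changed = True
    while changed:
        changed = False
        for row in grid:
            for block in row:
                for src, dst in block:
                    # Expand only vertices on the current frontier.
                    if level[src] == depth and level[dst] == inf:
                        level[dst] = depth + 1
                        changed = True
        depth += 1
    return level


def count_interval_loads(p, cache_slots):
    """Toy model (assumption): count vertex-interval fetches from off-chip
    memory for one row-by-row pass over the grid, with an LRU cache that
    keeps `cache_slots` intervals on chip. cache_slots=0 means no caching."""
    cache = OrderedDict()
    loads = 0
    for i in range(p):
        for j in range(p):
            for interval in (i, j):  # source interval, then destination interval
                if interval in cache:
                    cache.move_to_end(interval)  # mark as most recently used
                else:
                    loads += 1
                    cache[interval] = True
                    if len(cache) > cache_slots:
                        cache.popitem(last=False)  # evict least recently used
    return loads


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5)]
    grid = partition_grid(edges, num_vertices=6, p=2)
    print(bfs_grid(grid, num_vertices=6, root=0))
    print(count_interval_loads(p=4, cache_slots=0))
    print(count_interval_loads(p=4, cache_slots=4))
```

In this toy model, each block touches a source interval and a destination interval, so consecutive blocks in a row reuse the same source interval. That reuse is one simple form of the vertex locality the abstract refers to; the real system's cache organization may differ.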

Bibliographic details

  • Author affiliations

    Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, 1037 Luoyu Rd, Wuhan 430074, Peoples R China;

  • Original format: PDF
  • Language: English
  • Keywords

    Hardware accelerators; graph analytics; large graph processing;

