Journal of Systems Architecture

A comprehensive reconfigurable computing approach to memory wall problem of large graph computation


Abstract

Graph computation problems that exhibit irregular memory access patterns are known to perform poorly on multiprocessor architectures. Although recent studies use FPGA technology to tackle the memory wall problem of graph computation by adopting a massively multi-threaded architecture, their performance still falls far short of the optimal memory performance because of long memory access latency. In this paper, we propose a comprehensive reconfigurable computing approach to address the memory wall problem. First, we present an extended edge-streaming model with a large number of partitions that provides better load balance while exploiting the streaming bandwidth of external memory when processing large graphs. Second, we propose a two-level shuffle network architecture that significantly reduces the on-chip memory requirement while providing high processing throughput that matches the bandwidth of the external memory. Third, we introduce a compact storage design based on graph compression schemes, together with the corresponding encoding and decoding hardware, to reduce the volume of data transferred between the processing engines and external memory. We validate the effectiveness of the proposed architecture by implementing three frequently used graph algorithms on an ML605 board, showing up to a 3.85x improvement in performance-to-bandwidth ratio over previously published FPGA-based implementations. (C) 2016 Elsevier B.V. All rights reserved.
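The edge-streaming idea can be illustrated with a small software model. The Python sketch below is an illustration under stated assumptions, not the authors' hardware design: it follows the edge-centric scatter/shuffle/gather pattern popularized by X-Stream, partitions vertices by contiguous ID ranges, and uses hypothetical helpers (`partition_of`, `two_level_shuffle`, `bfs_edge_streaming`) to compute BFS levels. The shuffle routes each update first to a group of partitions and then to its own partition, loosely mirroring how a two-level network can replace one wide fan-out with two narrower ones.

```python
def partition_of(v, num_vertices, num_partitions):
    """Return the partition owning vertex v (contiguous vertex-ID ranges)."""
    size = -(-num_vertices // num_partitions)  # ceiling division
    return v // size


def two_level_shuffle(updates, num_vertices, num_partitions, fanout):
    """Route (dst, value) updates to per-partition buckets in two stages.

    Hypothetical model: stage 1 fans out to `fanout` coarse groups of
    partitions, stage 2 fans each group out to `group_size` partitions,
    instead of one stage fanning out to all `num_partitions` buffers.
    """
    group_size = -(-num_partitions // fanout)
    groups = [[] for _ in range(fanout)]
    for dst, val in updates:  # stage 1: coarse routing by partition group
        p = partition_of(dst, num_vertices, num_partitions)
        groups[p // group_size].append((p, dst, val))
    buckets = []
    for g, group in enumerate(groups):  # stage 2: fine routing inside a group
        local = [[] for _ in range(group_size)]
        for p, dst, val in group:
            local[p - g * group_size].append((dst, val))
        buckets.extend(local)
    return buckets[:num_partitions]  # drop padding buckets past the last partition


def bfs_edge_streaming(num_vertices, edges, root, num_partitions=4, fanout=2):
    """BFS levels via repeated scatter/shuffle/gather over streamed edges."""
    level = [float("inf")] * num_vertices
    level[root] = 0
    # Group edges by source partition so each partition is read sequentially.
    edge_parts = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        edge_parts[partition_of(src, num_vertices, num_partitions)].append((src, dst))

    frontier_level = 0
    changed = True
    while changed:
        changed = False
        # Scatter: stream every edge; frontier sources emit an update.
        updates = []
        for part in edge_parts:
            for src, dst in part:
                if level[src] == frontier_level:
                    updates.append((dst, frontier_level + 1))
        # Shuffle: send updates to the partition that owns the destination.
        buckets = two_level_shuffle(updates, num_vertices, num_partitions, fanout)
        # Gather: each partition applies updates to its own vertex range.
        for bucket in buckets:
            for dst, new_level in bucket:
                if new_level < level[dst]:
                    level[dst] = new_level
                    changed = True
        frontier_level += 1
    return level
```

For example, `bfs_edge_streaming(7, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 6)], 0)` returns `[0, 1, 1, 2, 3, inf, inf]`. Every iteration streams all edges sequentially, trading redundant edge reads for the absence of random accesses to adjacency lists, which is the general motivation for edge streaming when sequential bandwidth is high but random-access latency is long.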
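The abstract does not say which graph compression schemes the compact storage design uses. As an assumption for illustration only, one common scheme sorts each adjacency list, stores the gaps between consecutive neighbor IDs, and writes each gap as a variable-length integer (7 payload bits per byte, LEB128-style). The hypothetical `encode_neighbors` and `decode_neighbors` functions below show the encode and decode steps in software.

```python
def encode_varint(n, out):
    """Append non-negative int n to bytearray out, 7 bits per byte.

    The high bit of each byte is set when more bytes follow.
    """
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def encode_neighbors(neighbors):
    """Sort neighbor IDs, then varint-encode the gaps between them."""
    out = bytearray()
    prev = 0
    for v in sorted(neighbors):
        encode_varint(v - prev, out)  # first gap is the ID itself
        prev = v
    return bytes(out)


def decode_neighbors(data):
    """Invert encode_neighbors: parse varints and prefix-sum the gaps."""
    neighbors = []
    prev = value = shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit: more bytes of this gap follow
        else:
            prev += value
            neighbors.append(prev)
            value = shift = 0
    return neighbors
```

For instance, `encode_neighbors([1000, 1003, 1010, 70000])` produces 7 bytes (the gaps 1000, 3, 7 and 68990 take 2, 1, 1 and 3 bytes), versus 16 bytes for four 32-bit IDs. Decoding needs only masks, shifts and a running sum, the kind of logic that maps readily to a hardware decoder.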
