Journal: International Journal of Reconfigurable Computing

An NoC Traffic Compiler for Efficient FPGA Implementation of Sparse Graph-Oriented Workloads

Abstract

Parallel graph-oriented applications expressed in the Bulk-Synchronous Parallel (BSP) and Token Dataflow compute models generate highly structured communication workloads from messages propagating along graph edges. We can statically expose this structure to traffic compilers and optimization tools to reshape and reduce traffic for higher performance (or lower area, lower energy, lower cost). Such offline traffic optimization eliminates the need for complex runtime NoC hardware and enables lightweight, scalable NoCs. We perform load balancing, placement, fanout routing, and fine-grained synchronization to optimize our workloads for large networks of up to 2025 parallel elements for the BSP model and 25 parallel elements for the Token Dataflow model. This allows us to demonstrate speedups between 1.2× and 22× (3.5× mean), area reductions (number of Processing Elements) between 3× and 15× (9× mean), and dynamic energy savings between 2× and 3.5× (2.7× mean) over a range of real-world graph applications in the BSP compute model. We deliver speedups of 0.5–13× (geomean 3.6×) for Sparse Direct Matrix Solve (Token Dataflow compute model) applied to a range of sparse matrices when using a high-quality placement algorithm. We expect such traffic optimization tools and techniques to become an essential part of the NoC application-mapping flow.
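
As a rough illustration of how placement and fanout routing can reduce traffic, the Python sketch below counts the messages that cross the network in one BSP superstep for a toy graph. It is not the compiler described in the abstract. The graph, the two placements, and the names `noc_messages` and `pe_load` are hypothetical, and the fanout model (one message per source node and destination PE) is an assumed simplification.

```python
from collections import Counter

def noc_messages(edges, placement, fanout=False):
    """Count messages that cross the NoC in one BSP superstep.

    edges     -- list of (src_node, dst_node) pairs; each edge carries one message
    placement -- dict mapping node -> processing element (PE) id
    fanout    -- if True, assume messages from one source node to several
                 nodes on the same destination PE are merged into one
    """
    crossing = set()
    for src, dst in edges:
        if placement[src] == placement[dst]:
            continue  # both endpoints on the same PE: no network traffic
        # With fanout, the key is (source node, destination PE);
        # without it, every crossing edge is its own message.
        key = (src, placement[dst]) if fanout else (src, dst)
        crossing.add(key)
    return len(crossing)

def pe_load(edges, placement):
    """Number of incoming edges each PE processes per superstep."""
    return Counter(placement[dst] for _, dst in edges)

# Hypothetical 4-node graph mapped onto 2 PEs.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 0)]
striped = {0: 0, 1: 1, 2: 0, 3: 1}    # nodes assigned round-robin
clustered = {0: 0, 1: 0, 2: 1, 3: 1}  # neighbouring nodes kept together

for name, placement in (("striped", striped), ("clustered", clustered)):
    print(name,
          noc_messages(edges, placement),
          noc_messages(edges, placement, fanout=True),
          dict(pe_load(edges, placement)))
```

On this toy graph the clustered placement sends 4 messages across the network instead of 5, and merging messages that share a source node and destination PE lowers the counts to 3 and 4 respectively.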