

A Reconfigurable Compute-in-the-Network FPGA Assistant for High-Level Collective Support with Distributed Matrix Multiply Case Study

Abstract

Collectives are a fundamental part of HPC applications and their optimization has undergone decades of study. In recent years collectives have been accelerated with in-network hardware support, initially in the NIC, but recently also in the switch. This support is limited, however, to a very small set of scalar operations. In this work, we first propose that these collectives be extended to operations on composite data types such as matrices. We then demonstrate how these high-level collectives can be supported in an FPGA-based switch. In this paper, we propose a reconfigurable compute-in-the-network FPGA assistant, FPin, to implement high-level collectives in MPI. To maintain streaming packet processing while retaining reuse-based compute-intensive processing we propose a bulk-streaming message passing interface along with a methodology to tune communication-computation overlap. As a proof of concept, we evaluate the efficiency of the FPGA assistant with the ubiquitous distributed matrix multiply kernel, PGEMM. Experimental results show that PGEMM accelerated with high-level collective support can achieve, on average, 2.4× and 1.8× speedups on an FPGA cluster compared to the state-of-the-art COSMA algorithm on Stampede2 Skylake for float and complex float data types, respectively.
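The idea of a collective that operates on whole matrices rather than scalars can be illustrated with a small simulation. The sketch below is not the FPin design or the authors' PGEMM implementation: it is plain Python with no MPI or FPGA involved, "ranks" are simply list entries, and the names `split_k`, `reduce_matrices`, and `simulated_pgemm` are hypothetical. Under those assumptions, it shows a matrix multiply whose inner dimension is split across ranks, with the partial products combined by a sum-reduction over matrices.

```python
# Illustrative simulation only: ranks are list entries, not MPI processes,
# and reduce_matrices stands in for a hypothetical matrix-valued reduction.

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def matrix_add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def reduce_matrices(partials):
    """Sum-reduce whole matrices, as a scalar sum collective reduces numbers."""
    total = partials[0]
    for m in partials[1:]:
        total = matrix_add(total, m)
    return total


def split_k(a, b, ranks):
    """Give each rank a column slice of A and the matching row slice of B."""
    k = len(b)
    if not 1 <= ranks <= k:
        raise ValueError("ranks must be between 1 and the inner dimension")
    bounds = [k * r // ranks for r in range(ranks + 1)]
    return [([row[lo:hi] for row in a], b[lo:hi])
            for lo, hi in zip(bounds, bounds[1:])]


def simulated_pgemm(a, b, ranks):
    """Each simulated rank multiplies its slices; the reduction sums the results."""
    partials = [matmul(a_blk, b_blk) for a_blk, b_blk in split_k(a, b, ranks)]
    return reduce_matrices(partials)


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(simulated_pgemm(a, b, ranks=3))
```

In this toy version the reduction runs only after every partial product is complete, so it does not model the streaming behavior or the communication-computation overlap that the abstract describes.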


Bibliographic details

Source
International Conference on Field-Programmable Technology | 2020 | pp. 159-164 | 6 pages
Authors
Pouya Haghi; Anqi Guo; Tong Geng; Justin Broaddus; Derek Schafer; Anthony Skjellum; Martin Herbordt
Original format
PDF
Keywords
Message passing; Clustering algorithms; Switches; Hardware; Acceleration; Kernel; Field programmable gate arrays


Similar literature

1. Run-Time-Reconfigurable Multi-Precision Floating-Point Matrix Multiplier Intellectual Property Core on FPGA [J]. Arish S., Sharma R. K. Circuits, systems, and signal processing. 2017, No. 3
2. SLOPES: Hardware–Software Cosynthesis of Low-Power Real-Time Distributed Embedded Systems With Dynamically Reconfigurable FPGAs [J]. Li Shang, Robert P. Dick, Niraj K. Jha. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2007
3. High-Level Synthesis Algorithm for the Design of Reconfigurable Constant Multiplier [J]. Chen J., Chang C.-H. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on. 2009, No. 12
4. High-Level Synthesis for Large Bit-Width Multipliers on FPGAs: A Case Study [C]. Gang Quan, James P. Davis, Siddhaveerasharan Devarkal. IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. 2005
5. Cooperative High-performance Computing with FPGAs - Matrix Multiply Case-study [D]. Munafo, Robert P. 2018
6. A Model-Based Design Floating-Point Accumulator. Case of Study: FPGA Implementation of a Support Vector Machine Kernel Function [O]. Marco Bassoli, Valentina Bianchi, Ilaria De Munari. 2020
7. Distributed control for reconfigurable FPGA systems: a high-level design approach [O]. Trabelsi Chiraz, Meftali Samy, Dekeyser Jean-Luc. 2012
