Towards a Universal FPGA Matrix-Vector Multiplication Architecture

Abstract

We present the design and implementation of a universal, single-bitstream library for accelerating matrix-vector multiplication using FPGAs. Our library handles multiple matrix encodings, ranging from dense to multiple sparse formats. A key novelty in our approach is the introduction of a hardware-optimized sparse matrix representation called Compressed Variable-Length Bit Vector (CVBV), which reduces storage and bandwidth requirements by up to 43% (25% on average) compared to compressed sparse row (CSR) across all the matrices from the University of Florida Sparse Matrix Collection. Our hardware incorporates a runtime-programmable decoder that performs on-the-fly decoding of various formats such as Dense, COO, CSR, DIA, and ELL. The flexibility and scalability of our design are demonstrated across two FPGA platforms: (1) the BEE3 (Virtex-5 LX155T with 16GB of DRAM) and (2) the ML605 (Virtex-6 LX240T with 2GB of DRAM). For dense matrices, our approach scales to large data sets with over 1 billion elements and achieves robust performance independent of the matrix aspect ratio. For sparse matrices, our approach using a compressed representation reduces the overall bandwidth while also achieving comparable efficiency relative to state-of-the-art approaches.
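For readers unfamiliar with the formats named in the abstract, the following is a minimal software sketch of matrix-vector multiplication over two of them, COO and CSR. It is only an illustration of the encodings, not the authors' hardware decoder, and it does not attempt to show CVBV, whose layout the abstract does not describe. The function names (`dense_to_csr`, `csr_matvec`, `coo_matvec`) are hypothetical and chosen for this example.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(a: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Encode a dense row-major matrix as CSR: (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A x with A stored in CSR."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def coo_matvec(n_rows: int, entries: Sequence[Tuple[int, int, float]],
               x: Sequence[float]) -> List[float]:
    """Compute y = A x with A stored in COO as (row, col, value) triples."""
    y = [0.0] * n_rows
    for i, j, v in entries:
        y[i] += v * x[j]
    return y


# Small example matrix (assumed for illustration only).
A = [[1.0, 0.0, 2.0],
     [0.0, 0.0, 3.0],
     [4.0, 5.0, 0.0]]
x = [1.0, 2.0, 3.0]

values, col_idx, row_ptr = dense_to_csr(A)
coo = [(i, j, v) for i, row in enumerate(A) for j, v in enumerate(row) if v != 0]

y_csr = csr_matvec(values, col_idx, row_ptr, x)
y_coo = coo_matvec(len(A), coo, x)
```

CSR stores one row pointer per row instead of one row index per nonzero, which is why it is the usual baseline for storage comparisons such as the one quoted in the abstract.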

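Bibliographic details

Source: 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2012, pp. 9-16 (8 pages)
Conference location: Toronto (CA)
Authors: Kestur Srinidhi; Davis John D.; Chung Eric S.
Original format: PDF
Language: English (eng)
Chinese Library Classification: Electronic digital computers (discontinuous-action electronic computers)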

Similar literature

1. FPGA architecture and implementation of sparse matrix-vector multiplication for the finite element method [J]. Elkurdi Y, Fernandez D, Souleimanov E. Computer Physics Communications. 2008, no. 8
2. An I/O Bandwidth-Sensitive Sparse Matrix-Vector Multiplication Engine on FPGAs [J]. Sun S., Monga M., Jones P. H. Circuits and Systems I: Regular Papers, IEEE Transactions on. 2012, no. 1
3. High performance sparse matrix-vector multiplication on FPGA [J]. Dan Zou, Shice Ni, Song Guo. IEICE Electronics Express. 2013, no. 17
4. Towards a Universal FPGA Matrix-Vector Multiplication Architecture [C]. Kestur Srinidhi, Davis John D., Chung Eric S. IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. 2012
5. A Scalable and Flexible Framework for Gaussian Processes via Matrix-Vector Multiplication [D]. Pleiss, Geoff. 2020
6. Hierarchical Orthogonal Matrix Generation and Matrix-Vector Multiplications in Rigid Body Simulations [O]. Fuhui Fang, Jingfang Huang, Gary Huber
7. Sparse matrix-vector multiplication for finite element method matrices on FPGAs [O]. Yousef El-kurdi, Warren J. Gross. 2006
