Towards a Universal FPGA Matrix-Vector Multiplication Architecture

Abstract

We present the design and implementation of a universal, single-bitstream library for accelerating matrix-vector multiplication using FPGAs. Our library handles multiple matrix encodings, ranging from dense to multiple sparse formats. A key novelty in our approach is the introduction of a hardware-optimized sparse matrix representation called Compressed Variable-Length Bit Vector (CVBV), which reduces storage and bandwidth requirements by up to 43% (25% on average) compared to compressed sparse row (CSR) across all the matrices from the University of Florida Sparse Matrix Collection. Our hardware incorporates a runtime-programmable decoder that performs on-the-fly decoding of various formats such as Dense, COO, CSR, DIA, and ELL. The flexibility and scalability of our design are demonstrated across two FPGA platforms: (1) the BEE3 (Virtex-5 LX155T with 16GB of DRAM) and (2) the ML605 (Virtex-6 LX240T with 2GB of DRAM). For dense matrices, our approach scales to large data sets with over 1 billion elements and achieves robust performance independent of the matrix aspect ratio. For sparse matrices, our approach using a compressed representation reduces the overall bandwidth while achieving efficiency comparable to state-of-the-art approaches.
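For readers unfamiliar with CSR, the baseline format the abstract compares against, the following is a minimal software sketch of CSR encoding and matrix-vector multiplication. It is not the paper's hardware design and does not implement CVBV; the function names `dense_to_csr` and `csr_matvec` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(
    a: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Encode a dense row-major matrix in CSR form (illustrative helper).

    Returns (values, col_idx, row_ptr), where row i owns the slice
    values[row_ptr[i]:row_ptr[i + 1]].
    """
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                col_idx.append(j)
        # Record where the next row starts in the values array.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(
    values: Sequence[float],
    col_idx: Sequence[int],
    row_ptr: Sequence[int],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored nonzeros of row i contribute to y[i].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]]
    vals, cols, ptrs = dense_to_csr(A)
    print(csr_matvec(vals, cols, ptrs, [1, 2, 3]))
```

In this encoding every nonzero carries a full column index alongside its value. The abstract's storage and bandwidth comparison is made against this CSR layout.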