A universal single-bitstream FPGA library or ASIC implementation accelerates matrix-vector multiplication processing multiple matrix encodings including dense and multiple sparse formats. A hardware-optimized sparse matrix representation referred to herein as the Compressed Variable-Length Bit Vector (CVBV) format is used to take advantage of the capabilities of FPGAs and reduce storage and bandwidth requirements across the matrices compared to that typically achieved when using the Compressed Sparse Row (CSR) format in typical CPU- and GPU-based approaches. Also disclosed is a class of sparse matrix formats that are better suited for FPGA implementations than existing formats reducing storage and bandwidth requirements. A partitioned CVBV format is described to enable parallel decoding.
展开▼