Many application areas, such as digital communication and image processing, make extensive use of matrix multiplication, and the computational performance of this operation is critical to the performance of the whole system. A parallel double-precision floating-point matrix multiplier with a fully pipelined architecture was designed to improve computational performance. The design was implemented on a Xilinx Virtex-5 LX155 field-programmable gate array (FPGA). Up to 10 processing elements (PEs) were integrated in a single FPGA device and arranged as an array to achieve parallel computation. The PEs employ a pipelined architecture to raise the operating frequency, and C-slow retiming was applied to resolve the data-dependency conflicts on the loop pipeline. Post-route simulation results show that the peak performance of the matrix multiplier reaches 5000 MFLOPS. In addition, matrix multiplication experiments with different dimensions were carried out, and the results confirm that the design achieves high computational performance.
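The abstract does not describe how C-slow retiming is applied inside a processing element, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a multiply-accumulate loop whose adder has a pipeline latency of `latency` cycles, and it interleaves `latency` independent dot products so that each partial sum returns from the adder exactly when its stream needs it again. The function name `c_slow_dot_products` and its parameters are hypothetical.

```python
from collections import deque


def c_slow_dot_products(a_rows, b_col, latency):
    """Accumulate `latency` independent dot products through one simulated
    pipelined adder (illustrative model, not the paper's hardware)."""
    # Assumption: exactly one independent stream per adder pipeline stage.
    assert len(a_rows) == latency
    n = len(b_col)
    # The deque models the adder's pipeline registers: a value appended now
    # becomes visible at the output only `latency` cycles later.
    pipe = deque([0.0] * latency)
    for k in range(n):
        for s in range(latency):
            # The value leaving the pipeline in this cycle was issued
            # `latency` cycles ago, so it belongs to stream s.
            partial = pipe.popleft()
            # Feed the updated partial sum of stream s back into the adder.
            pipe.append(partial + a_rows[s][k] * b_col[k])
    # After the last round the pipeline holds one finished sum per stream.
    return list(pipe)


result = c_slow_dot_products([[1, 2, 3], [4, 5, 6]], [1, 0, 2], latency=2)
```

In this model a single dot product would have to wait `latency` cycles between successive additions, because each addition needs the previous partial sum. Interleaving independent streams keeps every pipeline stage busy without such stalls, which is the kind of loop-carried dependency that C-slow retiming is generally used to address.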