Journal: Progress in Artificial Intelligence

FPGA-Based Hardware Matrix Inversion Architecture Using Hybrid Piecewise Polynomial Approximation Systolic Cells

Abstract

Many signal processing algorithms require a hardware matrix inversion architecture built on QR decomposition with Givens rotations (GR) and a back substitution (BS) block. However, a hardware implementation of the GR algorithm must realize complex operations such as the reciprocal square root (RSR), which is typically implemented with lookup tables (LUTs) or COordinate Rotation DIgital Computer (CORDIC) units, among other techniques, leading to either high area consumption or low throughput. This paper introduces a Field-Programmable Gate Array (FPGA)-based full matrix inversion architecture using hybrid piecewise polynomial approximation systolic cells. The design incorporates a hybrid segmentation technique for implementing the piecewise polynomial systolic cells. This hybrid approach is composed of an external and an internal segmentation, where the former is nonuniform and the latter is uniform; together they fit the curve shape of the complex functions and achieve a better signal-to-quantization-noise ratio, while also improving timing performance and area usage. Experimental results reveal a well-balanced improvement: the design achieves high throughput with lower resource utilization than state-of-the-art FPGA-based architectures. In our study, the proposed design achieves 7.51 mega-matrices per second for 4 x 4 matrix operations with a latency of 12 clock cycles; the hardware requires only 1474 slice registers and 1458 LUTs on a Virtex-5 XC5VLX220T FPGA, and 1474 slice registers and 1378 LUTs on a Virtex-6 XC6VLX240T FPGA.
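
To make the hybrid segmentation idea concrete, the following is a minimal floating-point sketch in Python, not the paper's implementation. It assumes the external (nonuniform) segments are power-of-two octaves and that each octave is split into 8 equal internal (uniform) sub-segments, each with a first-degree polynomial; the abstract does not state the actual segment boundaries, polynomial degree, or fixed-point formats. The names `build_rsr_table`, `rsr_approx`, and `givens_cs` are hypothetical.

```python
import math

def build_rsr_table(min_exp=-8, max_exp=8, inner=8):
    """Linear coefficients for 1/sqrt(x) on [2**min_exp, 2**max_exp).

    External segmentation (assumed): octaves [2**k, 2**(k+1)), nonuniform in x.
    Internal segmentation (assumed): `inner` equal-width pieces per octave.
    """
    table = {}
    for k in range(min_exp, max_exp):
        lo = 2.0 ** k
        width = lo / inner              # the octave [lo, 2*lo) has width lo
        pieces = []
        for j in range(inner):
            a = lo + j * width
            b = a + width
            fa, fb = a ** -0.5, b ** -0.5
            slope = (fb - fa) / width   # chord through both endpoints
            pieces.append((a, fa, slope))
        table[k] = pieces
    return table

def rsr_approx(x, table, inner=8):
    """Approximate 1/sqrt(x) using the two-level segment lookup."""
    _, e = math.frexp(x)                # x = m * 2**e with 0.5 <= m < 1
    k = e - 1                           # so x lies in [2**k, 2**(k+1))
    lo = 2.0 ** k
    # In hardware, k would come from a leading-one detector and j from
    # the next few bits below the leading one.
    j = min(int((x - lo) / (lo / inner)), inner - 1)
    a, fa, slope = table[k][j]
    return fa + slope * (x - a)

def givens_cs(a, b, table):
    """Cosine and sine of the Givens rotation that zeroes b against a."""
    inv_r = rsr_approx(a * a + b * b, table)
    return a * inv_r, b * inv_r

if __name__ == "__main__":
    table = build_rsr_table()
    c, s = givens_cs(3.0, 4.0, table)
    print(c, s)                         # close to 0.6 and 0.8
```

Because the octaves narrow as x approaches zero, where 1/sqrt(x) is steepest, the relative error of the chord fit is the same in every octave. That is one plausible reason for pairing a nonuniform outer split with a uniform inner one.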
