...
首页> 外文期刊>EURASIP journal on advances in signal processing >Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage
【24h】

Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage

机译:指令调度启发式算法,可在VLIW处理器中实现高效FFT,并且资源使用平衡

获取原文
   

获取外文期刊封面封底 >>

       

摘要

The fast Fourier transform (FFT) is perhaps today’s most ubiquitous algorithm used with digital data; hence, it is still being studied extensively. Besides the benefit of reducing the arithmetic count in the FFT algorithm, memory references and scheme’s projection on processor’s architecture are critical for a fast and efficient implementation. One of the main bottlenecks is in the long latency memory accesses to butterflies’ legs and in the redundant references to twiddle factors. In this paper, we describe a new FFT implementation on high-end very long instruction word (VLIW) digital signal processors (DSP), which presents improved performance in terms of clock cycles due to the resulting low-level resource balance and to the reduced memory accesses of twiddle factors. The method introduces a tradeoff parameter between accuracy and speed. Additionally, we suggest a cache-efficient implementation methodology for the FFT, dependently on the provided VLIW hardware resources and cache structure. Experimental results on a TI VLIW DSP show that our method reduces the number of clock cycles by an average of 51 % (2 times acceleration) when compared to the most assembly-optimized and vendor-tuned FFT libraries. The FFT was generated using an instruction-level scheduling heuristic. It is a modulo-based register-sensitive scheduling algorithm, which is able to compute an aggressively efficient sequence of VLIW instructions for the FFT, maximizing the parallelism rate and minimizing clock cycles and register usage.
机译:快速傅里叶变换(FFT)可能是当今使用最广泛的数字数据算法。因此,它仍在被广泛研究。除了减少FFT算法中的算术计数的好处之外,内存引用和方案在处理器体系结构上的投影对于快速高效地实施也至关重要。主要瓶颈之一是对蝴蝶腿的长时间访问内存访问以及对旋转因子的冗余引用。在本文中,我们描述了在高端超长指令字(VLIW)数字信号处理器(DSP)上的新FFT实现,由于产生的低级资源平衡和减少了资源消耗,该方法在时钟周期方面表现出更高的性能旋转因子的内存访问。该方法引入了精度和速度之间的折衷参数。此外,根据所提供的VLIW硬件资源和缓存结构,我们建议一种用于FFT的缓存有效的实现方法。在TI VLIW DSP上进行的实验结果表明,与最优化装配和厂商调整的FFT库相比,我们的方法将时钟周期数平均减少了51%(加速的2倍)。 FFT是使用指令级调度启发法生成的。它是基于模块的寄存器敏感调度算法,能够为FFT计算VLIW指令的有效序列,从而最大程度地提高了并行速率,并最小化了时钟周期和寄存器使用率。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号