Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage

Mounir Bahtat; Said Belkouch; Philippe Elleaume; Philippe Le Gall

首页> 外文期刊>EURASIP journal on advances in signal processing >Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage

【24h】

Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage

机译：指令调度启发式算法，可在VLIW处理器中实现高效FFT，并且资源使用平衡

获取原文

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

The fast Fourier transform (FFT) is perhaps today’s most ubiquitous algorithm used with digital data; hence, it is still being studied extensively. Besides the benefit of reducing the arithmetic count in the FFT algorithm, memory references and scheme’s projection on processor’s architecture are critical for a fast and efficient implementation. One of the main bottlenecks is in the long latency memory accesses to butterflies’ legs and in the redundant references to twiddle factors. In this paper, we describe a new FFT implementation on high-end very long instruction word (VLIW) digital signal processors (DSP), which presents improved performance in terms of clock cycles due to the resulting low-level resource balance and to the reduced memory accesses of twiddle factors. The method introduces a tradeoff parameter between accuracy and speed. Additionally, we suggest a cache-efficient implementation methodology for the FFT, dependently on the provided VLIW hardware resources and cache structure. Experimental results on a TI VLIW DSP show that our method reduces the number of clock cycles by an average of 51 % (2 times acceleration) when compared to the most assembly-optimized and vendor-tuned FFT libraries. The FFT was generated using an instruction-level scheduling heuristic. It is a modulo-based register-sensitive scheduling algorithm, which is able to compute an aggressively efficient sequence of VLIW instructions for the FFT, maximizing the parallelism rate and minimizing clock cycles and register usage.

机译：快速傅里叶变换（FFT）可能是当今使用最广泛的数字数据算法。因此，它仍在被广泛研究。除了减少FFT算法中的算术计数的好处之外，内存引用和方案在处理器体系结构上的投影对于快速高效地实施也至关重要。主要瓶颈之一是对蝴蝶腿的长时间访问内存访问以及对旋转因子的冗余引用。在本文中，我们描述了在高端超长指令字（VLIW）数字信号处理器（DSP）上的新FFT实现，由于产生的低级资源平衡和减少了资源消耗，该方法在时钟周期方面表现出更高的性能旋转因子的内存访问。该方法引入了精度和速度之间的折衷参数。此外，根据所提供的VLIW硬件资源和缓存结构，我们建议一种用于FFT的缓存有效的实现方法。在TI VLIW DSP上进行的实验结果表明，与最优化装配和厂商调整的FFT库相比，我们的方法将时钟周期数平均减少了51％（加速的2倍）。 FFT是使用指令级调度启发法生成的。它是基于模块的寄存器敏感调度算法，能够为FFT计算VLIW指令的有效序列，从而最大程度地提高了并行速率，并最小化了时钟周期和寄存器使用率。

著录项

来源
《EURASIP journal on advances in signal processing》 |2016年第1期|共页
作者
Mounir Bahtat; Said Belkouch; Philippe Elleaume; Philippe Le Gall;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种
中图分类通信;
关键词
FFTVLIWScheduling heuristicTwiddle factorsModulo schedulingSoftware pipelining;

机译：FFTVLI调度启发式纠缠因子模调度软件流水线;

相似文献

外文文献
中文文献
专利

1. A Rough Programming Approach to Power-Balanced Instruction Scheduling for VLIW Digital Signal Processors [J] . Shu Xiao, Lai E.M.-K. IEEE Transactions on Signal Processing . 2008,第4期

机译：VLIW数字信号处理器的功率平衡指令调度的粗糙编程方法
2. Instruction scheduling and transformation for a VLIW unified reduced instruction set computer/digital signal processor processor with shared register architecture [J] . Cheng-Yu Lee, Min-Chin Hung, Rong-Guey Chang Concurrency and computation: practice and experience . 2014,第1期

机译：具有共享寄存器架构的VLIW统一精简指令集计算机/数字信号处理器处理器的指令调度和转换
3. Instruction scheduling with k-successor tree for clustered VLIW processors [J] . Xuemeng Zhang, Hui Wu, Jingling Xue Design automation for embedded systems . 2013,第2期

机译：集群VLIW处理器的k后继树指令调度
4. An efficient heuristic for instruction scheduling on clustered VLIW processors [C] . Zhang Xuemeng, Wu Hui, Xue Jingling 2011 Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems . 2011

机译：在集群VLIW处理器上进行指令调度的有效启发式方法
5. VLIW processors: Efficiently exploiting instruction level parallelism. [D] . Rudd, Kevin William. 2000

机译：VLIW处理器：有效利用指令级并行性。
6. Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures [O] . Haijing Tang, Xu Yang, Siye Wang, 2013

机译：连接寄存器文件的集群式VLIW架构的优化指令调度和寄存器分配
7. Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage [O] . Mounir Bahtat, Said Belkouch, Philippe Elleaume, 2016

机译：指令调度启发式算法，可在VLIW处理器中实现高效FFT，并且资源使用平衡

Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage

摘要

著录项

相似文献

相关主题

期刊订阅