Journal of Signal Processing Systems for Signal, Image, and Video Technology

A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores


Abstract

FFT algorithms have memory access patterns that prevent many architectures from achieving high computational utilization, particularly when parallel processing is required to reach the desired level of performance. Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor. We show that it is possible to achieve excellent parallel scaling while maintaining power and area efficiency comparable to that of the single-core solution. The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that fit in on-chip SRAM. When configured with 12 MiB of on-chip SRAM, our technology evaluation shows that the proposed 16-core FFT accelerator should sustain 388 GFLOPS of nominal double-precision performance, with power and area efficiencies of 30 GFLOPS/W and 2.66 GFLOPS/mm², respectively.
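As a rough sanity check on the figures quoted above, the sustained throughput and the two efficiency numbers together imply a total power of roughly 13 W and a total area of roughly 146 mm². The short Python sketch below only performs that division. The derived values are back-of-the-envelope estimates, not numbers reported in the abstract, and the abstract does not say what the power and area totals include. The per-core figure assumes throughput is split evenly across the 16 cores, and all function and variable names are illustrative.

```python
# Figures quoted in the abstract for the 16-core configuration.
SUSTAINED_GFLOPS = 388.0         # nominal double-precision throughput
POWER_EFF_GFLOPS_PER_W = 30.0    # power efficiency
AREA_EFF_GFLOPS_PER_MM2 = 2.66   # area efficiency
NUM_CORES = 16


def implied_power_w(gflops: float, gflops_per_w: float) -> float:
    """Total power implied by a throughput and a power efficiency."""
    return gflops / gflops_per_w


def implied_area_mm2(gflops: float, gflops_per_mm2: float) -> float:
    """Total area implied by a throughput and an area efficiency."""
    return gflops / gflops_per_mm2


# Derived estimates (assumptions, not values stated in the abstract).
power_w = implied_power_w(SUSTAINED_GFLOPS, POWER_EFF_GFLOPS_PER_W)
area_mm2 = implied_area_mm2(SUSTAINED_GFLOPS, AREA_EFF_GFLOPS_PER_MM2)

# Assumes an even split of throughput across cores.
per_core_gflops = SUSTAINED_GFLOPS / NUM_CORES

print(f"implied power: {power_w:.1f} W")
print(f"implied area:  {area_mm2:.1f} mm^2")
print(f"per-core rate: {per_core_gflops:.2f} GFLOPS")
```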
