Journal: Concurrency and Computation: Practice and Experience

Compiler supports for VLIW DSP processors with SIMD intrinsics



Abstract

To sustain growing multimedia workloads, modern digital signal processing (DSP) processors are commonly equipped with subword instructions that accelerate signal processing. Beyond subword instructions, the functional units of very long instruction word (VLIW) DSP processors can also be used to process multiple data streams in parallel. However, because of power and area concerns, many embedded VLIW DSP processors adopt distributed register files, which reduce read/write ports and wire connections by making register files private to clusters and even to individual functional units. This distributed design makes it hard for compilers to distribute single instruction, multiple data (SIMD) workloads across functional units. In this paper, we address the problem of supporting SIMD parallelism on VLIW DSP processors with subword instructions and distributed register files. Industrial practice has adopted intrinsics that let developers use hardware resources directly and compete with hand-coded assembly in performance. Providing such a solution for VLIW DSP processors with distributed register files, however, remains an open issue. In this work, we provide SIMD intrinsics that allow programmers to write highly optimized code by following the given programming guides. In addition, we devise an enhanced register allocation scheme and data replication optimizations to enable efficient code generation. In our experiments, the DSPstone benchmark and a set of H.264 kernels are used to evaluate the proposed programming and optimization schemes. The results show that combining SIMD intrinsics with compiler optimizations yields speedups of 2.9 and 3.5 for the DSPstone and H.264 kernels, respectively.
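As a rough illustration of what "subword" parallelism in the abstract refers to, the sketch below emulates in portable C a single operation that adds two independent 16-bit lanes packed into one 32-bit word. The function names `pack2x16` and `add2x16` are hypothetical and are not the intrinsics proposed in the paper; on a DSP with subword support, a compiler intrinsic would presumably map such an operation to one instruction.

```c
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit values into one 32-bit word. */
static uint32_t pack2x16(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Hypothetical subword add: adds the two 16-bit lanes of a and b
 * independently, with wraparound inside each lane. This only emulates
 * the effect of a subword instruction; it is an assumption for
 * illustration, not the paper's API. */
static uint32_t add2x16(uint32_t a, uint32_t b) {
    /* Add the low 15 bits of each lane, so no carry can cross
     * from the low lane into the high lane. */
    uint32_t sum = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
    /* Fold the top bit of each lane back in with XOR. */
    return sum ^ ((a ^ b) & 0x80008000u);
}
```

For example, `add2x16(pack2x16(1, 0xFFFF), pack2x16(2, 1))` gives a high lane of 3 and a low lane that wraps to 0, without the low-lane overflow leaking into the high lane.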
