Journal: IEEE Transactions on Parallel and Distributed Systems

A complete compiler approach to auto-parallelizing C programs for multi-DSP systems

Abstract

Auto-parallelizing compilers for embedded applications have been unsuccessful due to the widespread use of pointer arithmetic and the complex memory model of multiple-address-space digital signal processors (DSPs). This work develops, for the first time, a complete auto-parallelization approach that overcomes these issues. It first combines a pointer conversion technique with a new modulo elimination transformation for program recovery, which enables the later parallelization stages. Next, it integrates a novel data transformation technique that exposes the processor location of partitioned data. Combined with a new address resolution mechanism, this generates efficient programs that run on multiple address spaces without using message passing. Furthermore, as DSPs do not possess any data cache structure, an optimization is presented that transforms the program to exploit both remote data locality and local memory bandwidth. This parallelization approach is applied to the DSPstone and UTDSP benchmark suites, giving an average speedup of 3.78 on four Analog Devices TigerSHARC TS-101 processors.
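
The abstract does not show the transformations themselves, so the following C fragment is only an illustrative sketch of what "pointer conversion" and "modulo elimination" might look like on small kernels. The function names, the constants `N` and `T`, and the loop-splitting strategy are assumptions made for this example and are not taken from the paper. The first pair of functions contrasts pointer-arithmetic code with an equivalent array-indexed form; the second pair contrasts a circular-buffer access using `%` with a version whose loop is split at the wrap-around point so that each loop has a linear index.

```c
#define N 8   /* circular buffer length (illustrative assumption) */
#define T 4   /* number of coefficients, with T <= N (illustrative assumption) */

/* Pointer-arithmetic form, typical of hand-tuned DSP source code. */
static int dot_ptr(const int *a, const int *b, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += *a++ * *b++;
    return sum;
}

/* Same computation with explicit array indices, which makes the
   accessed elements visible as functions of the loop variable. */
static int dot_idx(const int a[], const int b[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Circular-buffer access written with a modulo index.
   Assumes 0 <= start < N. */
static int circ_mod(const int buf[N], const int coef[T], int start) {
    int sum = 0;
    for (int k = 0; k < T; k++)
        sum += buf[(start + k) % N] * coef[k];
    return sum;
}

/* Same result without modulo: the loop is split where the index wraps,
   so each loop uses a linear index. Assumes 0 <= start < N and T <= N. */
static int circ_split(const int buf[N], const int coef[T], int start) {
    int sum = 0;
    int first = (N - start < T) ? (N - start) : T; /* iterations before the wrap */
    for (int k = 0; k < first; k++)
        sum += buf[start + k] * coef[k];
    for (int k = first; k < T; k++)
        sum += buf[start + k - N] * coef[k];
    return sum;
}
```

In both pairs the two functions compute the same value. The indexed and split forms are the kind of code on which array-based dependence analysis and data partitioning are usually easier to apply, which is the motivation the abstract gives for program recovery.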
