Journal of Signal Processing Systems for Signal, Image, and Video Technology

Compiler-Based Performance Evaluation of an SIMD Processor with a Multi-Bank Memory Unit



Abstract

The single instruction multiple data (SIMD) architecture is very efficient for executing arithmetic-intensive programs, but it frequently suffers from data-alignment problems. These problems not only add time overhead but also hinder automatic vectorization by the SIMD compiler. In this paper, we compare three on-chip memory systems for the SIMD architecture (single-bank, multi-bank, and multi-port) as ways to resolve the data-alignment problems. The single-bank memory is the simplest, but supports only aligned accesses. The multi-bank memory is slightly more complex, but enables unaligned accesses and stride accesses, subject to a bank-conflict limitation. The multi-port memory supports both unaligned and stride accesses without any restriction, but requires considerably more expensive hardware. We also developed a vectorizing compiler that can conduct dynamic memory allocation and SIMD code generation. The performance of the three memory systems with our SIMD compiler is evaluated using several digital signal processing kernels and the MPEG2 encoder. The experimental results show that, in a multimedia system with a 2-issue host processor and an 8-way SIMD coprocessor, the multi-bank memory carries out MPEG2 encoding 5.8 times faster, whereas the single-bank memory achieves only a 2.9 times speed-up. The multi-port memory shows the best performance, but its improvement over the multi-bank memory is not practical once the hardware cost is considered.
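The difference between the three memory systems can be illustrated with a small model of which vector accesses each one could serve in a single cycle. This is only a sketch under stated assumptions, not the paper's design: the abstract does not give the bank mapping, so the code assumes one bank per SIMD lane (8 lanes, matching the 8-way coprocessor) and low-order interleaving (bank = word address mod 8). All function names are hypothetical.

```python
from math import gcd

LANES = 8  # assumption: one memory bank per lane of the 8-way SIMD coprocessor


def lane_addresses(base, stride, lanes=LANES):
    """Word addresses touched by one vector access, one per lane."""
    return [base + i * stride for i in range(lanes)]


def single_bank_ok(base, stride, lanes=LANES):
    """Single wide bank: only an aligned, contiguous row can be read at once."""
    return stride == 1 and base % lanes == 0


def multi_bank_ok(base, stride, lanes=LANES):
    """Interleaved banks (assumed mapping: bank = address mod lanes).

    Each bank serves one word per cycle, so the access is conflict-free
    only if every lane lands in a different bank.
    """
    banks = {addr % lanes for addr in lane_addresses(base, stride, lanes)}
    return len(banks) == lanes


def multi_port_ok(base, stride, lanes=LANES):
    """Multi-port memory: any set of addresses can be served together."""
    return True


def stride_is_conflict_free(stride, lanes=LANES):
    """Closed form for the assumed mapping: conflict-free iff gcd(stride, lanes) == 1."""
    return gcd(stride, lanes) == 1


if __name__ == "__main__":
    for base, stride in [(16, 1), (3, 1), (5, 3), (0, 2)]:
        print(base, stride,
              single_bank_ok(base, stride),
              multi_bank_ok(base, stride),
              multi_port_ok(base, stride))
```

Under this assumed mapping, an unaligned unit-stride access such as base 3 is rejected by the single-bank model but accepted by the multi-bank model, while a stride-2 access collides in the banks and only the multi-port model accepts it. This mirrors the "bank-conflict limitation" the abstract mentions.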
