Decoupled iteration mapping: improving dependency-loop performance on SIMD processors

Huanyao Dai; Hui Yang; Jianghua Wan; Shuming Chen

Abstract

Wide Single Instruction Multiple Data (SIMD) architectures are important for compute-intensive applications, but they are less efficient for applications with cross-iteration dependency loops, which are difficult to parallelize and vectorize. This paper introduces Decoupled Iteration Mapping (DIM), a technique that dynamically maps a cross-iteration dependency loop onto an improved SIMD architecture to achieve multicore-like thread-parallel performance. The modification to the baseline architecture is minor and consists of a Prefetch Unit & Instruction Buffer Array (PU&IBA), a Loop Control Unit & Instruction Dispatch Unit (LCU&IDU), and a Data Buffer Chain (DBC). Experimental results show that the proposed DIM scheme achieves an average 3.04x speedup at a cost of only 6.44% area overhead.
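
To make the term "cross-iteration dependency loop" concrete, the following Python sketch contrasts a loop whose iterations are independent with a first-order recurrence in which each iteration needs the previous result. It is an illustration only and does not model DIM or its hardware units; the function names, the recurrence, and the `lanes` parameter are hypothetical choices made for this example.

```python
def independent_loop(b, c):
    """No cross-iteration dependency: iteration i reads only b[i] and c[i],
    so the iterations could be spread across SIMD lanes in any order."""
    return [b[i] + c[i] for i in range(len(b))]


def dependent_loop(b, k, a0=0):
    """Cross-iteration dependency: a[i] = k * a[i-1] + b[i].
    Iteration i cannot start until iteration i-1 has produced its result."""
    a = []
    prev = a0
    for x in b:
        prev = k * prev + x
        a.append(prev)
    return a


def naive_lanewise(b, k, a0=0, lanes=4):
    """Deliberately wrong for lanes > 1: processes `lanes` iterations at once
    as if they were independent, so every lane in a group reads the same
    stale value carried in from the previous group."""
    a = []
    prev = a0
    for start in range(0, len(b), lanes):
        group = [k * prev + x for x in b[start:start + lanes]]
        a.extend(group)
        prev = group[-1]
    return a


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(dependent_loop(data, 2))           # sequential reference result
    print(naive_lanewise(data, 2, lanes=4))  # differs from the reference
```

The mismatch between the last two results shows why such loops cannot simply be issued lane by lane: the value carried from one iteration to the next has to be forwarded before the next iteration can proceed.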

Bibliographic details

Source
IEICE Electronics Express, 2013, No. 21
Authors
Huanyao Dai; Hui Yang; Jianghua Wan; Shuming Chen
Original format: PDF
Chinese Library Classification: Communications

Similar literature

1. Improved SIMD Architecture for High Performance Video Processors [J]. Lo W.-Y., Lun D. P.-K., Siu W.-C. IEEE Transactions on Circuits and Systems for Video Technology, 2011, No. 12.
2. Efficient Utilization of Vector Registers to Improve FFT Performance on SIMD Microprocessors [J]. Feng Yu, Ruifeng Ge, Zeke Wang. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2013, No. 7.
3. An Enhanced Memory Address Mapping Scheme for Improved Memory Access Performance of 2-D DWT Processing Systems [J]. Sze-Wei Lee, Soon-Chieh Lim. Journal of VLSI Signal Processing Systems, 2007, No. 3.
4. Mapping a VLIW×SIMD Processor on an FPGA: Scalability and Performance [C]. Nelissen Micha, van Berkel Kees, Sawitzki Sergei. International Conference on Field Programmable Logic and Applications, 2007.
5. Comparison of the Performance of NVIDIA Accelerators with SIMD and Associative Processors on Real-Time Applications [D]. Shaker, Alfred. 2017.
6. High-Performance Iterative Electron Tomography Reconstruction with Long-Object Compensation using Graphics Processing Units (GPUs) [O]. Wei Xu, Fang Xu, Mel Jones.
7. Decoupled iteration mapping: improving dependency-loop performance on SIMD processors [O]. Hui Yang, Shuming Chen, Jianghua Wan. 2013.
8. Improved FFT (Fast Fourier Transform) for a Four Parallel Pipe SIMD Arithmetic Processor [R]. Tylaska, T. T., Choinski, T. C. 1985.
