SIMD Defragmenter: Efficient ILP Realization on Data-parallel Architectures

Yongjun Park; Sangwon Seo; Hyunchul Park; Hyoun Kyu Cho; Scott Mahlke


Abstract

Single-instruction multiple-data (SIMD) accelerators provide an energy-efficient platform to scale the performance of mobile systems while still retaining post-programmability. The central challenge is translating the parallel resources of the SIMD hardware into real application performance. In scientific applications, automatic vectorization techniques have proven quite effective at extracting high levels of data-level parallelism (DLP). However, vectorization is often much less effective for media applications due to low-trip-count loops, complex control flow, and non-uniform execution behavior. As a result, SIMD lanes remain idle due to insufficient DLP. To attack this problem, this paper proposes a new vectorization pass called SIMD Defragmenter to uncover hidden DLP that lurks below the surface in the form of instruction-level parallelism (ILP). The difficulty is managing the data packing/unpacking overhead, which can easily exceed the benefits gained through SIMD execution. The SIMD Defragmenter overcomes this problem by identifying groups of compatible instructions (subgraphs) that can be executed in parallel across the SIMD lanes. By SIMDizing in bulk at the subgraph level, packing/unpacking overhead is minimized. On a 16-lane SIMD processor, experimental results show that SIMD defragmentation achieves a mean 1.6x speedup over traditional loop vectorization and a 31% gain over prior research approaches for converting ILP to DLP.
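The point about amortizing packing/unpacking over a whole subgraph can be illustrated with a toy cost model. The sketch below is not the paper's algorithm: the dataflow graph, the node names, and the functions `boundary_edges` and `overhead_per_op` are all hypothetical, and the model simply assumes that every dataflow edge crossing between scalar code and a SIMDized group costs one pack or unpack operation.

```python
# Hypothetical toy dataflow graph: node -> (opcode, operand nodes).
# Two independent chains with identical opcodes, i.e. ILP that could be
# mapped onto two SIMD lanes.
GRAPH = {
    "a0": ("load", []),
    "a1": ("mul", ["a0"]),
    "a2": ("add", ["a1"]),
    "a3": ("store", ["a2"]),
    "b0": ("load", []),
    "b1": ("mul", ["b0"]),
    "b2": ("add", ["b1"]),
    "b3": ("store", ["b2"]),
}


def boundary_edges(graph, group):
    """Count dataflow edges with exactly one endpoint inside `group`.

    Assumption of this sketch: each such edge needs one pack (scalar to
    SIMD) or one unpack (SIMD to scalar) operation.
    """
    crossings = 0
    for node, (_opcode, operands) in graph.items():
        for src in operands:
            if (node in group) != (src in group):
                crossings += 1
    return crossings


def overhead_per_op(graph, group):
    """Pack/unpack operations paid per SIMDized instruction."""
    return boundary_edges(graph, group) / len(group)


if __name__ == "__main__":
    single_level = {"a1", "b1"}            # SIMDize only the two multiplies
    subgraph = {"a1", "a2", "b1", "b2"}    # SIMDize both mul-add chains
    print(overhead_per_op(GRAPH, single_level))
    print(overhead_per_op(GRAPH, subgraph))
```

In this toy graph both groupings cross the scalar/SIMD boundary the same number of times, but the subgraph grouping covers twice as many instructions, so the overhead per SIMDized instruction is halved. Real compilers must also account for lane assignment and instruction compatibility, which this sketch ignores.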


Bibliographic details

Source
Computer Architecture News | 2012, Issue 1 | pp. 363-374 | 12 pages
Authors
Yongjun Park; Sangwon Seo; Hyunchul Park; Hyoun Kyu Cho; Scott Mahlke
Author affiliations

Advanced Computer Architecture Laboratory University of Michigan - Ann Arbor, MI;

Qualcomm Incorporated, San Diego, CA;

Programming Systems Lab, Intel Labs, Santa Clara, CA;

Advanced Computer Architecture Laboratory University of Michigan - Ann Arbor, MI;

Advanced Computer Architecture Laboratory University of Michigan - Ann Arbor, MI;

Indexing information
Original format: PDF
Language: English
Keywords
compiler; SIMD architecture; optimization

Similar literature

1. SIMD Defragmenter: Efficient ILP Realization on Data-parallel Architectures [J]. Yongjun Park, Sangwon Seo, Hyunchul Park. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2012, No. 4
2. SIMDE: An Educational Simulator of ILP Architectures With Dynamic and Static Scheduling [J]. I. Castilla, L. Moreno, C. Gonzalez. Computer applications in engineering education. 2007, No. 3
3. Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP [J]. Libo Huang, Nong Xiao, Zhiying Wang. Parallel Computing. 2013, No. 10
4. SIMD Defragmenter: Efficient ILP Realization on Data-parallel Architectures [C]. Yongjun Park, Sangwon Seo, Hyunchul Park. Seventeenth international conference on architectural support for programming languages and operating systems. 2012
5. ILP-SIMD: An instruction parallel SIMD architecture with short-wire interconnects [D]. Chung, Kee Shik. 2000
6. Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel [O]. Nikolaos Alachiotis, Simon A Berger, Alexandros Stamatakis. 2012
7. Efficient Realizations of the Discrete and Continuous Wavelet Transforms: from Single Chip Implementations to Mappings on SIMD Array Computers [O]. Chaitali Chakrabarti, Mohan Vishwanath. 1995
