Compiler techniques for improving SIMD parallelism


Abstract

Modern CPUs are equipped with Single Instruction Multiple Data (SIMD) engines operating on short vectors, in order to meet the growing demands of accelerating multimedia and scientific applications. Automatic simdization performed by the compiler is a convenient way of utilizing the SIMD ISA extensions, but it faces challenges in translating SIMD resources into actual performance. Discovering SIMD parallelism is sometimes difficult because dependence analysis is imprecise, yet even some SIMD parallelism that can be identified is exploited ineffectively by current vectorization techniques. In this thesis, we examine mixed SIMD parallelism and partial SIMD parallelism, which are both poorly exploited by the existing approaches. Two new vectorization techniques, Loop-Mix and Paver, are proposed to tackle them respectively.

Existing loop vectorization techniques exploit either intra- or inter-iteration SIMD parallelism alone in a code region, if one part of the region vectorized for one type of parallelism has data dependences (called mixed-parallelism-inhibiting dependences) on the other part of the region vectorized for the other type of parallelism. We consider a class of loops that exhibit both types of parallelism in their code regions that contain mixed-parallelism-inhibiting data dependences (i.e., mixed SIMD parallelism). We present a new compiler approach, Loop-Mix, for exploiting such mixed SIMD parallelism effectively by reducing the data reorganization overhead incurred when one type of parallelism is switched to the other.

Additionally, existing loop vectorization techniques are ineffective for loops that exhibit little loop-level parallelism but some limited superword-level parallelism (SLP), where the SIMD parallelism is insufficient to fill the SIMD datapath (i.e., partial SIMD parallelism). We show that effectively vectorizing such loops requires partial vector operations to be executed correctly and efficiently.
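To make partial SIMD parallelism concrete, the following is a minimal scalar model in Python, not code from the thesis. It assumes a hypothetical 4-lane datapath (`VECTOR_LANES`) and uses plain lists to stand in for vector registers. The loop carries a dependence from one iteration to the next, so there is little loop-level parallelism, but its three isomorphic statements can be packed together in the SLP style, filling only three of the four lanes. All function names here are illustrative.

```python
VECTOR_LANES = 4  # assumed datapath width: four lanes per vector register


def scalar_recurrence(vx, vy, vz, a):
    """Scalar reference loop: iteration i depends on iteration i-1."""
    x = y = z = 0.0
    out = []
    for i in range(len(vx)):
        # Three isomorphic, mutually independent statements.
        x = x * a + vx[i]
        y = y * a + vy[i]
        z = z * a + vz[i]
        out.append((x, y, z))
    return out


def packed_recurrence(vx, vy, vz, a):
    """Same loop with the three statements packed into one modelled vector op.

    A Python list models a vector register; lane 3 is padding and its
    value is never observed.
    """
    acc = [0.0] * VECTOR_LANES
    a_vec = [a] * VECTOR_LANES
    out = []
    for i in range(len(vx)):
        v = [vx[i], vy[i], vz[i], 0.0]  # pack three scalars, pad one lane
        acc = [acc[l] * a_vec[l] + v[l] for l in range(VECTOR_LANES)]
        out.append(tuple(acc[:3]))  # unpack only the useful lanes
    return out


def lane_utilization(pack_size, lanes=VECTOR_LANES):
    """Fraction of the datapath doing useful work for a given pack size."""
    return pack_size / lanes
```

In this model a pack of three statements uses three quarters of the lanes, which is the situation the abstract calls partial SIMD parallelism. The packing and unpacking steps around the vector operation are the overheads a real vectorizer has to weigh against the gain.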
We present a simple yet effective SLP compiler technique, called Paver (PArtial VEctorizeR), formulated as a generalization of the traditional SLP algorithm, to optimize such partially vectorizable loops. The key idea is to maximize SIMD utilization by widening the vector instructions used while minimizing the overheads caused by memory access, packing/unpacking, and/or masking operations, without introducing new memory errors or new numeric exceptions. Both Loop-Mix and Paver are simple and have been implemented in LLVM. We evaluate them with several real-world programs containing mixed or partial SIMD parallelism and demonstrate their performance advantages over the state of the art.
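The safety constraint mentioned above (no new memory errors, no new numeric exceptions) can be illustrated with a small Python model. This is a sketch under stated assumptions, not Paver's algorithm: the 4-lane width, the helper names (`masked_load`, `masked_store`, `vdiv`, `divide_partial`), and the choice of padding values are all hypothetical. Lists model vector registers, an out-of-range index models a memory error, and `ZeroDivisionError` models a numeric exception.

```python
LANES = 4  # assumed vector width


def masked_load(buf, start, active, pad):
    """Load `active` elements from buf[start:] and pad the remaining lanes.

    Inactive lanes never index into `buf`, so widening the load cannot
    read past the end of the buffer.
    """
    assert 0 <= active <= LANES
    return [buf[start + l] if l < active else pad for l in range(LANES)]


def masked_store(buf, start, vec, active):
    """Write back only the active lanes; padded lanes are discarded."""
    for l in range(active):
        buf[start + l] = vec[l]


def vdiv(a, b):
    """Model of a full-width vector division: every lane is computed."""
    return [x / y for x, y in zip(a, b)]


def divide_partial(num, den, out, active=3):
    """Divide `active` elements using a full-width vector operation."""
    n = masked_load(num, 0, active, pad=0.0)
    # Pad the divisor with 1.0 so an unused lane cannot divide by zero.
    d = masked_load(den, 0, active, pad=1.0)
    masked_store(out, 0, vdiv(n, d), active)
```

For example, calling `divide_partial([6.0, 9.0, 12.0], [3.0, 3.0, 4.0], out)` on a three-element `out` fills it with the three quotients. Padding the divisor with `0.0` instead would make the unused lane raise an exception that the scalar code never raised, which is the kind of behaviour change a partial vectorizer must avoid.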


Bibliographic details

Author: Zhou Hao, Computer Science and Engineering, Faculty of Engineering, UNSW
Year: 2016
Original format: PDF
Language: English

Similar literature

1. A Compiler Approach for Exploiting Partial SIMD Parallelism [J]. Zhou Hao, Xue Jingling. ACM Transactions on Architecture and Code Optimization, 2016, No. 1.
2. Improving SIMD Parallelism via Dynamic Binary Translation [J]. Hong Ding-Yong, Liu Yu-Ping, Fu Sheng-Yu, et al. ACM Transactions on Embedded Computing Systems, 2018, No. 3.
3. Improving balanced scheduling with compiler optimizations that increase instruction-level parallelism [J]. Jack L. Lo, Susan J. Eggers. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 1995, No. 6.
4. Vector Parallelism in JavaScript: Language and Compiler Support for SIMD [C]. Ivan Jibaja, Peter Jensen, Ningxin Hu, et al. 2015 International Conference on Parallel Architecture and Compilation, 2015.
5. Extracting data-level parallelism from sequential programs for SIMD execution [D]. Baumstark, Lewis Benton, Jr. 2004.
6. Rubus: A compiler for seamless and extensible parallelism [O]. Muhammad Adnan, Faisal Aslam, Zubair Nawaz, et al. 2011.
7. A compiler approach for exploiting partial SIMD parallelism [O]. Zhou, H, Xue, J. 2016.
8. Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions [R]. Yang, C., Sano, B., Lebeck, A. R. 2005.
