Improving the Effectiveness of Searching for Isomorphic Chains in Superword Level Parallelism

Abstract

Most high-performance microprocessors come equipped with general purpose Single Instruction Multiple Data (SIMD) execution engines to enhance performance. Compilers use auto-vectorization techniques to identify vector parallelism and generate SIMD code so that applications can enjoy the performance benefits provided by SIMD units. Superword Level Parallelism (SLP), one such vectorization technique, forms vector operations by merging isomorphic instructions into a vector operation and linking many such operations into long isomorphic chains. However, effective grouping of isomorphic instructions remains a key challenge for SLP algorithms. In this work, we describe a new hierarchical approach for SLP. We decouple the selection of isomorphic chains and arrange them in a hierarchy of choices at the local and global levels. First, we form small local chains from a set of preferred patterns and rank them. Next, we form long global chains from the local chains using a few simple heuristics. Hierarchy allows us to balance the grouping choices of individual instructions more effectively within the context of larger local and global chains, thereby finding better opportunities for vectorization. We implement our algorithm in LLVM, and we compare it against prior work and the current SLP implementation in LLVM. A set of applications that benefit from vectorization are taken from the NAS Parallel Benchmarks and SPEC CPU 2006 suite to compare our approach and prior techniques. We demonstrate that our new algorithm finds better isomorphic chains. Our new approach achieves an 8.6% speedup, on average, compared to non-vectorized code and 2.5% speedup, on average, over LLVM-SLP. In the best case, the BT application has 11% fewer total dynamic instructions and achieves a 10.9% speedup over LLVM-SLP.
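To make the grouping problem concrete, the following is a minimal sketch of a naive greedy pairing of isomorphic instructions within one basic block. It is not the hierarchical algorithm the abstract describes, and it is not LLVM's SLP implementation. The toy IR (`Instr`), the helper names, and the simplified isomorphism and independence checks are all assumptions made for illustration; a real SLP pass would also check data types, memory adjacency, dependences through intervening instructions, and a cost model.

```python
from collections import namedtuple

# Hypothetical toy IR (an assumption for illustration): dest = op(srcs...)
Instr = namedtuple("Instr", ["dest", "op", "srcs"])

def is_isomorphic(a, b):
    """Simplified test: same opcode applied to the same number of operands."""
    return a.op == b.op and len(a.srcs) == len(b.srcs)

def is_independent(a, b):
    """Simplified test: neither instruction reads or overwrites the other's result."""
    return a.dest not in b.srcs and b.dest not in a.srcs and a.dest != b.dest

def greedy_pairs(block):
    """Pair each unpacked instruction with the first later instruction
    that is isomorphic to it and independent of it."""
    used = set()
    pairs = []
    for i, a in enumerate(block):
        if i in used:
            continue
        for j in range(i + 1, len(block)):
            b = block[j]
            if j not in used and is_isomorphic(a, b) and is_independent(a, b):
                pairs.append((a, b))
                used.update((i, j))
                break
    return pairs

# Two multiplies feeding two adds: each pair is a candidate 2-wide vector
# operation, and the add pair consumes the multiply pair's results, so the
# two pairs link into a short isomorphic chain.
block = [
    Instr("t0", "mul", ("a0", "b0")),
    Instr("t1", "mul", ("a1", "b1")),
    Instr("c0", "add", ("t0", "k")),
    Instr("c1", "add", ("t1", "k")),
]
pairs = greedy_pairs(block)
for a, b in pairs:
    print(f"pack <{a.dest}, {b.dest}> = {a.op} ...")
```

A greedy first-match policy like this commits to a pairing without regard to how it affects later packs, which is one way to see why the choice of groups is hard; the abstract's local-then-global ranking is aimed at making those choices with more context.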

Bibliographic record

Source
International Symposium on Microarchitecture | 2017 | xix 825 p. | 12 pages
Authors
Joonmoo Huh; James Tuck;
Original format: PDF
Chinese Library Classification: TP302-532
Keywords
instruction sets; multiprocessing systems; parallel processing; program compilers; vectors;
