Exploiting Very-Wide Vectors on Intel Xeon Phi with Lattice-QCD Kernels

Abstract

Our target in this work is to study ways of exploiting the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implemented two kernels derived from the Wilson Dslash operator and investigated several data layout techniques for increasing the scalability of lattice QCD scientific kernels on the Intel Xeon Phi. In parts of the application where real numbers are used for computation, we see a 6.6x increase in bandwidth compared to scalar code, thanks to auto-vectorization by the compiler. In other kernels, where arithmetic operations on complex numbers dominate, our hand-vectorized code outperforms the compiler's auto-vectorization. We find that our proposed Hopping Vector-friendly Ordering allows for more efficient vectorization of complex floating-point arithmetic. Using this data layout, we increase the sustained bandwidth by approximately 1.8x.
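
The abstract does not describe the Hopping Vector-friendly Ordering itself, so the following is only a generic sketch of why data layout matters when vectorizing complex arithmetic, not the authors' method. It contrasts an interleaved (real, imaginary) layout with a split layout in which real and imaginary parts are stored contiguously. All names (`complex_aos`, `complex_soa`, `mul_aos`, `mul_soa`, `layouts_agree`) and the block size `N` are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block size: 16 single-precision lanes, i.e. one 512-bit vector. */
#define N 16

/* Interleaved layout: re0, im0, re1, im1, ... */
typedef struct { float re, im; } complex_aos;

/* Split layout: all real parts contiguous, then all imaginary parts. */
typedef struct { float re[N]; float im[N]; } complex_soa;

/* z[i] = x[i] * y[i] on the interleaved layout. A vectorizer has to
 * shuffle lanes here to separate real and imaginary parts. */
void mul_aos(const complex_aos *x, const complex_aos *y,
             complex_aos *z, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float re = x[i].re * y[i].re - x[i].im * y[i].im;
        float im = x[i].re * y[i].im + x[i].im * y[i].re;
        z[i].re = re;
        z[i].im = im;
    }
}

/* Same product on the split layout. Every load and store is unit-stride,
 * so each statement maps onto whole-vector operations. */
void mul_soa(const complex_soa *x, const complex_soa *y, complex_soa *z)
{
    for (size_t i = 0; i < N; i++) {
        float re = x->re[i] * y->re[i] - x->im[i] * y->im[i];
        float im = x->re[i] * y->im[i] + x->im[i] * y->re[i];
        z->re[i] = re;
        z->im[i] = im;
    }
}

/* Returns 1 if both layouts give identical products on small integer
 * inputs (chosen so every intermediate value is exact in float). */
int layouts_agree(void)
{
    complex_aos xa[N], ya[N], za[N];
    complex_soa xs, ys, zs;

    for (size_t i = 0; i < N; i++) {
        xa[i].re = xs.re[i] = (float)(i + 1);
        xa[i].im = xs.im[i] = (float)(i + 2);
        ya[i].re = ys.re[i] = (float)(2 * i + 3);
        ya[i].im = ys.im[i] = (float)(i + 4);
    }

    mul_aos(xa, ya, za, N);
    mul_soa(&xs, &ys, &zs);

    for (size_t i = 0; i < N; i++) {
        if (za[i].re != zs.re[i] || za[i].im != zs.im[i])
            return 0;
    }
    return 1;
}
```

Calling `layouts_agree()` from a test harness confirms that the two layouts compute the same products; which one a given compiler vectorizes more efficiently is something to measure rather than assume.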

Bibliographic record

Source
Euromicro International Conference on Parallel, Distributed, and Network-Based Processing | 2016 | pp. 296-300 | 5 pages
Authors
Andreas Diavastos; Giannos Stylianou; Giannis Koutsou
Original format
PDF
Keywords
Lattice QCD; Xeon Phi; accelerators; many-cores

