Conference: Euromicro International Conference on Parallel, Distributed, and Network-Based Processing

Exploiting Very-Wide Vectors on Intel Xeon Phi with Lattice-QCD Kernels


Abstract

This work studies ways of exploiting the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implement two kernels derived from the Wilson Dslash operator and investigate several data layout techniques for increasing the scalability of lattice QCD kernels on the Intel Xeon Phi. In parts of the application that compute on real numbers, compiler auto-vectorization yields a 6.6x increase in bandwidth over scalar code. In kernels dominated by arithmetic on complex numbers, our hand-vectorized code outperforms the compiler's auto-vectorization. We find that our proposed Hopping Vector-friendly Ordering allows more efficient vectorization of complex floating-point arithmetic. Using this data layout, we increase the sustained bandwidth by approximately 1.8x.
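To illustrate why data layout matters for vectorizing complex arithmetic, the sketch below contrasts an interleaved layout with a split real/imaginary layout for an element-wise complex multiply. It is a generic illustration only and does not reproduce the Hopping Vector-friendly Ordering, whose details the abstract does not give. The type cplx_aos and the functions cmul_aos and cmul_soa are hypothetical names introduced here.

```c
#include <stddef.h>

/* Hypothetical interleaved layout: re and im alternate in memory,
 * so one vector register holds a mix of real and imaginary parts. */
typedef struct { float re, im; } cplx_aos;

/* Element-wise complex multiply on the interleaved layout.
 * Vectorizing this loop typically requires shuffles to separate re and im. */
void cmul_aos(cplx_aos *out, const cplx_aos *a, const cplx_aos *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float re = a[i].re * b[i].re - a[i].im * b[i].im;
        float im = a[i].re * b[i].im + a[i].im * b[i].re;
        out[i].re = re;
        out[i].im = im;
    }
}

/* Same operation on a split layout: real and imaginary parts live in
 * separate contiguous arrays, so every load, multiply, and store is
 * unit-stride and maps directly onto wide vector lanes. */
void cmul_soa(float *restrict out_re, float *restrict out_im,
              const float *restrict a_re, const float *restrict a_im,
              const float *restrict b_re, const float *restrict b_im,
              size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out_re[i] = a_re[i] * b_re[i] - a_im[i] * b_im[i];
        out_im[i] = a_re[i] * b_im[i] + a_im[i] * b_re[i];
    }
}
```

Both functions compute the same result. The split version is generally the easier one for a compiler to auto-vectorize, though the actual speedup depends on the compiler, the target, and memory alignment.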
