Introducing Control Flow into Vectorized Code

Abstract

Single instruction multiple data (SIMD) functional units are ubiquitous in modern microprocessors. Effective use of these SIMD functional units is essential in achieving the highest possible performance. Automatic generation of SIMD instructions in the presence of control flow is challenging, however, not only because SIMD code is hard to generate in the presence of arbitrarily complex control flow, but also because SIMD code that executes the instructions on all control paths may be slower than the scalar original, which may bypass a large portion of the code. One promising technique introduced recently involves inserting branches-on-superword-condition-codes (BOSCCs) to bypass vector instructions. In this paper, we describe two techniques that improve on the previous approach. First, BOSCCs are generated in a nested fashion so that even BOSCCs themselves can be bypassed by other BOSCCs. Second, we generate all vec_any_* instructions to bypass even some predicate-defining instructions. We implemented these techniques in a vectorizing compiler. On 14 kernels, the compiler achieves distinct speedups, including 1.99X over the previous technique that generates single-level BOSCCs and vec_any_ne only.
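The two ideas in the abstract can be illustrated with a small sketch. This is not the paper's compiler output; it is a portable C emulation under stated assumptions: a hypothetical 4-lane superword (`LANES`), a hypothetical helper `any_gt` standing in for a `vec_any_gt`-style test, and an invented example kernel. The outer branch skips the whole guarded region, including the inner branch (nested BOSCCs), and because the test compares the data directly, the predicate arrays are only computed on the path that is not bypassed.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* hypothetical superword width, chosen for illustration */

/* Hypothetical scalar stand-in for a vec_any_gt-style test:
   returns 1 if any lane of v is greater than t. */
static int any_gt(const int *v, int t) {
    for (int l = 0; l < LANES; l++)
        if (v[l] > t) return 1;
    return 0;
}

/* Invented scalar kernel with nested control flow. */
void scalar_kernel(const int *a, int *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0) {
            b[i] = a[i] * 2;
            if (a[i] > 100) b[i] += 1;
        }
    }
}

/* Superword-style version with nested bypass branches.
   Returns the number of guarded blocks that were actually executed. */
size_t boscc_kernel(const int *a, int *b, size_t n) {
    assert(n % LANES == 0); /* sketch assumes no remainder loop */
    size_t executed = 0;
    for (size_t i = 0; i < n; i += LANES) {
        /* Outer bypass: tests the data directly, so the predicate
           below is never computed when no lane needs it. */
        if (!any_gt(&a[i], 0)) continue;
        executed++;

        int p_outer[LANES]; /* predicate-defining step, now bypassable */
        for (int l = 0; l < LANES; l++) p_outer[l] = a[i + l] > 0;
        for (int l = 0; l < LANES; l++)
            if (p_outer[l]) b[i + l] = a[i + l] * 2; /* emulated select */

        /* Inner bypass, itself skipped whenever the outer one fires.
           a > 100 implies a > 0, so the outer predicate need not be
           combined into this test. */
        if (!any_gt(&a[i], 100)) continue;
        executed++;

        for (int l = 0; l < LANES; l++)
            if (a[i + l] > 100) b[i + l] += 1; /* emulated select */
    }
    return executed;
}
```

With input whose superwords are mostly non-positive, most iterations take the outer `continue` and do no predicate or select work at all, which is the situation where the abstract's bypass branches are expected to pay off.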