Conference: Brazilian Symposium on Computing Systems Engineering

Runtime Vectorization of Conditional Code and Dynamic Range Loops to ARM NEON Engine


Abstract

SIMD engines are widely present in commercial processors and aim to improve application performance by exploiting Data Level Parallelism (DLP). However, most SIMD engines rely on specific libraries and compilers to support DLP execution, which limits DLP gains because these tools can analyze only static code. The Dynamic SIMD Assembler (DSA) [8] exploits DLP at runtime by identifying vectorizable loops and generating ARM NEON SIMD instructions for them. However, its DLP coverage is incomplete, since portions of code that depend on runtime information, such as dynamic range loops and loops containing conditional code, are not exploited. In this work, we extend the DSA's coverage by coupling the vectorization of conditional code with that of dynamic range loops. Results show that the proposed techniques improve the original DSA's performance by 38% on benchmarks that offer opportunities to exploit conditional code and dynamic range loops. In addition, the Extended DSA, while preserving software productivity and binary compatibility, outperforms ARM compiler auto-vectorization by 12%.
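
To make the two loop categories named in the abstract concrete, the following is a minimal sketch in portable C, not the DSA mechanism itself (which the abstract describes as operating at runtime to generate NEON instructions). It assumes a vector width of four 32-bit lanes, as in a 128-bit NEON register. The function names, the `LANES` constant, and the clamping operation are all hypothetical and chosen only for illustration. The trip count `n` is known only at run time (a dynamic range loop), and the body contains a data-dependent branch (conditional code). The blocked version evaluates both outcomes per lane and picks one with a mask, which is the usual way a branch is expressed in SIMD form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* assumption: four 32-bit lanes per vector */

/* Scalar reference: conditional body, trip count n known only at run time. */
void clamp_negative_scalar(const int32_t *in, int32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0)
            out[i] = 0;
        else
            out[i] = in[i];
    }
}

/* Blocked version: the branch becomes a per-lane mask and select. */
void clamp_negative_blocked(const int32_t *in, int32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    size_t i = 0;
    size_t vec_end = n - (n % LANES); /* largest multiple of LANES <= n */

    /* Main part: each inner iteration models one vector lane. */
    for (; i < vec_end; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            int32_t x = in[i + lane];
            uint32_t mask = (x < 0) ? UINT32_MAX : 0u; /* all ones where the condition holds */
            uint32_t then_val = 0u;                    /* value of the "if" branch */
            uint32_t else_val = (uint32_t)x;           /* value of the "else" branch */
            out[i + lane] = (int32_t)((mask & then_val) | (~mask & else_val));
        }
    }

    /* Scalar tail: leftover elements when n is not a multiple of LANES. */
    for (; i < n; i++)
        out[i] = (in[i] < 0) ? 0 : in[i];
}
```

With real NEON intrinsics, the inner lane loop would typically collapse into a compare that produces the mask followed by a bitwise select; the scalar tail is one common way to handle a range that is not a multiple of the vector width.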