Brazilian Symposium on Computing Systems Engineering

Runtime Vectorization of Conditional Code and Dynamic Range Loops to ARM NEON Engine


Abstract

SIMD engines are widely present in market processors, where they improve application performance by exploiting Data Level Parallelism (DLP). However, most SIMD engines rely on specific libraries and compilers to support DLP execution, which limits DLP gains because these tools can only analyze static code. The Dynamic SIMD Assembler (DSA) [8] exploits DLP at runtime by identifying vectorizable loops and generating ARM NEON SIMD instructions for them. However, its DLP coverage is not fully exploited, because code that depends on runtime information, such as dynamic range loops and loops containing conditional code, is not vectorized. In this work, we extend DSA coverage by adding vectorization of conditional code and dynamic range loops. Results show that the proposed techniques improve the performance of the original DSA by 38% on benchmarks that offer opportunities to exploit conditional code and dynamic range loops. In addition, the Extended DSA preserves software productivity and binary compatibility while outperforming ARM compiler auto-vectorization by 12%.
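The two loop shapes named above can be illustrated in general terms. The following is a minimal Python sketch, assuming a vector of four 32-bit integer lanes as in a 128-bit NEON register. It shows if-conversion, where both sides of the branch are computed for every lane and merged with a compare mask and a bitwise select, analogous to NEON's VCGT and VBSL. It also shows strip-mining a loop whose trip count is only known at runtime into full-width vector chunks plus a scalar epilogue. Lanes are emulated with Python lists, 32-bit overflow is not modeled, and the helper names (`vcgt`, `vbsl`, `vadd`, `vsub`, `vectorized_loop`) are hypothetical stand-ins. This is not DSA's mechanism, which, per the abstract, detects vectorizable loops at runtime and emits NEON instructions.

```python
from typing import List

LANES = 4  # assumption: 128-bit NEON register holding four 32-bit lanes


def scalar_loop(a: List[int], b: List[int], t: int) -> List[int]:
    """Reference loop: a conditional body whose trip count len(a) is only known at runtime."""
    c = [0] * len(a)
    for i in range(len(a)):
        if a[i] > t:
            c[i] = a[i] + b[i]
        else:
            c[i] = a[i] - b[i]
    return c


def vcgt(x: List[int], y: List[int]) -> List[int]:
    # Lane-wise compare: -1 (all bits set) where x > y, else 0, like a NEON compare mask.
    return [-1 if xi > yi else 0 for xi, yi in zip(x, y)]


def vbsl(mask: List[int], x: List[int], y: List[int]) -> List[int]:
    # Bitwise select: bits of x where the mask is set, bits of y elsewhere.
    return [(m & xi) | (~m & yi) for m, xi, yi in zip(mask, x, y)]


def vadd(x: List[int], y: List[int]) -> List[int]:
    return [xi + yi for xi, yi in zip(x, y)]


def vsub(x: List[int], y: List[int]) -> List[int]:
    return [xi - yi for xi, yi in zip(x, y)]


def vectorized_loop(a: List[int], b: List[int], t: int) -> List[int]:
    n = len(a)                 # dynamic range: trip count read at runtime
    c = [0] * n
    vec_end = n - n % LANES    # iterations covered by full vector chunks
    tv = [t] * LANES           # broadcast the scalar threshold to every lane
    for i in range(0, vec_end, LANES):
        va = a[i:i + LANES]
        vb = b[i:i + LANES]
        mask = vcgt(va, tv)
        taken = vadd(va, vb)       # "then" path computed for all lanes
        not_taken = vsub(va, vb)   # "else" path computed for all lanes
        c[i:i + LANES] = vbsl(mask, taken, not_taken)
    for i in range(vec_end, n):    # scalar epilogue for the last n % LANES iterations
        c[i] = a[i] + b[i] if a[i] > t else a[i] - b[i]
    return c


a = [5, 1, 7, 2, 9, 0, 3]
b = [1] * 7
print(vectorized_loop(a, b, 2))  # [6, 0, 8, 1, 10, -1, 4]
```

Computing both paths for every lane is only safe because neither branch has side effects here. A branch that stores to memory or could fault would need a masked store or would have to stay scalar.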