Design, Automation and Test in Europe Conference and Exhibition

Boosting SIMD Benefits through a Run-time and Energy Efficient DLP Detection



Abstract

Data-level parallelism (DLP) has been improving the performance-energy tradeoff of current processors through coupled SIMD engines such as Intel AVX and ARM NEON. Special libraries and compilers are used to support DLP execution on such engines. However, hand coding inevitably costs development time, since most software developers are not skilled at extracting DLP with unfamiliar libraries. In addition, DLP detection through the compiler, besides breaking software compatibility, is limited to static code analysis, which compromises performance gains. In this work, we propose a run-time DLP detection mechanism named Dynamic SIMD Assembler (DSA), which transparently identifies vectorizable code regions to execute on the ARM NEON engine. Because it operates dynamically, DSA preserves software compatibility and adds no overhead to the software development process. Results show that DSA outperforms the ARM NEON auto-vectorizing compiler by 32%, since it covers a wider set of vectorizable regions, such as Dynamic Range, Sentinel, and Conditional Loops. In addition, DSA outperforms hand-vectorized code using the ARM library by 26% while reducing energy consumption by 45%, with no penalty in software development time.
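
The abstract names three loop categories (Dynamic Range, Sentinel, and Conditional Loops) without defining them. The scalar C sketch below shows one common reading of each term, purely as an illustration; the function names `scale`, `count_until_zero`, and `clamp_negatives` are hypothetical and are not taken from the paper, and the paper's own definitions may differ in detail.

```c
#include <stddef.h>

/* Dynamic range loop (assumed reading): the trip count n is only
   known at run time, not at compile time. */
void scale(int *dst, const int *src, size_t n, int k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Sentinel loop (assumed reading): iteration stops when a data value
   (here 0) is reached, so the trip count depends on array contents.
   The caller must guarantee that a 0 terminator is present. */
size_t count_until_zero(const int *src) {
    size_t i = 0;
    while (src[i] != 0)
        i++;
    return i;
}

/* Conditional loop (assumed reading): the loop body contains a
   data-dependent branch. */
void clamp_negatives(int *data, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (data[i] < 0)
            data[i] = 0;
    }
}
```

In all three cases some of the information that decides how many elements are processed, or which ones are modified, is available only while the program runs, which is consistent with the abstract's point that static code analysis is a limiting factor.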
