Inastemp: A Novel Intrinsics-as-Template Library for Portable SIMD-Vectorization

Abstract

The development of scientific applications requires highly optimized computational kernels to benefit from modern hardware. In recent years, vectorization has gained key importance in exploiting the processing capabilities of modern CPUs, whose evolution is characterized by increasing register widths and core counts, but stagnating clock frequencies. In particular, vectorization allows floating-point operations to be performed at a higher rate than the processor's clock frequency. However, compilers often fail to vectorize complex codes, and pure assembly or intrinsic implementations often suffer from software engineering issues, such as poor readability and maintainability. Moreover, it is difficult for domain scientists to write optimized code without technical support. To address these issues, we propose Inastemp, a lightweight open-source C++ library. Inastemp offers a solution for developing hardware-independent computational kernels for the CPU. These kernels are portable across compilers and floating-point precisions, and they can be vectorized for SSE(3,4.1,4.2), AVX(2), AVX512, or ALTIVEC/VMX instructions. Inastemp provides advanced features, such as an if-else statement that vectorizes branches that cannot be removed. Our performance study shows that Inastemp has the same efficiency as pure intrinsic approaches on modern architectures. As a side result, this study provides micro-benchmarks on the latest HPC architectures for three different computational kernels, emphasizing comparisons between scalar and intrinsic-based codes.
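The idea of a kernel written once against a vector type and then instantiated for different widths can be sketched as follows. This is a minimal illustration under stated assumptions, not Inastemp's actual API: the `Packed` type, its `load`/`broadcast`/`store`/`blend` members, and the `piecewise_kernel` and `apply` functions are all hypothetical names introduced here. The lanes are plain doubles so the sketch compiles anywhere; a real backend would presumably wrap SSE, AVX, or ALTIVEC registers behind a similar interface. The branch `x < limit ? x*x : x+x` is handled by computing both sides and blending them with a mask, which is one common way to vectorize a branch that cannot be removed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width vector type (illustration only, NOT Inastemp's API).
// N = 1 behaves like scalar code; N = 4 stands in for a 256-bit register.
template <std::size_t N>
struct Packed {
    static constexpr std::size_t width = N;
    std::array<double, N> lane{};

    static Packed load(const double* p) {
        Packed r;
        for (std::size_t k = 0; k < N; ++k) r.lane[k] = p[k];
        return r;
    }
    static Packed broadcast(double s) {
        Packed r;
        for (std::size_t k = 0; k < N; ++k) r.lane[k] = s;
        return r;
    }
    void store(double* p) const {
        for (std::size_t k = 0; k < N; ++k) p[k] = lane[k];
    }
    friend Packed operator*(const Packed& a, const Packed& b) {
        Packed r;
        for (std::size_t k = 0; k < N; ++k) r.lane[k] = a.lane[k] * b.lane[k];
        return r;
    }
    friend Packed operator+(const Packed& a, const Packed& b) {
        Packed r;
        for (std::size_t k = 0; k < N; ++k) r.lane[k] = a.lane[k] + b.lane[k];
        return r;
    }
    // Lane-wise comparison producing a per-lane mask.
    friend std::array<bool, N> operator<(const Packed& a, const Packed& b) {
        std::array<bool, N> m{};
        for (std::size_t k = 0; k < N; ++k) m[k] = a.lane[k] < b.lane[k];
        return m;
    }
    // Branch-free selection: take if_true where the mask is set, else if_false.
    friend Packed blend(const std::array<bool, N>& m,
                        const Packed& if_true, const Packed& if_false) {
        Packed r;
        for (std::size_t k = 0; k < N; ++k)
            r.lane[k] = m[k] ? if_true.lane[k] : if_false.lane[k];
        return r;
    }
};

// Kernel written once against the vector interface:
// y[i] = x[i] < limit ? x[i] * x[i] : x[i] + x[i]
template <class Vec>
void piecewise_kernel(const double* x, double* y, std::size_t n, double limit) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    const Vec lim = Vec::broadcast(limit);
    std::size_t i = 0;
    // Main loop: both branch results are computed, then blended per lane.
    for (; i + Vec::width <= n; i += Vec::width) {
        const Vec v = Vec::load(x + i);
        blend(v < lim, v * v, v + v).store(y + i);
    }
    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < n; ++i) {
        y[i] = (x[i] < limit) ? x[i] * x[i] : x[i] + x[i];
    }
}

// Convenience wrapper (hypothetical) that allocates the output.
template <class Vec>
std::vector<double> apply(const std::vector<double>& x, double limit) {
    std::vector<double> y(x.size());
    piecewise_kernel<Vec>(x.data(), y.data(), x.size(), limit);
    return y;
}
```

Calling `apply<Packed<1>>(x, limit)` and `apply<Packed<4>>(x, limit)` on the same input should give identical results, which is the sense in which such a kernel is independent of the vector width it is instantiated with.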

Bibliographic record

  • Source
    Scientific Programming | 2017, Issue 2 | 5482468.1-5482468.18 | 18 pages
  • Author

    Bramas Berenger;

  • Affiliation

    MPCDF, Gießenbachstr. 2, D-85748 Garching, Germany;

  • Indexed in: Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English