Conference: International Conference on High Performance Computing and Simulation

Modern Generative Programming for Optimizing Small Matrix-Vector Multiplication

Abstract

BLAS-level functions are the cornerstone of a large subset of applications. While a large body of work exists on efficient, large-scale implementations of routines such as gemv, interest has also emerged in small, highly optimized versions of those routines. In this paper, we show how a modern C++ approach based on generative programming techniques, such as vectorization and loop unrolling expressed through meta-programming, can deliver efficient, automatically generated code for such routines. The generated code is competitive with existing hand-tuned library kernels and requires far less programming effort than writing assembly code. In particular, we analyze the performance of automatically generated small-sized gemv kernels on both Intel x86 and ARM processors. A performance comparison with the OpenBLAS gemv kernel on small matrices of sizes ranging from 4 to 32 shows that our C++ kernels are very efficient and can perform up to 3 times better than OpenBLAS gemv.
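
To make the idea of compile-time loop unrolling concrete, the following is a minimal sketch and not the authors' implementation. It assumes C++17, a column-major matrix as in BLAS, and the simplified operation y = A*x (the full gemv also scales by alpha and adds beta*y). The names `small_gemv`, `gemv_impl` and `row_dot` are hypothetical. Both loops are expanded at compile time with `std::index_sequence` and fold expressions. Explicit SIMD vectorization, which the abstract also mentions, is left to the compiler here.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper names; nothing here is taken from the paper's code.
// A is M x N, stored column-major: A(i, j) == a[i + j * M].

// Dot product of row I of A with x, expanded at compile time over columns J...
template <typename T, std::size_t M, std::size_t N, std::size_t I,
          std::size_t... J>
constexpr T row_dot(const T* a, const T* x, std::index_sequence<J...>) {
    // Binary left fold: ((T{} + a0*x0) + a1*x1) + ...
    return (T{} + ... + (a[I + J * M] * x[J]));
}

// Writes every y[I], expanded at compile time over rows I...
template <typename T, std::size_t M, std::size_t N, std::size_t... I>
constexpr void gemv_impl(const T* a, const T* x, T* y,
                         std::index_sequence<I...>) {
    // Comma fold: one assignment per row, no runtime loop counter.
    ((y[I] = row_dot<T, M, N, I>(a, x, std::make_index_sequence<N>{})), ...);
}

// Entry point: y = A * x for sizes fixed at compile time.
// T, M and N must be given explicitly, e.g. small_gemv<double, 4, 4>(a, x, y).
template <typename T, std::size_t M, std::size_t N>
constexpr void small_gemv(const std::array<T, M * N>& a,
                          const std::array<T, N>& x,
                          std::array<T, M>& y) {
    gemv_impl<T, M, N>(a.data(), x.data(), y.data(),
                       std::make_index_sequence<M>{});
}
```

Because the sizes are template parameters, each instantiation such as `small_gemv<double, 4, 4>` yields straight-line code specialized for that shape. This is the property that makes generated kernels attractive for the small sizes (4 to 32) discussed in the abstract. Whether a given compiler vectorizes this code well is not guaranteed and would need to be measured.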
