Conference: International Workshop on Applied Parallel Computing

Superscalar GEMM-based Level 3 BLAS - The On-going Evolution of a Portable and High-Performance Library

Abstract

Recently, a first version of our GEMM-based level 3 BLAS for superscalar type processors was announced. A new feature is the inclusion of DGEMM itself. This DGEMM routine contains, inline, what we call a level 3 kernel routine, which is based on register blocking. Additionally, it features level 1 cache blocking and data copying of sub-matrix operands for the level 3 kernel. Our other BLAS routines that have triangular operands, e.g., DTRSM and DSYRK, use a similar level 3 kernel routine to handle the triangular blocks that appear on the diagonal of the larger input triangular operand. As in our previous GEMM-based work, all other BLAS routines perform the dominating part of their computations in calls to DGEMM. We are seeing the adoption of our BLAS routines by several organizations, including the ATLAS and PHiPAC projects on automatic generation of fast DGEMM kernels for superscalar processors, and some computer vendors. The evolution of the superscalar GEMM-based level 3 BLAS is presented. We also describe new developments, which include techniques that make the library applicable to symmetric multiprocessing (SMP) systems.
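The structure described in the abstract can be illustrated with a small sketch. It is not the library's code: it is a pure-Python illustration on row-major lists of lists. The function names (`kernel`, `gemm_blocked`, `trsm_lower_gemm_based`), the block size `nb`, and the 2x2 register tile are all assumptions made for the example. `gemm_blocked` copies sub-matrix operands block by block and passes them to a register-blocked kernel. `trsm_lower_gemm_based` solves L X = B by handling each diagonal triangular block with a small substitution loop and leaving the rest of the work to the GEMM routine.

```python
def kernel(a_blk, bt_blk, C, i0, j0, alpha):
    """Add alpha * a_blk @ bt_blk^T into C at offset (i0, j0).

    Uses a 2x2 tile of scalar accumulators as a stand-in for register blocking.
    """
    mb, nc, kb = len(a_blk), len(bt_blk), len(a_blk[0])
    me, ne = mb - mb % 2, nc - nc % 2  # extent covered by full 2x2 tiles
    for i in range(0, me, 2):
        a0, a1 = a_blk[i], a_blk[i + 1]
        for j in range(0, ne, 2):
            b0, b1 = bt_blk[j], bt_blk[j + 1]
            c00 = c01 = c10 = c11 = 0.0
            for p in range(kb):
                x0, x1, y0, y1 = a0[p], a1[p], b0[p], b1[p]
                c00 += x0 * y0
                c01 += x0 * y1
                c10 += x1 * y0
                c11 += x1 * y1
            C[i0 + i][j0 + j] += alpha * c00
            C[i0 + i][j0 + j + 1] += alpha * c01
            C[i0 + i + 1][j0 + j] += alpha * c10
            C[i0 + i + 1][j0 + j + 1] += alpha * c11
    # Cleanup for an odd last row or column of the block.
    for i in range(mb):
        for j in range(nc):
            if i >= me or j >= ne:
                s = sum(a_blk[i][p] * bt_blk[j][p] for p in range(kb))
                C[i0 + i][j0 + j] += alpha * s


def gemm_blocked(A, B, C, alpha=1.0, nb=4):
    """C += alpha * A @ B, with blocking and operand copying.

    nb is a hypothetical block size standing in for a level 1 cache block.
    """
    m, k, n = len(A), len(B), len(B[0])
    for i0 in range(0, m, nb):
        i1 = min(i0 + nb, m)
        for p0 in range(0, k, nb):
            p1 = min(p0 + nb, k)
            # Copy the current block of A into its own contiguous storage.
            a_blk = [row[p0:p1] for row in A[i0:i1]]
            for j0 in range(0, n, nb):
                j1 = min(j0 + nb, n)
                # Copy the block of B transposed, so the kernel reads rows.
                bt_blk = [[B[p][j] for p in range(p0, p1)]
                          for j in range(j0, j1)]
                kernel(a_blk, bt_blk, C, i0, j0, alpha)


def trsm_lower_gemm_based(L, B, nb=4):
    """Solve L X = B in place (L lower triangular, non-unit diagonal)."""
    m, n = len(L), len(B[0])
    for i0 in range(0, m, nb):
        i1 = min(i0 + nb, m)
        # Diagonal triangular block: small forward substitution.
        for i in range(i0, i1):
            for j in range(n):
                s = B[i][j]
                for p in range(i0, i):
                    s -= L[i][p] * B[p][j]
                B[i][j] = s / L[i][i]
        # Remaining rows: rectangular update done entirely by GEMM.
        if i1 < m:
            a_panel = [row[i0:i1] for row in L[i1:]]
            # B[i1:] shares row objects with B, so the update is in place.
            gemm_blocked(a_panel, B[i0:i1], B[i1:], alpha=-1.0, nb=nb)
```

In this sketch the triangular solve touches only `nb`-sized diagonal blocks directly. As the matrix grows, an increasing share of the arithmetic therefore runs inside `gemm_blocked`, which is the property the GEMM-based approach relies on.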
