Blocked Implementation of Level 3 BLAS for RISC Processors

Abstract

The authors describe a version of the Level 3 BLAS which is designed to be efficient on RISC processors. This extends previous studies by the same authors (Amestoy, Dayde, Duff & Morere (1995), Dayde, Duff & Petitet (1994), and Dayde & Duff (1995)), which describe a similar approach for efficient serial and parallel implementations of the Level 3 BLAS on shared and virtual shared memory multiprocessors. All the codes are written in Fortran and use loop-unrolling, blocking, and copying to improve performance. The blocking technique is used to express the BLAS in terms of operations involving triangular blocks and calls to the matrix-matrix multiplication kernel (GEMM). No manufacturer-supplied or assembler code is used. This blocked implementation uses the same blocking ideas as in Dayde et al. (1994), except that the ordering of loops is designed for efficient reuse of data held in cache and not necessarily for parallelization. A parameter which controls the blocking allows efficient exploitation of the memory hierarchy on the various target computers. The authors present results on a range of RISC-based workstations and multiprocessors, viz. DEC 3000/4000 AXP, DEC 8400 5/300, HP 715/64, IBM RS/6000-750, MEIKO CS2-HA, SGI Power Challenge L, and SUN SPARC 20/50. (Copyright (c) 1995 Council for the Central Laboratory of the Research Councils.)
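
To make the blocking idea in the abstract concrete, the following is a minimal sketch of one Level 3 BLAS operation, a lower-triangular solve L X = B (the TRSM case), expressed as small triangular solves on diagonal blocks plus GEMM-style updates. It is written in Python for readability, whereas the report's codes are Fortran. The function names, the loop ordering, and the parameter `nb` (standing in for the blocking parameter the abstract mentions) are assumptions made for illustration and are not taken from the report.

```python
def solve_diag_block(L, X, k0, k1):
    """Unblocked forward substitution on the triangular block L[k0:k1, k0:k1].

    Overwrites rows k0..k1-1 of X with the solution for that block.
    """
    n_cols = len(X[0])
    for i in range(k0, k1):
        for j in range(n_cols):
            s = X[i][j]
            for k in range(k0, i):
                s -= L[i][k] * X[k][j]
            X[i][j] = s / L[i][i]


def gemm_update(X, L, i0, i1, k0, k1):
    """GEMM-style update: X[i0:i1, :] -= L[i0:i1, k0:k1] * X[k0:k1, :].

    The row ranges i0..i1-1 and k0..k1-1 never overlap in the caller,
    so updating X in place is safe.
    """
    n_cols = len(X[0])
    for i in range(i0, i1):
        for k in range(k0, k1):
            l_ik = L[i][k]
            for j in range(n_cols):
                X[i][j] -= l_ik * X[k][j]


def blocked_trsm_lower(L, B, nb):
    """Solve L X = B for X, with L lower triangular, using blocks of size nb.

    Hypothetical helper: nb plays the role of a tunable blocking parameter.
    B is copied first, so the caller's right-hand side is left untouched.
    """
    n = len(L)
    X = [row[:] for row in B]
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Triangular operation on the diagonal block.
        solve_diag_block(L, X, k0, k1)
        # All remaining work below the block is a matrix-matrix multiply.
        gemm_update(X, L, k1, n, k0, k1)
    return X


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0],
         [1.0, 3.0, 0.0],
         [4.0, -1.0, 2.0]]
    B = [[2.0, 4.0],
         [7.0, 5.0],
         [9.0, 1.0]]
    for row in blocked_trsm_lower(L, B, nb=2):
        print(row)
```

With a larger `nb`, more of the arithmetic falls inside `gemm_update`, so in a compiled implementation most of the tuning effort (unrolling, cache reuse) can be concentrated in the GEMM kernel. Setting `nb` to the matrix order reduces the routine to plain forward substitution.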
