Time and energy modeling of high-performance Level-3 BLAS on x86 architectures

Abstract

We present accurate piece-wise models for the time and energy costs of high-performance implementations of both the matrix multiplication (GEMM) and the triangular system solve with multiple right-hand sides (TRSM) on x86 architectures. Our methodology decouples the costs due to the floating-point arithmetic and data movement occurring in the higher levels of the cache hierarchy from those of packing and data transfers between the main memory and the L2/L3 cache. A careful analytical study of the data transfers, combined with an architecture-specific calibration of the costs per operation, then provides the components needed to assemble piece-wise models that accurately estimate the performance of GEMM and TRSM on x86 processors.
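The decoupling described above can be illustrated with a minimal sketch. It is not the authors' model. It assumes a GotoBLAS/BLIS-style loop structure in which every element of B is packed once and A is repacked once per `nc`-wide column block of C, a hypothetical blocking parameter `nc`, an assumed packing count for TRSM, and placeholder per-operation costs (`Calibration`, a hypothetical name) standing in for the architecture-specific calibration. Time is modeled as an arithmetic term plus a packing term; energy adds a static-power term to per-operation energies.

```python
import math
from dataclasses import dataclass


@dataclass
class Calibration:
    """Hypothetical per-operation costs. Real values would come from
    architecture-specific calibration runs; none are given here."""
    t_flop: float    # seconds per flop (data resident in the upper cache levels)
    t_pack: float    # seconds per element packed from main memory
    e_flop: float    # joules per flop
    e_pack: float    # joules per packed element
    p_static: float  # static power in watts, charged for the whole run


def gemm_counts(m, n, k, nc=4096):
    """Flops and packed elements for C += A @ B (A: m x k, B: k x n),
    assuming a GotoBLAS/BLIS-style loop order: every element of B is
    packed once, and A is repacked once per nc-wide column block of C."""
    flops = 2 * m * n * k
    packed = k * n + m * k * math.ceil(n / nc)
    return flops, packed


def trsm_counts(m, n, nc=4096):
    """Flops and packed elements for solving L @ X = B (L: m x m lower
    triangular, B: m x n). The packing count is an assumption mirroring
    gemm_counts: B packed once, the triangle of L once per nc-wide block."""
    flops = m * m * n  # forward substitution costs m^2 flops per right-hand side
    packed = m * n + (m * (m + 1) // 2) * math.ceil(n / nc)
    return flops, packed


def estimate(flops, packed, cal):
    """Decoupled model: arithmetic cost plus packing/transfer cost."""
    time = flops * cal.t_flop + packed * cal.t_pack
    energy = cal.p_static * time + flops * cal.e_flop + packed * cal.e_pack
    return time, energy


if __name__ == "__main__":
    # Placeholder constants for illustration only, not measurements.
    cal = Calibration(t_flop=1e-11, t_pack=5e-10, e_flop=1e-10,
                      e_pack=2e-9, p_static=20.0)
    for n in (4096, 4097, 8192):
        t, e = estimate(*gemm_counts(2000, n, 2000), cal)
        print(f"GEMM n={n}: {t:.4f} s, {e:.4f} J")
    t, e = estimate(*trsm_counts(2000, 4096), cal)
    print(f"TRSM: {t:.4f} s, {e:.4f} J")
```

The `ceil` term makes the packing cost, and therefore both estimates, a step (piece-wise) function of `n`; in practice the placeholder constants would be replaced by values calibrated on the target processor.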