Conference: IEEE International Symposium on Computer Architecture and High Performance Computing

A Highly Efficient SGEMM Implementation using DMA on the Intel/Movidius Myriad-2

Abstract

Reducing energy consumption and achieving high energy efficiency have become top priorities in High Performance Computing. High energy efficiency generally requires high resource utilization, since the energy consumed by any application on any architecture depends on its active time. We show that by using DMA, the Myriad-2 Vision Processing Unit, built in a 28 nm CMOS node, can achieve 25 GFLOPS/W for FP32 matrix multiplication. Our main contributions are: (i) an analysis of the data-transfer needs of the inner- and outer-product formulations of matrix multiplication with respect to the Myriad-2 memory hierarchy; (ii) an efficient use of DMA for managing matrix block transfers between on-chip and main memory; and (iii) a detailed analysis of the effects of matrix block shapes and DRAM page faults on performance and energy efficiency.
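The difference in data-transfer needs between the inner- and outer-product formulations can be illustrated with a deliberately simplified model. The sketch below assumes a two-level memory (main memory plus an on-chip buffer), assumes the matrix dimensions are multiples of the block sizes `mb`, `nb`, `kb`, and counts the elements moved by simulated DMA copies. The functions `load_block`, `store_block`, `gemm_inner` and `gemm_outer` are hypothetical illustrations, not the paper's implementation or any Myriad-2 DMA API.

```python
"""Toy data-transfer model for blocked C = A @ B (hypothetical sketch).

Main memory is modelled by the full Python matrices; the on-chip buffer is
modelled by small block copies. Every copy between the two is counted as a
simulated DMA transfer, measured in matrix elements (words).
Assumes M, N, K are multiples of the block sizes mb, nb, kb.
"""


def load_block(mat, r0, c0, rows, cols, counter):
    """Simulated DMA main memory -> on-chip: copy a rows x cols block."""
    counter["words"] += rows * cols
    return [row[c0:c0 + cols] for row in mat[r0:r0 + rows]]


def store_block(mat, blk, r0, c0, counter):
    """Simulated DMA on-chip -> main memory: write a block back."""
    counter["words"] += len(blk) * len(blk[0])
    for i, row in enumerate(blk):
        mat[r0 + i][c0:c0 + len(row)] = row


def block_mac(cb, ab, bb):
    """cb += ab @ bb, computed only on the on-chip copies."""
    for i in range(len(ab)):
        for p in range(len(bb)):
            a = ab[i][p]
            for j in range(len(bb[0])):
                cb[i][j] += a * bb[p][j]


def gemm_inner(A, B, mb, nb, kb):
    """Inner-product order: each C block is finished before moving on."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    cnt = {"words": 0}
    for i in range(0, M, mb):
        for j in range(0, N, nb):
            cb = [[0.0] * nb for _ in range(mb)]  # accumulator stays on chip
            for k in range(0, K, kb):
                ab = load_block(A, i, k, mb, kb, cnt)
                bb = load_block(B, k, j, kb, nb, cnt)
                block_mac(cb, ab, bb)
            store_block(C, cb, i, j, cnt)  # each C element written once
    return C, cnt["words"]


def gemm_outer(A, B, mb, nb, kb):
    """Outer-product order: all of C receives one rank-kb update per k step."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    cnt = {"words": 0}
    for k in range(0, K, kb):
        for i in range(0, M, mb):
            for j in range(0, N, nb):
                ab = load_block(A, i, k, mb, kb, cnt)
                bb = load_block(B, k, j, kb, nb, cnt)
                if k == 0:
                    cb = [[0.0] * nb for _ in range(mb)]
                else:  # partial sums make a round trip through main memory
                    cb = load_block(C, i, j, mb, nb, cnt)
                block_mac(cb, ab, bb)
                store_block(C, cb, i, j, cnt)
    return C, cnt["words"]


if __name__ == "__main__":
    n = 8
    A = [[float(i + 2 * j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    for name, fn in (("inner", gemm_inner), ("outer", gemm_outer)):
        _, words = fn(A, B, 4, 4, 4)
        print(name, "words moved:", words)
```

With 8×8 matrices and 4×4×4 blocks, this model moves 320 words in the inner-product order and 448 in the outer-product order, because in the latter the partial sums of C travel to and from main memory on every k step. In the same model, A traffic scales as MNK/nb and B traffic as MNK/mb, which hints at why block shape matters; DRAM page behaviour and overlapping DMA with computation are not modelled.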