Conference: Design, Automation & Test in Europe Conference & Exhibition

A Data Layout Transformation (DLT) accelerator: Architectural support for data movement optimization in accelerated-centric heterogeneous systems

Abstract

Technology scaling and the growing use of accelerators make optimizing data movement increasingly important in all computing systems. Further, the growing diversity of memory structures makes embedding such optimization in software non-portable. We propose a novel architectural solution called Data Layout Transformation (DLT), associated with a simple set of instructions that enables software to describe the required data movement compactly and frees the implementation to optimize the movement based on knowledge of the memory hierarchy and system structure. The DLT architecture ideas can be applied to both general-purpose and accelerator-based heterogeneous systems. Experimental results first show that the proposed DLT architecture can use nearly the full bandwidth (>97%) of a wide range of memory systems (DDR3 and HMC) at a relatively low implementation cost: it occupies only 0.24 mm² and consumes 75 mW at 1 GHz in 32 nm CMOS technology. Our evaluation of the DLT accelerator in an accelerator-based heterogeneous system with DDR3 and HMC memory shows that DLT improves system performance by 4.6x to 99x (DDR3) and 4.4x to 115x (HMC), which translates into energy-efficiency improvements of 2.8x to 48x (DDR3) and 1.4x to 39x (HMC).
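The abstract describes DLT only as "a simple set of instructions" that lets software describe data movement compactly. As a minimal sketch of that idea, assuming a hypothetical descriptor format (the names `DLTDescriptor`, `apply_dlt`, and `aos_to_soa`, and all field names, are illustrative and are not the paper's actual instruction set), the Python below expresses an array-of-structs to struct-of-arrays layout transformation as one five-field descriptor per struct field instead of per-element loads and stores. `apply_dlt` is only a software reference model that defines the result; a hardware engine would, in principle, be free to schedule the underlying accesses to suit DDR3, HMC, or another memory system.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DLTDescriptor:
    """Hypothetical compact description of one strided data movement.

    Field names are illustrative assumptions, not the paper's instruction format.
    """
    src_offset: int  # byte offset of the first element in the source buffer
    elem_size: int   # bytes moved as one unit per element
    count: int       # number of elements to move
    src_stride: int  # bytes between consecutive elements in the source
    dst_stride: int  # bytes between consecutive elements in the destination


def apply_dlt(src: bytes, desc: DLTDescriptor) -> bytearray:
    """Software reference model: perform the movement element by element.

    This loop only defines the intended result; it does not model how an
    accelerator would reorder or batch the memory accesses.
    """
    if desc.count == 0:
        return bytearray()
    dst = bytearray((desc.count - 1) * desc.dst_stride + desc.elem_size)
    for i in range(desc.count):
        s = desc.src_offset + i * desc.src_stride
        d = i * desc.dst_stride
        dst[d:d + desc.elem_size] = src[s:s + desc.elem_size]
    return dst


def aos_to_soa(records: bytes, field_sizes: list[int]) -> list[bytearray]:
    """Split packed records into one contiguous array per field.

    Each field costs a single descriptor, independent of the record count.
    """
    record_size = sum(field_sizes)
    n = len(records) // record_size
    out = []
    offset = 0
    for size in field_sizes:
        desc = DLTDescriptor(src_offset=offset, elem_size=size, count=n,
                             src_stride=record_size, dst_stride=size)
        out.append(apply_dlt(records, desc))
        offset += size
    return out


if __name__ == "__main__":
    # Three packed records of (int32, int16) with no padding ("<" format).
    recs = b"".join(struct.pack("<ih", i, 10 * i) for i in range(3))
    xs, ys = aos_to_soa(recs, [4, 2])
    print(struct.unpack("<3i", xs))  # (0, 1, 2)
    print(struct.unpack("<3h", ys))  # (0, 10, 20)
```

Here the number of descriptors grows with the number of fields, not the number of records, which illustrates the kind of compactness the abstract describes; how the actual DLT hardware encodes and executes its instructions is not specified in this abstract.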