Conference: Design, Automation & Test in Europe Conference & Exhibition

A Data Layout Transformation (DLT) accelerator: Architectural support for data movement optimization in accelerated-centric heterogeneous systems

Abstract

Technology scaling and the growing use of accelerators make the optimization of data movement increasingly important in all computing systems. Further, the growing diversity of memory structures makes embedding such optimization in software non-portable. We propose a novel architectural solution called Data Layout Transformation (DLT), associated with a simple set of instructions that enable software to describe the required data movement compactly and free the implementation to optimize the movement based on knowledge of the memory hierarchy and system structure. The DLT architecture ideas are applicable to both general-purpose and accelerator-based heterogeneous systems. Experimental results first show that the proposed DLT architecture can use nearly the full bandwidth (>97%) of a wide range of memory systems (DDR3 and HMC), while its implementation cost is relatively low: it occupies only 0.24 mm² and consumes 75 mW at 1 GHz in 32 nm CMOS technology. Our evaluation of the DLT accelerator in accelerator-based heterogeneous systems across DDR3 and HMC memory shows that the DLT can enhance system performance by 4.6x to 99x (DDR3) and 4.4x to 115x (HMC), which translates into energy-efficiency improvements of 2.8x to 48x (DDR3) and 1.4x to 39x (HMC).
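To make the idea of describing a data movement compactly more concrete, here is a minimal software sketch of one common layout transformation: reordering an array of structures into a structure of arrays. The abstract does not list the DLT instructions, so the `LayoutDescriptor` class and the `transform` function are hypothetical names chosen for illustration. They stand in for the general pattern (software states the shape of the movement, and the data mover decides how to carry it out) and are not the paper's instruction set or implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LayoutDescriptor:
    """Hypothetical compact description of a 2D layout change (not the paper's ISA)."""
    rows: int  # number of records in the source buffer
    cols: int  # number of fields per record


def transform(src: Sequence[int], desc: LayoutDescriptor) -> List[int]:
    """Reorder a row-major (array-of-structures) buffer into column-major
    (structure-of-arrays) order. A plain loop stands in for the data mover."""
    if len(src) != desc.rows * desc.cols:
        raise ValueError("buffer size does not match descriptor")
    dst = [0] * len(src)
    for r in range(desc.rows):
        for c in range(desc.cols):
            # Element (r, c) moves from row-major to column-major position.
            dst[c * desc.rows + r] = src[r * desc.cols + c]
    return dst


# Three records with two fields each: (x0, y0), (x1, y1), (x2, y2).
aos = [10, 20, 11, 21, 12, 22]
soa = transform(aos, LayoutDescriptor(rows=3, cols=2))
print(soa)  # [10, 11, 12, 20, 21, 22]
```

The caller only supplies the descriptor. How the copy is ordered, batched, or matched to a particular memory system is left to whatever executes `transform`, which is the separation of concerns the abstract argues for.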
