
HAMLeT: Hardware Accelerated Memory Layout Transform within 3D-stacked DRAM

Abstract

Memory layout transformations via data reorganization are very common operations that occur either as part of the computation or as a performance optimization in data-intensive applications. These operations require inefficient memory access patterns and round-trip data movement through the memory hierarchy, failing to utilize the performance and energy-efficiency potential of the memory subsystem. This paper proposes a high-bandwidth, energy-efficient hardware accelerated memory layout transform (HAMLeT) system integrated within a 3D-stacked DRAM. HAMLeT uses low-overhead hardware that exploits the existing infrastructure in the logic layer of 3D-stacked DRAMs and requires no changes to the DRAM layers, yet it can fully exploit the locality and parallelism within the stack by implementing efficient layout transform algorithms. We analyze matrix layout transform operations (such as matrix transpose, matrix blocking, and 3D matrix rotation) and demonstrate that HAMLeT can achieve close to peak system utilization, offering up to an order of magnitude performance improvement over CPU and GPU memory subsystems that do not employ HAMLeT.
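
For readers unfamiliar with the operations named in the abstract, the following is a minimal software sketch of two of them, matrix transpose and matrix blocking, on a matrix stored as a flat row-major list. It is only an illustration of what a layout transform does to element order and of why the access pattern is strided; it is not the HAMLeT hardware algorithm, and the function names `transpose_row_major` and `to_blocked` are hypothetical names chosen for this example.

```python
def transpose_row_major(src, rows, cols):
    """Transpose a rows x cols matrix stored as a flat row-major list."""
    dst = [None] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            # Reads walk src sequentially; writes jump by `rows` elements in dst.
            # This strided side is what makes the transform memory-unfriendly.
            dst[c * rows + r] = src[r * cols + c]
    return dst


def to_blocked(src, rows, cols, block):
    """Reorder a row-major matrix so each block x block tile is contiguous."""
    # Assumption for this sketch: dimensions are exact multiples of the tile size.
    assert rows % block == 0 and cols % block == 0
    dst = []
    for br in range(0, rows, block):          # tile row origin
        for bc in range(0, cols, block):      # tile column origin
            for r in range(br, br + block):   # rows inside the tile
                start = r * cols + bc
                dst.extend(src[start:start + block])
    return dst


if __name__ == "__main__":
    print(transpose_row_major(list(range(6)), 2, 3))
    print(to_blocked(list(range(16)), 4, 4, 2))
```

In both functions every element is read once and written once to a new position, so the work is pure data movement with no arithmetic. That is the kind of operation the abstract argues is better served near memory than by a round trip through the cache hierarchy.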
