Concurrency, Practice and Experience

Tile/line access cache memory based on a multi-level Z-order tiling data layout

Abstract

Ineffective column-directional cache memory access has become a bottleneck for efficient two-dimensional (2-D) data processing with extended single instruction multiple data (SIMD) instructions. To solve this problem, we propose a cache memory with tile (column and row directions) and line (row direction) accessibility for efficient 2-D data processing. 2-D data access to the proposed cache memory is enabled via a hardware-based multi-mode address translation unit that eliminates the overhead of software-based address calculation. To reduce the hardware overhead of the proposed cache, we propose a tag memory reduction method that replaces multiple tiles with an aligned tile set (RATS) in the cache. To verify the feasibility of the proposed cache, an LSI layout of a SIMD-based general-purpose-oriented datapath embedding the proposed cache is designed in a 2.5 × 5 mm² area using 0.18-μm CMOS technology. Under a 3.9-ns clock period (250 MHz), the read latency is limited to 3 clock cycles, the same as that of a conventional cache memory. With the RATS method, the total hardware overhead of the proposed cache is reduced to only 7% of that required for a conventional cache. In addition, simulation results for the proposed cache indicate a considerable reduction of L1 and L2 cache conflict misses compared with a conventional cache for power-of-two matrix sizes, because the column-directional address stride is kept sufficiently smaller than the page size. The proposed cache therefore provides column-directional parallel access as efficiently as row-directional parallel access, enabling SIMD operation without transposition in matrix multiplication (MM). For LU decomposition (LUD), the proposed cache provides almost the same performance for a column-major-based LUD program as for a row-major-based one. These results show that the proposed cache does not restrict the freedom to choose either row- or column-major order coding.
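To give a concrete sense of the layout idea, the following is a minimal, generic sketch of a Z-order (Morton) index computation in C. It only illustrates the bit-interleaved row/column mapping that underlies Z-order tiling; the function name z_order_index, the bit widths, and the 8×8 walk are illustrative assumptions and do not reproduce the paper's multi-level tiling scheme or its hardware multi-mode address translation unit.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: interleave the bits of a row and a column index to form a
 * Z-order (Morton) offset.  In a Z-order tiling layout, elements that are
 * neighbors in either the row or the column direction stay close together
 * in the address space, so column-directional strides no longer grow
 * linearly with the matrix width. */
static uint32_t z_order_index(uint16_t row, uint16_t col)
{
    uint32_t idx = 0;
    for (int b = 0; b < 16; b++) {
        idx |= (uint32_t)((col >> b) & 1u) << (2 * b);     /* even bits: column */
        idx |= (uint32_t)((row >> b) & 1u) << (2 * b + 1); /* odd bits: row    */
    }
    return idx;
}

int main(void)
{
    /* Walking down one column of an 8x8 tile: in row-major order the
     * address stride would be the full row width, whereas the Z-order
     * offsets stay within a small neighborhood (0, 2, 8, 10, 32, ...). */
    for (uint16_t r = 0; r < 8; r++)
        printf("(%u, 0) -> %u\n", (unsigned)r, z_order_index(r, 0));
    return 0;
}
```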
