Concurrency, Practice and Experience

Tile/line access cache memory based on a multi-level Z-order tiling data layout

Abstract

Ineffective column-directional cache memory access has become a bottleneck for efficient two-dimensional (2-D) data processing with extended single instruction multiple data (SIMD) instructions. To solve this problem, we propose a cache memory with tile (column and row directions) and line (row direction) accessibility for efficient 2-D data processing. 2-D data access to the proposed cache memory is enabled via a hardware-based multi-mode address translation unit that eliminates the overhead of software-based address calculation. To reduce the hardware overhead of the proposed cache, we propose a tag memory reduction method that replaces multiple tiles with an aligned tile set (RATS) in the cache. To verify the feasibility of the proposed cache, an LSI layout of a SIMD-based general-purpose-oriented datapath embedding the proposed cache is designed in a 2.5 × 5 mm² area using 0.18-μm CMOS technology. Under a 3.9-ns clock period (250 MHz), the read latency is limited to 3 clock cycles, the same as that of a conventional cache memory. With the RATS method, the total hardware overhead of the proposed cache is reduced to only 7% of that required for a conventional cache. In addition, simulation results for the proposed cache indicate a considerable reduction of L1 and L2 cache conflict misses compared with a conventional cache for power-of-two matrix sizes, because the column-directional address stride is kept sufficiently smaller than the page size. The proposed cache therefore provides column-directional parallel access as efficiently as row-directional parallel access, enabling SIMD operation without transposition in matrix multiplication (MM). For LU decomposition (LUD), the proposed cache provides almost the same performance for a column-major-based LUD program as for a row-major-based one. These results show that the proposed cache does not restrict the freedom to choose either row- or column-major order coding.
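To give a concrete sense of the layout idea, the following is a minimal, generic sketch of a Z-order (Morton) index computation in C. It only illustrates the bit-interleaved row/column mapping that underlies Z-order tiling; the function name z_order_index, the bit widths, and the 8×8 walk are illustrative assumptions and do not reproduce the paper's multi-level tiling scheme or its hardware multi-mode address translation unit.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: interleave the bits of a row and a column index to form a
 * Z-order (Morton) offset.  In a Z-order tiling layout, elements that are
 * neighbors in either the row or the column direction stay close together
 * in the address space, so column-directional strides no longer grow
 * linearly with the matrix width. */
static uint32_t z_order_index(uint16_t row, uint16_t col)
{
    uint32_t idx = 0;
    for (int b = 0; b < 16; b++) {
        idx |= (uint32_t)((col >> b) & 1u) << (2 * b);     /* even bits: column */
        idx |= (uint32_t)((row >> b) & 1u) << (2 * b + 1); /* odd bits: row    */
    }
    return idx;
}

int main(void)
{
    /* Walking down one column of an 8x8 tile: in row-major order the
     * address stride would be the full row width, whereas the Z-order
     * offsets stay within a small neighborhood (0, 2, 8, 10, 32, ...). */
    for (uint16_t r = 0; r < 8; r++)
        printf("(%u, 0) -> %u\n", (unsigned)r, z_order_index(r, 0));
    return 0;
}
```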
