Towards a Better Cache Utilization by Selective Data Storage for CMP Last Level Caches

Abstract

Tile-based CMP (TCMP) has become the essential next-generation scalable multicore architecture. The cores in a TCMP commonly share a large Last Level Cache (LLC). NUCA is used in the LLC to divide it into multiple banks such that each bank can be accessed independently. Static NUCA has a fixed address mapping policy, whereas dynamic NUCA (DNUCA) allows blocks to relocate nearer to the processing cores at runtime. A DNUCA-based TCMP can distribute the load uniformly across the banks for better global utilization, but such DNUCA designs cannot improve the local utilization of each bank: within a bank, memory accesses are not uniformly distributed among the sets. Therefore, the flexibility to store data items in the unused portion of a bank can help improve its utilization. In this paper we propose a DNUCA-based design, called STD-NUCA, to improve the local utilization of each bank. It has been observed that on average 24% of the blocks in L2 are useless because they are exclusively owned by some L1. These blocks, called stale blocks, cannot be used without contacting the owner. STD-NUCA removes the data portion of these stale blocks from L2 and stores only their tags. The proposed STD-NUCA can be used either to improve performance or to reduce hardware overheads: increasing the number of tag entries improves performance, while keeping the tag array the same size and shrinking the data array reduces the hardware overheads. The reduction in data array size gives a 9% gain in energy consumption and a 10.4% gain in energy-delay product over an existing design, TLD-NUCA. With a higher-associativity tag array we get a 5% improvement in performance.
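
The selective-storage idea described in the abstract, keeping only a tag entry in L2 for blocks that are exclusively owned by some L1, can be sketched with a small model. The following Python sketch is an illustration only, not the paper's implementation: the names TagEntry, StdNucaBankSet, insert, mark_exclusive, and lookup are hypothetical, and it models a single set of one L2 bank that has more tag ways than data ways.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagEntry:
    tag: int
    owner_l1: Optional[int] = None    # set when some L1 owns the block exclusively
    data_slot: Optional[int] = None   # index into the data array; None => tag-only (stale)


class StdNucaBankSet:
    # One set of an L2 bank with more tag ways than data ways, so tags of
    # stale blocks can be kept without reserving data storage for them.
    def __init__(self, tag_ways: int, data_ways: int):
        assert tag_ways >= data_ways
        self.tag_ways = tag_ways
        self.tags: List[TagEntry] = []
        self.data: List[Optional[bytes]] = [None] * data_ways
        self.free_slots = list(range(data_ways))

    def insert(self, tag: int, block: bytes) -> bool:
        # Insert a block with both tag and data (no replacement policy modeled).
        if not self.free_slots or len(self.tags) >= self.tag_ways:
            return False
        slot = self.free_slots.pop()
        self.data[slot] = block
        self.tags.append(TagEntry(tag=tag, data_slot=slot))
        return True

    def mark_exclusive(self, tag: int, l1_id: int) -> None:
        # An L1 takes exclusive ownership: the L2 copy becomes stale, so the
        # data slot is released for reuse and only the tag entry is kept.
        for e in self.tags:
            if e.tag == tag and e.data_slot is not None:
                self.data[e.data_slot] = None
                self.free_slots.append(e.data_slot)
                e.data_slot = None
                e.owner_l1 = l1_id
                return

    def lookup(self, tag: int):
        # Returns ("hit", data), ("stale", owner_l1) -- the block must be
        # fetched from the owning L1 -- or ("miss", None).
        for e in self.tags:
            if e.tag == tag:
                if e.data_slot is not None:
                    return "hit", self.data[e.data_slot]
                return "stale", e.owner_l1
        return "miss", None

In this toy model, raising tag_ways above data_ways mirrors the configuration aimed at better performance, while shrinking data_ways for a fixed tag array mirrors the configuration aimed at reducing hardware overheads.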