IEEE International Symposium on High Performance Computer Architecture

Stream Floating: Enabling Proactive and Decentralized Cache Optimizations



Abstract

As multicore systems continue to grow in scale and on-chip memory capacity, on-chip network bandwidth and latency become problematic bottlenecks. Because of this, overheads in data transfer, the coherence protocol, and replacement policies become increasingly important. Unfortunately, even in well-structured programs, many natural optimizations are difficult to implement because of the reactive and centralized nature of traditional cache hierarchies, where all requests are initiated by the core as short, cache-line-granularity accesses. For example, long-lasting access patterns could be streamed from shared caches without requests from the core. Indirect memory accesses can be performed by chaining requests made from within the cache, rather than constantly returning to the core. Our primary insight is that if programs can embed information about long-term memory stream behavior in their ISAs, then these streams can be floated to the appropriate level of the memory hierarchy. This decentralized approach to address generation and cache requests can lead to better cache policies and lower request and data traffic by proactively sending data before the cores even request it. To evaluate the opportunities of stream floating, we enhance a tiled multicore cache hierarchy with stream engines that process stream requests in last-level cache banks. We develop several novel optimizations that are facilitated by stream exposure in the ISA, and subsequent exposure to the caches. We evaluate with a cycle-level, execution-driven gem5-based simulator, using 10 data-processing workloads from Rodinia and 2 streaming kernels written in OpenMP. We find that stream floating enables 52% and 39% speedups over an in-order and an out-of-order (OOO) core with state-of-the-art prefetcher designs, respectively, along with 64% and 49% energy-efficiency advantages.
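To make the access patterns in the abstract concrete, below is a minimal C sketch of a kernel with an affine index stream driving a chained indirect access, the kind of pattern that stream floating could execute from within the last-level cache rather than from the core. The helpers stream_config_affine and stream_load_i32 are hypothetical stand-ins for stream-ISA instructions, not the paper's actual interface; in hardware, a stream engine (possibly floated to an LLC bank) would generate these addresses instead of the core.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical software model of a decoupled stream; in a stream ISA the
       core would only configure the stream and consume its elements.        */
    typedef struct {
        const int32_t *base;   /* start address                              */
        size_t         len;    /* trip count, known when the stream is set up */
        size_t         pos;    /* current position                           */
    } stream_t;

    static stream_t stream_config_affine(const int32_t *base, size_t len) {
        /* Would be a single stream-config instruction; the stream engine
           then generates every address on its own.                          */
        return (stream_t){ base, len, 0 };
    }

    static int32_t stream_load_i32(stream_t *s) {
        /* Next element of the stream, with no per-access address
           computation issued by the core.                                   */
        return s->base[s->pos++];
    }

    /* Indexed sum over b[idx[i]]: the affine idx stream feeds a chained,
       indirect access to b, which the abstract notes can be performed by
       chaining requests from within the cache.                              */
    static int64_t indexed_sum(const int32_t *idx, const int32_t *b, size_t n) {
        stream_t is = stream_config_affine(idx, n);
        int64_t sum = 0;
        for (size_t i = 0; i < n; i++) {
            int32_t j = stream_load_i32(&is);   /* affine stream: idx[i]     */
            sum += b[j];                        /* indirect access: b[idx[i]] */
        }
        return sum;
    }

    int main(void) {
        int32_t idx[4] = { 3, 1, 0, 2 };
        int32_t b[4]   = { 10, 20, 30, 40 };
        printf("%lld\n", (long long)indexed_sum(idx, b, 4));  /* prints 100 */
        return 0;
    }

In this sketch the core still performs the loads; the point of stream floating, as described above, is that once both streams are exposed in the ISA, their address generation and data movement can migrate to the LLC banks so that data is pushed toward the core proactively.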
