ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads

Abstract

Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3×, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.
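
To make the partitioning idea concrete, here is a minimal sketch of way-based last-level cache partitioning in the spirit of the abstract. It is NOT the paper's actual Ubik mechanism: the miss-rate curve (LC_MISS_CURVE), all constants, the linear latency model, and the idle-time guard heuristic are assumptions made up for illustration only.

"""Sketch: way-partitioned shared LLC. A latency-critical (LC) app keeps
enough ways to meet a tail-latency bound; the rest go to batch.
All numbers and the latency model are illustrative assumptions."""

# Hypothetical miss-rate curve for the LC app:
# entry i = predicted LC miss rate when the LC app holds i cache ways.
LC_MISS_CURVE = [1.00, 0.60, 0.35, 0.20, 0.12, 0.08, 0.06, 0.05, 0.05]

TOTAL_WAYS = 8          # assumed associativity of the shared LLC
HIT_LATENCY_US = 200    # assumed LC service time with an all-hit cache
MISS_PENALTY_US = 900   # assumed latency added per unit of miss rate
TAIL_BOUND_US = 500     # assumed tail-latency target for the LC app


def lc_ways_needed() -> int:
    """Smallest way count whose predicted latency meets the tail bound."""
    for ways in range(TOTAL_WAYS + 1):
        predicted = HIT_LATENCY_US + LC_MISS_CURVE[ways] * MISS_PENALTY_US
        if predicted <= TAIL_BOUND_US:
            return ways
    return TOTAL_WAYS


def partition(lc_has_load: bool) -> tuple[int, int]:
    """Return (LC ways, batch ways). While the LC app has load it keeps
    its guaranteed allocation; while idle it lends ways to batch but
    retains a guard (a made-up heuristic) so the warm-up transient on
    the next burst stays short."""
    guaranteed = lc_ways_needed()
    lc = guaranteed if lc_has_load else max(1, guaranteed // 2)
    return lc, TOTAL_WAYS - lc


if __name__ == "__main__":
    for load in (True, False, True):
        lc, batch = partition(load)
        print(f"LC load={load}: LC ways={lc}, batch ways={batch}")

The guard allocation during idle periods stands in for the abstract's key observation: because a cold last-level cache takes tens of milliseconds to warm up, an LC workload cannot simply relinquish all of its capacity when load drops without paying for that inertia in tail latency on the next burst.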