ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads

Abstract

Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3×, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.
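
To make the partitioning idea concrete, here is a minimal sketch of way-based last-level cache partitioning in the spirit of the abstract. It is NOT the paper's actual Ubik mechanism: the miss-rate curve (LC_MISS_CURVE), all constants, the linear latency model, and the idle-time guard heuristic are assumptions made up for illustration only.

"""Sketch: way-partitioned shared LLC. A latency-critical (LC) app keeps
enough ways to meet a tail-latency bound; the rest go to batch.
All numbers and the latency model are illustrative assumptions."""

# Hypothetical miss-rate curve for the LC app:
# entry i = predicted LC miss rate when the LC app holds i cache ways.
LC_MISS_CURVE = [1.00, 0.60, 0.35, 0.20, 0.12, 0.08, 0.06, 0.05, 0.05]

TOTAL_WAYS = 8          # assumed associativity of the shared LLC
HIT_LATENCY_US = 200    # assumed LC service time with an all-hit cache
MISS_PENALTY_US = 900   # assumed latency added per unit of miss rate
TAIL_BOUND_US = 500     # assumed tail-latency target for the LC app


def lc_ways_needed() -> int:
    """Smallest way count whose predicted latency meets the tail bound."""
    for ways in range(TOTAL_WAYS + 1):
        predicted = HIT_LATENCY_US + LC_MISS_CURVE[ways] * MISS_PENALTY_US
        if predicted <= TAIL_BOUND_US:
            return ways
    return TOTAL_WAYS


def partition(lc_has_load: bool) -> tuple[int, int]:
    """Return (LC ways, batch ways). While the LC app has load it keeps
    its guaranteed allocation; while idle it lends ways to batch but
    retains a guard (a made-up heuristic) so the warm-up transient on
    the next burst stays short."""
    guaranteed = lc_ways_needed()
    lc = guaranteed if lc_has_load else max(1, guaranteed // 2)
    return lc, TOTAL_WAYS - lc


if __name__ == "__main__":
    for load in (True, False, True):
        lc, batch = partition(load)
        print(f"LC load={load}: LC ways={lc}, batch ways={batch}")

The guard allocation during idle periods stands in for the abstract's key observation: because a cold last-level cache takes tens of milliseconds to warm up, an LC workload cannot simply relinquish all of its capacity when load drops without paying for that inertia in tail latency on the next burst.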