Conference: IEEE International Symposium on High Performance Computer Architecture

Enhancing Server Efficiency in the Face of Killer Microseconds

Abstract

We are entering an era of "killer microseconds" in data center applications. Killer microseconds refer to μs-scale "holes" in CPU schedules, caused by stalls to access fast I/O devices or by brief idle times between requests in high-throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, respectively, they lack efficient support for hiding the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive, and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, each of which primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling the μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, that enable master-cores to borrow filler-threads while protecting the master-thread's state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice's QoS violations. Our evaluation demonstrates that, on average, Duplexity achieves 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency than an SMT-based server design.
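To make the utilization argument concrete, here is a minimal Python sketch under loudly stated assumptions. It is a hypothetical accounting model of a single master-core, not the Duplexity hardware design, and it ignores the separate memory paths, QoS, and tail latency that the paper addresses. The names `Segment`, `master_core_utilization`, and `switch_us` are illustrative inventions. The model compares a core that idles through each μs-scale stall with one that borrows a filler-thread from its lender-core for the duration of the stall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One burst of master-thread execution followed by a stall (hypothetical trace unit)."""
    compute_us: float  # time the master-thread spends executing, in microseconds
    stall_us: float    # μs-scale stall that follows (e.g., a fast I/O access), in microseconds


def master_core_utilization(trace, borrow=True, switch_us=0.0):
    """Return the fraction of master-core time spent on useful work.

    Toy model (an assumption, not Duplexity's actual mechanism):
    - with borrow=False the core idles for every stall;
    - with borrow=True a filler-thread borrowed from the lender-core runs
      during each stall, minus a hypothetical switch-in plus switch-out cost
      of 2 * switch_us; the filler is evicted as soon as the stall resolves,
      so the master-thread restarts on time.
    Filler-threads are assumed to always be available on the lender-core.
    """
    busy = 0.0
    total = 0.0
    for seg in trace:
        busy += seg.compute_us
        total += seg.compute_us + seg.stall_us
        if borrow:
            filler_us = seg.stall_us - 2 * switch_us
            if filler_us > 0:  # only borrow when the hole outlasts the switch cost
                busy += filler_us
    return busy / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical trace: 2 μs of work, then a 3 μs stall, repeated four times.
    trace = [Segment(compute_us=2.0, stall_us=3.0)] * 4
    print(master_core_utilization(trace, borrow=False))                 # 0.4
    print(master_core_utilization(trace, borrow=True, switch_us=0.25))  # 0.9
```

In this toy model the gain depends entirely on how long stalls are relative to the switch cost. Fast master-thread restart matters because any delay in evicting the filler would be added directly to the master-thread's latency, which the sketch deliberately does not charge.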