Conference: ACM/IEEE Annual International Symposium on Computer Architecture

TPShare: A Time-Space Sharing Scheduling Abstraction for Shared Cloud via Vertical Labels



Abstract

Current shared cloud operating systems (cloud OSes) such as Mesos and YARN are based on the "de facto" two-horizontal-layer cloud platform architecture, which decouples cloud application frameworks (e.g., Apache Spark) from the underlying resource management infrastructure. Each layer normally has its own task or resource allocation scheduler, based on either time-sharing or space-sharing. As a result, the schedulers in different layers are inevitably disconnected and unaware of each other, which is likely to waste resources (e.g., CPU). Moreover, tail latency may even suffer because of performance interference on shared resources. This paper takes the first step toward establishing the critical missing connection between the horizontal layers. We propose TPShare, a time-space sharing scheduling abstraction that uses a simple but efficient vertical label mechanism to coordinate the time-sharing or space-sharing schedulers in different layers. The vertical labels are bidirectional (i.e., up and down) message carriers that convey necessary information across the two layers and are kept as small as possible. The schedulers in different layers can thus act on the label messages to reduce resource waste and improve tail latency. Moreover, labels can be defined to support different cloud application frameworks. We implement the label mechanism in Mesos and two popular cloud application frameworks (Apache Spark and Flink) to study the effectiveness of the time-space sharing scheduling abstraction. Independent time-sharing or space-sharing scheduling in different layers causes resource waste and performance interference; the label messages of TPShare reduce both by enabling 1) on-demand fine-grained resource offering, 2) load-aware resource filtering, and 3) resource demand scaling with a global view, ultimately improving performance and tail latency.
We co-locate 13 Spark batch programs and 4 latency-sensitive Flink programs on an 8-node cluster managed by TPShare to evaluate speedup, CPU and memory utilization, and tail latency. The results show that TPShare significantly accelerates the Spark programs while using even less CPU and memory than Mesos. With higher resource utilization, TPShare achieves drastically higher throughput than Mesos. For the Flink programs, TPShare improves the 99th-percentile tail latency by 48% on average and by up to 120%.
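To make the three label-driven actions listed in the abstract more concrete, the following Python sketch models a vertical label as a small message with a direction (UP from the resource-management layer to the framework, DOWN from the framework to the resource-management layer). It is an illustration under stated assumptions, not TPShare's implementation: the class names, label fields, the 0.8 load threshold, and the greedy offering policy are all hypothetical, and none of it corresponds to the real Mesos, Spark, or Flink APIs.

```python
# Hypothetical sketch of TPShare-style vertical labels.
# Names, fields, and policies are assumptions, not TPShare's or Mesos's real API.
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Direction(Enum):
    UP = "up"      # resource-management layer -> application framework
    DOWN = "down"  # application framework -> resource-management layer


@dataclass(frozen=True)
class VerticalLabel:
    """A deliberately small message carried across the two horizontal layers."""
    direction: Direction
    framework: str
    key: str      # e.g. a node name or a metric name (assumed)
    value: float  # e.g. a load fraction or a CPU count (assumed)


@dataclass
class Node:
    name: str
    free_cpus: float
    load: float  # assumed: fraction of CPU currently busy, 0.0 to 1.0


def offer_on_demand(nodes: List[Node], demand: VerticalLabel) -> List[Tuple[str, float]]:
    """1) On-demand fine-grained offering: offer only the CPUs requested by a
    DOWN label instead of every idle CPU on each node (greedy, assumed policy)."""
    if demand.direction is not Direction.DOWN:
        raise ValueError("demand must be a DOWN label")
    remaining = demand.value
    offers = []
    for node in nodes:
        if remaining <= 0:
            break
        take = min(node.free_cpus, remaining)
        if take > 0:
            offers.append((node.name, take))
            remaining -= take
    return offers


def filter_by_load(nodes: List[Node], labels: List[VerticalLabel],
                   max_load: float = 0.8) -> List[Node]:
    """2) Load-aware filtering: skip nodes whose UP load label exceeds max_load."""
    load = {lbl.key: lbl.value for lbl in labels if lbl.direction is Direction.UP}
    return [n for n in nodes if load.get(n.name, n.load) <= max_load]


def scale_demand(requested_cpus: float, cluster_free: VerticalLabel) -> float:
    """3) Demand scaling with a global view: cap the request at what the whole
    cluster reports as free in an UP label."""
    if cluster_free.direction is not Direction.UP:
        raise ValueError("cluster_free must be an UP label")
    return min(requested_cpus, cluster_free.value)


# Usage: one framework ("spark") negotiating with the resource layer.
nodes = [Node("n1", 4, 0.9), Node("n2", 8, 0.3), Node("n3", 2, 0.5)]
load_labels = [VerticalLabel(Direction.UP, "spark", n.name, n.load) for n in nodes]
candidates = filter_by_load(nodes, load_labels)  # n1 is too loaded and is dropped
demand = scale_demand(12, VerticalLabel(Direction.UP, "spark", "cluster_free_cpus", 10))
offers = offer_on_demand(candidates, VerticalLabel(Direction.DOWN, "spark", "cpus", demand))
print(offers)  # [('n2', 8), ('n3', 2)]
```

In this sketch each label carries a single key/value pair, which reflects the abstract's point that labels are kept as small as possible; a real system would likely need richer payloads and a transport between the two schedulers.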
