IEEE Transactions on Parallel and Distributed Systems

Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters

Abstract

Achieving high resource utilization is a long-standing challenge in cluster scheduling. Resource oversubscription has become a common practice for improving resource utilization and reducing cost. However, current centralized approaches to oversubscription suffer from resource mismatch and fail to take into account other performance requirements, e.g., tail latency. In this article we present ROSE, a new resource management platform capable of conducting performance-aware resource oversubscription. ROSE allows latency-sensitive long-running applications (LRAs) to co-exist with computation-intensive batch jobs. Instead of waiting for resource allocation to be confirmed by the centralized scheduler, job managers in ROSE can independently request to launch speculative tasks on specific machines according to their suitability for oversubscription. Node agents on those machines can, however, avoid excessive resource oversubscription through an admission control mechanism that uses multi-resource threshold control and performance-aware resource throttling. Experiments show that when batch jobs and latency-sensitive LRAs are co-located, CPU utilization and disk utilization can reach 56.34 and 43.49 percent, respectively, while the 95th percentile of read latency in YCSB workloads increases by only 5.4 percent compared with executing the LRAs alone.
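
As a rough illustration of the node-agent admission control described in the abstract, the following is a minimal sketch under stated assumptions, not ROSE's implementation. The threshold values, the `NodeState` fields, and the function name `admit_speculative_task` are hypothetical; the abstract does not specify them. The latency check is a simplified stand-in for the performance-aware throttling the abstract mentions.

```python
from dataclasses import dataclass

# Hypothetical per-resource utilization ceilings (fractions of capacity).
# The abstract does not state the thresholds ROSE actually uses.
THRESHOLDS = {"cpu": 0.80, "memory": 0.85, "disk": 0.70}


@dataclass
class NodeState:
    utilization: dict             # resource name -> fraction in use (0.0 to 1.0)
    lra_p95_latency_ms: float     # observed tail latency of the co-located LRA
    lra_latency_target_ms: float  # assumed latency target for that LRA


def admit_speculative_task(node: NodeState, demand: dict) -> bool:
    """Return True if a speculative task may start on this node."""
    # Performance-aware check: refuse new work while the LRA misses its target.
    if node.lra_p95_latency_ms > node.lra_latency_target_ms:
        return False
    # Multi-resource check: every resource must stay at or below its ceiling
    # after adding the task's requested share.
    for resource, limit in THRESHOLDS.items():
        projected = node.utilization.get(resource, 0.0) + demand.get(resource, 0.0)
        if projected > limit:
            return False
    return True
```

In this sketch a job manager would call `admit_speculative_task` through the node agent before launching a speculative task; a rejection on any single resource, or on the latency check, blocks the launch.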

Bibliographic details

  • Source
  • Authors

  • Author affiliations

    Univ Leeds Sch Comp Leeds LS2 9JT W Yorkshire England;

    Beihang Univ Sch Comp Sci & Engn Beijing 100083 Peoples R China;

    Univ Lancaster Sch Comp & Commun Lancaster LA1 4YW England;

    Beihang Univ Sch Comp Beijing 100083 Peoples R China|Beihang Univ State Key Lab Software Dev Environm Beijing 100083 Peoples R China;

    Newcastle Univ Sch Comp Newcastle Upon Tyne NE1 7RU Tyne & Wear England;

    Beihang Univ Beijing Adv Innovat Ctr Big Data & Brain Comp Beijing 100083 Peoples R China;

    Alibaba Grp Engn Hangzhou 310052 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification
  • Keywords

    Resource scheduling; oversubscription; cluster utilization; resource throttling; QoS;
