Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters

Abstract

Achieving high resource utilization is a long-standing challenge in cluster scheduling. Resource oversubscription has become a common practice for improving utilization and reducing cost. However, current centralized approaches to oversubscription suffer from resource mismatch and fail to take into account other performance requirements, e.g., tail latency. In this article we present ROSE, a new resource management platform capable of conducting performance-aware resource oversubscription. ROSE allows latency-sensitive long-running applications (LRAs) to co-exist with computation-intensive batch jobs. Instead of waiting for resource allocation to be confirmed by the centralized scheduler, job managers in ROSE can independently request to launch speculative tasks on specific machines according to their suitability for oversubscription. Node agents on those machines can, however, avoid excessive oversubscription through an admission control mechanism that combines multi-resource threshold control with performance-aware resource throttling. Experiments show that when batch jobs are co-located with latency-sensitive LRAs, CPU utilization and disk utilization can reach 56.34 percent and 43.49 percent, respectively, while the 95th percentile of read latency in YCSB workloads increases by only 5.4 percent compared with running the LRAs alone.
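
The abstract names two node-side safeguards, multi-resource threshold control and performance-aware throttling, without giving their details. The following is a minimal sketch of how such checks could look, not ROSE's actual implementation. Every name here (`Usage`, `THRESHOLDS`, `admit`, `throttle_share`) and every numeric value is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Fractions of machine capacity (0.0 to 1.0) for each resource."""
    cpu: float
    mem: float
    disk: float


# Hypothetical per-resource ceilings; the abstract does not state ROSE's values.
THRESHOLDS = Usage(cpu=0.80, mem=0.85, disk=0.70)


def admit(current: Usage, demand: Usage, thresholds: Usage = THRESHOLDS) -> bool:
    """Admit a speculative task only if every resource stays at or under its ceiling."""
    return all(
        getattr(current, r) + getattr(demand, r) <= getattr(thresholds, r)
        for r in ("cpu", "mem", "disk")
    )


def throttle_share(observed_p95_ms: float, target_p95_ms: float,
                   share: float, step: float = 0.1) -> float:
    """Reduce the batch tasks' resource share when the LRA tail latency exceeds its target."""
    if observed_p95_ms > target_p95_ms:
        return max(0.0, share - step)
    return share


if __name__ == "__main__":
    node = Usage(cpu=0.5, mem=0.4, disk=0.3)
    task = Usage(cpu=0.2, mem=0.1, disk=0.1)
    print(admit(node, task))                # all three resources stay under their ceilings
    print(throttle_share(12.0, 10.0, 0.5))  # latency above target, so the share shrinks
```

The admission check rejects a task as soon as any single resource would exceed its ceiling, which is the sense in which the control is multi-resource. The throttle reacts to observed latency rather than to utilization, so it can rein in batch work even when the thresholds alone would still allow it.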

Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems | 2020, Issue 7 | pp. 1499-1517 | 19 pages
Author affiliations

Univ Leeds Sch Comp Leeds LS2 9JT W Yorkshire England;

Beihang Univ Sch Comp Sci & Engn Beijing 100083 Peoples R China;

Univ Lancaster Sch Comp & Commun Lancaster LA1 4YW England;

Beihang Univ Sch Comp Beijing 100083 Peoples R China|Beihang Univ State Key Lab Software Dev Environm Beijing 100083 Peoples R China;

Newcastle Univ Sch Comp Newcastle Upon Tyne NE1 7RU Tyne & Wear England;

Beihang Univ Beijing Adv Innovat Ctr Big Data & Brain Comp Beijing 100083 Peoples R China;

Alibaba Grp Engn Hangzhou 310052 Peoples R China;

Original format: PDF
Language: English
Keywords
Resource scheduling; oversubscription; cluster utilization; resource throttling; QoS;

Similar literature

1. Wind Resource Assessment in the Southern Plains of the US: Characterizing Large-Scale Atmospheric Circulation with Cluster Analysis [J] . Li Dong Atmosphere . 2018, Issue 3
2. Self-Adaptive Resource Management for Large-Scale Shared Clusters [J] . 李研, 陈峰宏, 孙熙, Journal of Computer Science and Technology (English edition) . 2010, Issue 005
3. Self-Adaptive Resource Management for Large-Scale Shared Clusters [J] . Yan Li, Feng-Hong Chen, Xi Sun, Journal of Computer Science and Technology (English edition) . 2010, Issue 005
4. Improving Cluster Resource Efficiency with Oversubscription [C] . Jie Chen, Chun Cao, Ying Zhang, IEEE Annual Computer Software and Applications Conference . 2018
5. ASAP: Automatic speculative acyclic parallelization for clusters. [D] . Kim, Hanjun. 2013
6. A pig multi-tissue normalised cDNA library: large-scale sequencing cluster analysis and 9K micro-array resource generation [O] . Agnès Bonnet, Eddie Iannuccelli, Karine Hugot, 2008
7. A pig multi-tissue normalised cDNA library: large-scale sequencing, cluster analysis and 9K micro-array resource generation [O] . Agnès Bonnet, Eddie Iannuccelli, Karine Hugot, 2008