Conference: BenchCouncil International Symposium on Benchmarking, Measuring, and Optimizing
Anomaly Analysis and Diagnosis for Co-located Datacenter Workloads in the Alibaba Cluster

Abstract

In warehouse-scale cloud datacenters, co-locating online services and offline batch jobs is an efficient approach to improving datacenter utilization. In this paper, we perform a deep analysis of the released Alibaba workload dataset from the perspective of anomaly analysis and diagnosis. We first preprocess the raw data, including data supplementing, filtering, correlation, and aggregation, and finally generate container-level, batch-level, and server-level resource usage data. Then, based on the summary data, we illustrate the overall cluster usage distribution of online container services and batch jobs. The co-located cluster contains several abnormal nodes, and we explore the causes of the anomalies from three aspects: (1) unbalanced distribution of co-located workloads; (2) skewed resource utilization of co-located workloads; (3) system failures or job instance failures. In addition, we present several cases of abnormal nodes, which show that frequent system failures and unbalanced workload distribution have a great impact on abnormal nodes, and that skewed co-located workload resource utilization and frequent instance failures are also causes of abnormalities.
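
The aggregation step and the idea of spotting abnormal nodes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record layout, the machine names, the sample values, and the `max_deviation` threshold are hypothetical, and the median-deviation rule is a simple stand-in, not the method used in the paper.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (machine_id, workload_kind, cpu_util_percent).
# Field names and values are illustrative, not the trace's actual schema.
records = [
    ("m1", "container", 30.0), ("m1", "batch", 20.0),
    ("m2", "container", 35.0), ("m2", "batch", 25.0),
    ("m3", "container", 90.0), ("m3", "batch", 5.0),
    ("m4", "container", 32.0), ("m4", "batch", 22.0),
]

def aggregate_by_server(rows):
    """Sum CPU usage per machine, keeping a per-workload-kind breakdown."""
    per_server = defaultdict(lambda: defaultdict(float))
    for machine_id, kind, cpu in rows:
        per_server[machine_id][kind] += cpu
    return {m: dict(kinds) for m, kinds in per_server.items()}

def flag_abnormal(per_server, max_deviation=20.0):
    """Flag machines whose total CPU differs from the cluster median
    by more than max_deviation percentage points (assumed threshold)."""
    totals = {m: sum(kinds.values()) for m, kinds in per_server.items()}
    mid = median(totals.values())
    return sorted(m for m, t in totals.items() if abs(t - mid) > max_deviation)

server_usage = aggregate_by_server(records)
abnormal_nodes = flag_abnormal(server_usage)
print(abnormal_nodes)
```

Keeping the per-kind breakdown in `server_usage` makes it possible to tell whether a flagged machine is dominated by container load or by batch load, which loosely corresponds to the skewed-utilization cause listed in the abstract.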