Source: Software, practice & experience

Aggregating data center measurements for availability analysis


Abstract

A data center infrastructure is composed of heterogeneous resources divided into three main subsystems: IT (processors, memory, disks, network, etc.), power (generators, power transformers, uninterruptible power supplies, and distribution units, among others), and cooling (water chillers, pipes, and cooling towers). This heterogeneity makes it challenging to collect and consolidate data from the many devices in the infrastructure. Extracting relevant information is a further challenge for data center managers. Improving cloud availability requires monitoring the entire infrastructure with a variety of advanced open source and/or commercial monitoring tools, such as Zabbix, Nagios, Prometheus, CloudWatch, AzureWatch, and others. It is common to use several monitoring systems to collect real-time data for data center components from different subsystems. Such an environment brings an inherent challenge: all of the collected infrastructure data and measurements must be aggregated and organized. This first step is necessary before any valuable insights for decision-making can be obtained. In this paper, we present the Data Center Availability (DCA) System, a software system that aggregates and analyzes data center measurements for the study of DCA. We also discuss the DCA implementation and illustrate its operation by monitoring a small university research laboratory data center. The DCA System can monitor different types of devices, such as servers, switches, and power devices, using the Zabbix tool. It can also automatically identify the seasonality and trend in failure times present in the data collected from different devices of the data center.
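As a rough illustration of the aggregation step the abstract describes, the Python sketch below merges up/down status samples, which could come from several monitoring tools, into a per-device availability figure. It is not the DCA System's implementation. The function name `availability_by_device`, the `(device, timestamp, is_up)` tuple layout, and the assumption that each observed state holds until the next sample are all hypothetical choices made for this example.

```python
from collections import defaultdict


def availability_by_device(samples):
    """Estimate availability per device from status samples.

    samples: iterable of (device, timestamp_seconds, is_up) tuples,
             in any order and possibly from several monitoring sources
             (hypothetical layout, chosen for this sketch).
    Returns {device: fraction of the observed time the device was up}.
    """
    per_device = defaultdict(list)
    for device, ts, is_up in samples:
        per_device[device].append((ts, is_up))

    result = {}
    for device, points in per_device.items():
        points.sort()  # order samples chronologically
        up_time = 0
        total_time = 0
        # Assumption: a sample's state holds until the next sample arrives.
        for (t0, state), (t1, _next_state) in zip(points, points[1:]):
            interval = t1 - t0
            total_time += interval
            if state:
                up_time += interval
        if total_time > 0:  # skip devices with fewer than two distinct times
            result[device] = up_time / total_time
    return result


samples = [
    ("srv1", 0, True), ("srv1", 60, False),
    ("srv1", 90, True), ("srv1", 120, True),
    ("sw1", 0, True), ("sw1", 100, True),
]
availability = availability_by_device(samples)
```

In this example `srv1` is observed for 120 seconds and is down for 30 of them, so its estimated availability is 0.75. A real deployment would also need to handle gaps in monitoring data and conflicting reports from different tools, which this sketch ignores.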
