Comprehensive Resource Use Monitoring for HPC Systems with TACC Stats

Abstract

This paper reports on a comprehensive, fully automated resource use monitoring package, TACC Stats, which enables consultants, users, and other stakeholders in an HPC system to systematically and actively identify jobs/applications that could benefit from expert support, and which aids in the diagnosis of software and hardware issues. TACC Stats continuously collects and analyzes resource usage data for every job run on a system and differs significantly from conventional profilers because it requires no action on the part of users or consultants: it is always collecting data on every node for every job. TACC Stats is open source and downloadable, configurable and compatible with general Linux-based computing platforms, and extensible to new CPU architectures and hardware devices. It is meant to provide a comprehensive resource usage monitoring solution. In addition to describing TACC Stats, the paper illustrates its application to identifying production jobs that have inefficient resource use characteristics.