Comprehensive Resource Use Monitoring for HPC Systems with TACC Stats

Abstract

This paper reports on TACC Stats, a comprehensive, fully automated resource use monitoring package that enables consultants, users, and other stakeholders in an HPC system to systematically and actively identify jobs and applications that could benefit from expert support, and that aids in the diagnosis of software and hardware issues. TACC Stats continuously collects and analyzes resource usage data for every job run on a system. It differs significantly from conventional profilers because it requires no action on the part of users or consultants: it is always collecting data on every node for every job. TACC Stats is open source and downloadable, configurable, compatible with general Linux-based computing platforms, and extensible to new CPU architectures and hardware devices. It is meant to provide a comprehensive resource usage monitoring solution. In addition to describing TACC Stats, the paper illustrates its application to identifying production jobs that have inefficient resource use characteristics.
