Conference: IEEE International Conference on Cluster Computing

Global Experiences with HPC Operational Data Measurement, Collection and Analysis



Abstract

As we move into the exascale era, supercomputers grow larger, denser, more heterogeneous, and ever more complex. Operating such machines reliably and efficiently requires deep insight into the operational parameters of the machine itself as well as its supporting infrastructure. To fulfill this need, early-adopter sites have started developing and deploying Operational Data Analytics (ODA) frameworks that continuously monitor, archive, and analyze near-real-time performance data from the machine and infrastructure levels, providing immediately actionable information for multiple operational uses. To understand their ODA goals, requirements, and use cases, we conducted a survey among eight early-adopter sites from the US, Europe, and Japan that operate top-50 high-performance computing systems. We assessed the technologies used to build their ODA frameworks, identified use cases and other push and pull factors that drive the sites' ODA activities, and report on their operational lessons.
