Conference: International Conference ISC High Performance: International Conference on High Performance Computing

Semi-automatic Assessment of I/O Behavior by Inspecting the Individual Client-Node Timelines: An Explorative Study on 10^6 Jobs


Abstract

HPC applications with suboptimal I/O behavior interfere with well-behaved applications and increase application runtime. In some cases, this can even lead to unresponsive systems and unfinished jobs. HPC monitoring systems can help users and support staff identify problematic behavior and optimize the affected applications. The key issue is how to identify the relevant applications. A profile of an application does not reveal problematic phases during execution, but tracing each individual I/O operation is too invasive. In this work, we split the execution into segments, i.e., windows of fixed size, and analyze their profiles. We develop three I/O metrics to identify three relevant classes of inefficient I/O behavior, and evaluate them on raw data from 1,000,000 jobs on the supercomputer Mistral. The advantage of our method is that temporal information about I/O activities during job runtime is preserved to some extent and can be used to identify phases of inefficient I/O. The main contribution of this work is the segmentation of time series and the computation of metrics (Job-I/O-Utilization, Job-I/O-Problem-Time, and Job-I/O-Balance) that are effective in identifying problematic I/O phases and jobs.
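
The segmentation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the three metrics, so the code only shows splitting one client node's timeline into fixed-size windows and reducing each window to a coarse profile. The sample values, the use of the mean as the per-segment profile, the names `segment_timeline` and `segment_profiles`, and `THRESHOLD` are all assumptions made for illustration.

```python
from statistics import mean


def segment_timeline(samples, window):
    """Split a sample sequence into consecutive windows of fixed size.

    The last window may be shorter if len(samples) is not a multiple of window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [samples[i:i + window] for i in range(0, len(samples), window)]


def segment_profiles(samples, window):
    """Reduce each window to a single value (assumption: the mean)."""
    return [mean(seg) for seg in segment_timeline(samples, window)]


# Hypothetical I/O load samples for one client node (made-up values).
node_samples = [0, 0, 4, 8, 8, 6, 0, 0, 0, 2]

profiles = segment_profiles(node_samples, window=3)

# Illustrative threshold; not taken from the paper.
THRESHOLD = 5
busy_segments = [i for i, p in enumerate(profiles) if p > THRESHOLD]
print(busy_segments)
```

Because each segment keeps its position in the timeline, the index of a flagged segment points to roughly when the heavy I/O phase occurred, which a single whole-job profile cannot show.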
