End-to-end online performance data capture and analysis for scientific workflows

Abstract

With the increasing use of workflows for scientific computing and the push toward exascale computing, it has become paramount that we are able to analyze the characteristics of scientific applications to better understand their impact on the underlying infrastructure and vice versa. Such analysis can help drive the design, development, and optimization of these next-generation systems and solutions. In this paper, we present an architecture, integrated with existing well-established and newly developed tools, that collects online performance statistics of workflow executions from various heterogeneous sources and publishes them in a distributed database (Elasticsearch). Using this architecture, we are able to correlate online workflow performance data with data from the underlying infrastructure, and present them in a useful and intuitive way via an online dashboard. We have validated our approach by executing two classes of real-world workflows under both normal and anomalous conditions. The first is an I/O-intensive genome analysis workflow; the second, a CPU- and memory-intensive material science workflow. Using the data collected in Elasticsearch, we demonstrate that we can correctly identify the anomalies we injected. The resulting end-to-end collection of workflow performance data is an important source of training data for automated machine learning analysis.
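
As a minimal, hypothetical sketch of the two analysis steps the abstract names (correlating workflow performance data with infrastructure data, then identifying anomalous executions), the code below assumes an invented schema: task records with an ID, a task type, a host, and start/end times, plus per-host CPU samples such as might be retrieved from an Elasticsearch index. It averages the host samples that fall inside each task's run, and flags tasks whose runtime is more than a z-score threshold away from other tasks of the same type. The record fields, the task type name "stage_a", and the z-score rule are all assumptions for illustration, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    """One workflow task execution (hypothetical schema)."""
    task_id: str
    task_type: str   # tasks of the same type are compared with each other
    host: str
    start: float     # seconds since epoch
    end: float


@dataclass
class HostSample:
    """One infrastructure metric sample from a host (hypothetical schema)."""
    host: str
    timestamp: float
    cpu_util: float  # percent


def correlate(tasks: List[TaskRecord], samples: List[HostSample]) -> Dict[str, Optional[float]]:
    """Map each task to the mean CPU utilization sampled on its host while it ran."""
    result: Dict[str, Optional[float]] = {}
    for t in tasks:
        window = [s.cpu_util for s in samples
                  if s.host == t.host and t.start <= s.timestamp <= t.end]
        result[t.task_id] = mean(window) if window else None  # None: no overlapping samples
    return result


def flag_runtime_anomalies(tasks: List[TaskRecord], threshold: float = 2.0) -> List[str]:
    """Return IDs of tasks whose runtime lies more than `threshold` population
    standard deviations from the mean runtime of tasks of the same type."""
    by_type: Dict[str, List[TaskRecord]] = {}
    for t in tasks:
        by_type.setdefault(t.task_type, []).append(t)
    flagged: List[str] = []
    for group in by_type.values():
        runtimes = [t.end - t.start for t in group]
        if len(runtimes) < 2:
            continue  # nothing to compare against
        mu, sigma = mean(runtimes), pstdev(runtimes)
        if sigma == 0:
            continue  # identical runtimes: no outliers
        for t, r in zip(group, runtimes):
            if abs(r - mu) / sigma > threshold:
                flagged.append(t.task_id)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: seven normal 10 s runs on node1, one slow 40 s run on node2.
    tasks = [TaskRecord(f"t{i}", "stage_a", "node1", 100.0 * i, 100.0 * i + 10.0) for i in range(7)]
    tasks.append(TaskRecord("t7", "stage_a", "node2", 700.0, 740.0))
    samples = [HostSample("node1", 5.0, 50.0), HostSample("node1", 8.0, 70.0),
               HostSample("node2", 720.0, 95.0)]
    print(flag_runtime_anomalies(tasks))   # ['t7']
    cpu = correlate(tasks, samples)
    print(cpu["t0"], cpu["t7"])            # 60.0 95.0
```

One design caveat of this simple rule: with population standard deviation, a single outlier among n otherwise identical runtimes has a z-score of exactly sqrt(n - 1), so a threshold of 2.0 cannot flag anything in groups of five or fewer tasks. This is why the example uses eight tasks.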

Bibliographic details

  • Source
    Future Generation Computer Systems | 2021, Issue 4 | pp. 387-400 | 14 pages
  • Author affiliations

    University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA;

    RENCI, University of North Carolina, Chapel Hill, NC, USA;

    University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA;

    University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA;

    RENCI, University of North Carolina, Chapel Hill, NC, USA;

    Data Science and Learning Division, Argonne National Laboratory, IL, USA, and University of Chicago, Chicago, IL, USA;

    University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA;

    University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA;

    Energy Sciences Network (ESnet), Lawrence Berkeley National Labs, CA, USA;

    Oak Ridge National Laboratory, TN, USA;

    Data Science and Learning Division, Argonne National Laboratory, IL, USA, and University of Chicago, Chicago, IL, USA;

    University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA;

    Oak Ridge National Laboratory, TN, USA;

    Data Science and Learning Division, Argonne National Laboratory, IL, USA, and University of Chicago, Chicago, IL, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Scientific workflows; Online performance monitoring; Extreme scale;