Production Application Performance Data Streaming for System Monitoring

RAMIN IZADPANAH; BENJAMIN A. ALLAN; DAMIAN DECHEV; JIM BRANDT

Abstract

In this article, we present an approach to streaming collection of application performance data. Practical application performance tuning and troubleshooting in production high-performance computing (HPC) environments requires an understanding of how applications interact with the platform, including (but not limited to) parallel programming libraries such as the Message Passing Interface (MPI). Several profiling and tracing tools exist that collect heavy runtime data traces either in memory (released only at application exit) or on a file system (imposing an I/O load that may interfere with the performance being measured). Although these approaches are beneficial in development stages and post-run analysis, a systemwide, low-overhead method is required to monitor deployed applications continuously. This method must be able to collect information at both the application and system levels to yield a complete performance picture.

In our approach, an application profiler collects application event counters. A sampler uses an efficient inter-process communication method to periodically extract the application counters and stream them into an infrastructure for performance data collection. We implement a tool-set based on our approach and integrate it with the Lightweight Distributed Metric Service (LDMS) system, a monitoring system used on large-scale computational platforms. LDMS provides the infrastructure to create and gather streams of performance data in a low-overhead manner. We demonstrate our approach using applications implemented with MPI, as it is one of the most common standards for the development of large-scale scientific applications.

We utilize our tool-set to study the impact of our approach on an open source HPC application, Nalu. Our tool-set enables us to efficiently identify patterns in the behavior of the application without source-level knowledge. We leverage LDMS to collect system-level performance data and explore the correlation between the system and application events. Also, we demonstrate how our tool-set can help detect anomalies with low latency. We run tests on two different architectures: a system enabled with Intel Xeon Phi and another equipped with an Intel Xeon processor. Our overhead study shows that our method imposes at most 0.5% CPU usage overhead on the application in realistic deployment scenarios.
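The profiler/sampler split described in the abstract can be illustrated with a minimal sketch. The abstract does not say which inter-process communication method the authors use, so the shared-memory segment below is an assumption, and the sketch does not touch the LDMS API at all. The counter names, the `Profiler` and `Sampler` classes, and `run_demo` are hypothetical names chosen for illustration. Both sides run in one process here to keep the example self-contained; in a deployment they would be separate processes, the sampler would fire on a timer, and the layout would need protection against torn reads.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical counter layout: three unsigned 64-bit event counters.
COUNTER_NAMES = ("mpi_send_calls", "mpi_recv_calls", "bytes_sent")
LAYOUT = struct.Struct("<3Q")


class Profiler:
    """Application side: keeps event counters in a shared-memory segment."""

    def __init__(self):
        self.shm = shared_memory.SharedMemory(create=True, size=LAYOUT.size)
        self.counts = [0] * len(COUNTER_NAMES)
        self._publish()

    def _publish(self):
        # Write the current counter values at the start of the segment.
        LAYOUT.pack_into(self.shm.buf, 0, *self.counts)

    def record(self, name, amount=1):
        self.counts[COUNTER_NAMES.index(name)] += amount
        self._publish()

    def close(self):
        self.shm.close()
        self.shm.unlink()  # the creator is responsible for removing the segment


class Sampler:
    """Monitoring side: attaches to the segment by name and reads snapshots."""

    def __init__(self, segment_name):
        self.shm = shared_memory.SharedMemory(name=segment_name)

    def sample(self):
        values = LAYOUT.unpack_from(self.shm.buf, 0)
        return dict(zip(COUNTER_NAMES, values))

    def close(self):
        self.shm.close()


def run_demo():
    """Simulate three application steps, sampling the counters after each."""
    profiler = Profiler()
    sampler = Sampler(profiler.shm.name)
    stream = []
    try:
        for _ in range(3):
            profiler.record("mpi_send_calls")
            profiler.record("bytes_sent", 1024)
            stream.append(sampler.sample())  # stand-in for a periodic timer
    finally:
        sampler.close()
        profiler.close()
    return stream


if __name__ == "__main__":
    for snapshot in run_demo():
        print(snapshot)
```

Each snapshot in the returned list is one record that a collection infrastructure could forward. Because the sampler only reads a small fixed-size region, the application never blocks on the monitoring side, which is the property the abstract's low-overhead claim depends on.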


Bibliographic Information

Source
ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 2019, Issue 2, pp. 24-48 (25 pages)
Authors
RAMIN IZADPANAH; BENJAMIN A. ALLAN; DAMIAN DECHEV; JIM BRANDT
Author affiliations

University of Central Florida;

Sandia National Laboratories;

University of Central Florida;

Sandia National Laboratories;

Original format: PDF
Language: English
Keywords
Application and system monitoring; performance data streaming; application profiling;
