WallMon: Interactive distributed monitoring of process-level resource usage on display and compute clusters

Abstract

To achieve low overhead, traditional cluster monitoring systems sample data at low frequencies and with coarse granularity. Interactive monitoring, however, requires frequent (up to 60 Hz) sampling of fine-grained data, together with visualization tools that can explore and display the data in near real-time. Traditional cluster monitoring systems are therefore unsuited to interactive monitoring of distributed cluster applications: they fail to capture short-duration events, which makes it difficult to understand the performance relationship between processes on the same or different nodes. To address this issue, we developed WallMon, a tool for interactive visual exploration of performance behaviors in distributed systems. For data gathering, WallMon is centered around an abstraction of collectors and handlers: collectors gather data of interest, such as CPU and memory usage, and forward it to handlers in a push-based fashion, while handlers take action upon the data. WallMon captures and visualizes data for every process on every node, as well as overall node statistics. Data is visualized using a technique inspired by the concept of information flocking. WallMon's design is based on the client-server model, and it is extensible through a module system that encapsulates functionality specific to monitoring (collectors) and visualization (handlers). A set of experiments has been carried out on a cluster of 29 nodes with 180 processes per node. Performance results show 7% (of 100) CPU usage at a 64 Hz sampling rate when performing process-level monitoring with WallMon. Using WallMon's interactive visualization, we have observed interesting patterns in different parallel and distributed systems, such as an unexpected ratio of user- to kernel-level execution among processes in a particular distributed system.
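
The collector and handler abstraction described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not WallMon's actual code or API: the names Sample, Handler, Collector, collect_once, run, and kernel_share are hypothetical, the handler is called in-process instead of over a network, and the collector samples only its own process through os.times() rather than every process on a node.

```python
import os
import time
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One process-level measurement (hypothetical record layout)."""
    pid: int
    timestamp: float
    user_seconds: float    # CPU time spent in user mode
    system_seconds: float  # CPU time spent in kernel mode


class Handler:
    """Receives pushed samples and acts on them; here it only stores them."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []

    def handle(self, sample: Sample) -> None:
        self.samples.append(sample)


class Collector:
    """Gathers data of interest and pushes it to a handler."""

    def __init__(self, handler: Handler) -> None:
        self.handler = handler

    def collect_once(self) -> None:
        cpu = os.times()  # cumulative CPU times of the current process
        sample = Sample(os.getpid(), time.time(), cpu.user, cpu.system)
        self.handler.handle(sample)  # push: the collector initiates delivery

    def run(self, hz: float, count: int) -> None:
        """Take `count` samples at roughly `hz` samples per second."""
        period = 1.0 / hz
        for _ in range(count):
            self.collect_once()
            time.sleep(period)


def kernel_share(sample: Sample) -> float:
    """Fraction of CPU time spent in kernel mode (0.0 if no time recorded)."""
    total = sample.user_seconds + sample.system_seconds
    return sample.system_seconds / total if total > 0 else 0.0
```

A short run such as `h = Handler(); Collector(h).run(hz=64, count=3)` leaves three samples in `h.samples`, and `kernel_share` gives one simple way to look at the user- versus kernel-level split that the abstract mentions.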

Bibliographic record

  • Author: Nilsen Arild
  • Year: 2011
  • Original format: PDF
  • Language of text: eng
