Published in: Proceedings of the International Conference on Computing in High Energy and Nuclear Physics (English edition)

Fermilab Distributed Monitoring System (NGOP)

         

Abstract

A Distributed Monitoring System (NGOP) that will scale to the anticipated requirements of Run II computing has been under development at Fermilab. NGOP [1] provides a framework for creating Monitoring Agents that monitor the overall state of computers and the software running on them. Several Monitoring Agents are available within NGOP. They can analyze log files and check the existence of system daemons, CPU and memory utilization, and similar quantities. NGOP also provides customizable hierarchical graphical representations of the monitored systems. NGOP generates events when serious problems have occurred and raises alarms when potential problems have been detected. It can perform corrective actions or send notifications, and it provides persistent storage for collected events, alarms, and actions. A first implementation of NGOP was recently deployed at Fermilab. It is a fully functional prototype that satisfies most of the existing requirements and currently monitors 512 nodes. During its first few months of operation, NGOP has proved to be a useful tool: it has detected multiple problems such as node resets, offline CPUs, and dead system daemons, and it has provided system administrators with the information required for better system tuning and configuration. The current state of deployment and the future steps to improve the prototype and implement new features will be presented.
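The abstract does not describe NGOP's actual interfaces, so the following is only a minimal Python sketch, under stated assumptions, of the agent behaviour it outlines: checks for daemon existence, CPU utilization, and offline CPUs; an "event" for a serious problem versus an "alarm" for a potential one; a stored history of results; a corrective or notification hook on events; and a hierarchical rollup of node status. Every name here (`Severity`, `Agent`, `daemon_alive`, `cpu_utilization`, `cpus_online`, `rollup`, the snapshot keys, and the 90% CPU threshold) is hypothetical and is not NGOP's API. The checks read simulated snapshots instead of a live system so the example is deterministic.

```python
# Hypothetical NGOP-style monitoring agent; illustrative only, not NGOP's API.
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, Dict, List


class Severity(IntEnum):
    OK = 0
    ALARM = 1  # a potential problem has been detected
    EVENT = 2  # a serious problem has occurred


@dataclass(frozen=True)
class Result:
    node: str
    check: str
    severity: Severity
    detail: str


# A snapshot is what an agent collected on one node; the keys are assumptions.
Snapshot = Dict[str, Any]
Check = Callable[[str, Snapshot], Result]


def daemon_alive(name: str) -> Check:
    """Missing system daemon -> EVENT."""
    def check(node: str, snap: Snapshot) -> Result:
        alive = name in snap["daemons"]
        sev = Severity.OK if alive else Severity.EVENT
        return Result(node, f"daemon:{name}", sev, "running" if alive else "not running")
    return check


def cpu_utilization(alarm_at: float) -> Check:
    """High (but not fatal) CPU utilization -> ALARM."""
    def check(node: str, snap: Snapshot) -> Result:
        util = snap["cpu_util"]
        sev = Severity.ALARM if util >= alarm_at else Severity.OK
        return Result(node, "cpu_util", sev, f"{util:.0%}")
    return check


def cpus_online(node: str, snap: Snapshot) -> Result:
    """Fewer online CPUs than expected -> EVENT."""
    online, expected = snap["cpus_online"], snap["cpus_expected"]
    sev = Severity.EVENT if online < expected else Severity.OK
    return Result(node, "cpus_online", sev, f"{online}/{expected}")


class Agent:
    """Runs checks, keeps non-OK results, and calls an action hook on EVENTs."""

    def __init__(self, checks: List[Check], on_event: Callable[[Result], None]):
        self.checks = checks
        self.on_event = on_event
        self.history: List[Result] = []  # in-memory stand-in for persistent storage

    def poll(self, node: str, snap: Snapshot) -> Severity:
        worst = Severity.OK
        for check in self.checks:
            result = check(node, snap)
            if result.severity > Severity.OK:
                self.history.append(result)
            if result.severity == Severity.EVENT:
                self.on_event(result)  # e.g. notify an administrator or restart a daemon
            worst = max(worst, result.severity)
        return worst


def rollup(tree: Dict[str, List[str]], status: Dict[str, Severity]) -> Dict[str, Severity]:
    """Propagate the worst node status up to each group of a one-level hierarchy."""
    return {group: max((status[n] for n in nodes), default=Severity.OK)
            for group, nodes in tree.items()}


# Usage: three simulated nodes in two racks.
notified: List[Result] = []
agent = Agent([daemon_alive("sshd"), cpu_utilization(0.9), cpus_online], notified.append)
snaps = {
    "node01": {"daemons": {"sshd", "crond"}, "cpu_util": 0.35, "cpus_online": 2, "cpus_expected": 2},
    "node02": {"daemons": {"crond"}, "cpu_util": 0.40, "cpus_online": 2, "cpus_expected": 2},
    "node03": {"daemons": {"sshd"}, "cpu_util": 0.95, "cpus_online": 2, "cpus_expected": 2},
}
status = {node: agent.poll(node, snap) for node, snap in snaps.items()}
# rackA rolls up to EVENT (node02 lost sshd); rackB rolls up to ALARM (node03 CPU at 95%).
groups = rollup({"rackA": ["node01", "node02"], "rackB": ["node03"]}, status)
```

Taking the worst severity as a group's status mirrors the idea of a hierarchical view in which a problem on any node is visible at the rack or cluster level; a real deployment would replace the in-memory `history` list with a database and the snapshots with data collected on each node.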
