Conference: Distributed Computing Systems, 2009. ICDCS '09
REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems

Abstract

To observe, analyze and control large-scale distributed systems and the applications hosted on them, there is an increasing need to continuously monitor performance attributes of distributed system and application states. This results in application state monitoring tasks that require fine-grained attribute information to be collected efficiently from relevant nodes. Existing approaches either treat multiple application state monitoring tasks independently and build ad-hoc monitoring trees for each task, or construct a single static monitoring tree for multiple tasks. We argue that careful planning of multiple application state monitoring tasks, jointly considering multi-task optimization and node-level resource constraints, can provide significant gains in performance and scalability. In this paper, we present REMO, a REsource-aware application state MOnitoring system. REMO produces a forest of optimized monitoring trees through iterations of two phases: one phase explores cost-sharing opportunities via estimation, and the other refines the monitoring plan through resource-sensitive tree construction. Our experimental results include those gathered by deploying REMO on a BlueGene/P rack running System S, IBM's large-scale distributed streaming system. Using REMO to run over 200 monitoring tasks for an application deployed across 200 nodes results in a 35%-45% decrease in the percentage error of collected attributes compared to existing schemes.
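
The cost-sharing idea in the abstract can be illustrated with a toy calculation. The sketch below is not REMO's algorithm or cost model, neither of which the abstract specifies. It assumes a hypothetical model in which every message a node sends carries a fixed overhead plus a per-value cost, and it ignores relay traffic inside a tree. The names `node_send_cost`, `fits_budget`, `PER_MESSAGE_COST` and `PER_VALUE_COST` are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A monitoring task: (attributes to collect, nodes to collect them from).
Task = Tuple[FrozenSet[str], FrozenSet[str]]

PER_MESSAGE_COST = 10.0  # assumed fixed overhead per message sent
PER_VALUE_COST = 1.0     # assumed cost per attribute value carried


def node_send_cost(groups: List[List[Task]]) -> Dict[str, float]:
    """Per-node sending cost when each group of tasks shares one tree.

    A node sends one message per tree it belongs to, carrying the distinct
    attributes requested from it by the tasks in that tree. Relay cost for
    forwarding other nodes' values is ignored (simplifying assumption).
    """
    cost: Dict[str, float] = {}
    for group in groups:
        attrs_per_node: Dict[str, Set[str]] = {}
        for attrs, nodes in group:
            for node in nodes:
                attrs_per_node.setdefault(node, set()).update(attrs)
        for node, attrs in attrs_per_node.items():
            message_cost = PER_MESSAGE_COST + PER_VALUE_COST * len(attrs)
            cost[node] = cost.get(node, 0.0) + message_cost
    return cost


def fits_budget(cost: Dict[str, float], budget: float) -> bool:
    """True if no node exceeds its (assumed uniform) resource budget."""
    return all(c <= budget for c in cost.values())


# Two tasks whose node sets overlap on node "b".
t1: Task = (frozenset({"cpu"}), frozenset({"a", "b"}))
t2: Task = (frozenset({"cpu", "mem"}), frozenset({"b", "c"}))

separate = node_send_cost([[t1], [t2]])  # one tree per task
merged = node_send_cost([[t1, t2]])      # one shared tree

print("separate:", separate, "total:", sum(separate.values()))
print("merged:  ", merged, "total:", sum(merged.values()))
print("separate fits budget 15:", fits_budget(separate, 15.0))
print("merged fits budget 15:  ", fits_budget(merged, 15.0))
```

Under these assumed costs, node `b` pays the per-message overhead twice and sends `cpu` twice when the tasks use separate trees, while the shared tree lets it send one message with two values. With a per-node budget of 15, only the shared plan is feasible, which is the kind of trade-off a resource-aware planner would have to weigh across many tasks.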
