Simplifying system management through automated forecasting, diagnosis, and configuration tuning.

Abstract

Large-scale networked computing systems are widely deployed to run business-critical applications in environments where changes are frequent. Manual management of these complex systems can be tedious and error-prone. Meanwhile, the high costs of application downtime make it critical to ensure system availability and reliability. Recent progress in monitoring tools enables system administrators to collect fine-grained data about system activity with low overhead. This data provides valuable information for system management. However, the monitoring data collected from production systems is massive in size and noisy, which makes it hard for system administrators to fully utilize this data for effective system management.

This dissertation describes a data-management platform, called Fa, where system administrators can pose declarative queries over system monitoring data. Fa automatically finds fairly accurate and efficient execution plans for given queries, and returns query results in easy-to-interpret formats. Fa supports three key query types: forecasting queries (for predicting or detecting performance problems), diagnosis queries (for finding the cause of performance problems), and tuning queries (for recommending changes to system configuration to resolve diagnosed problems).

(a) For processing diagnosis queries, Fa constructs problem signatures from system monitoring data to identify recurrent problems and to reuse past diagnostic information. For a rare or new problem, Fa employs an anomaly-based clustering technique to generate performance baselines and to characterize the deviation from baselines to pinpoint root causes. Fa also incorporates an active-learning component that identifies diagnosis queries whose results, if provided or confirmed by system administrators, can be used to update problem signatures and to improve the accuracy and efficiency of processing future queries.
(b) For processing tuning queries to resolve problems caused by system misconfiguration, Fa employs an adaptive sampling algorithm that plans experiments to efficiently identify high-impact configuration parameters and high-performance settings. These experiments bring in information that is required for generating accurate query results but is missing from the monitoring data collected so far.
(c) For both one-time and continuous forecasting queries, Fa automatically searches for efficient execution plans in a large space of plans composed of data-transformation operators as well as synopsis-learning and prediction operators. Forecasting queries can be composed with diagnosis and tuning queries to enable proactive system management that avoids potential problems.

We have evaluated the Fa platform with monitoring data collected from database-backed multitier services, and with synthetic data that models the noisy nature of monitoring data from production systems. Our evaluation shows that Fa's query plan selection and execution strategies provide actionable information for system management automatically, accurately, and efficiently. Critical features such as reliable confidence estimates, robustness to noise, and supporting evidence for query results make Fa a practical and useful platform.
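
The diagnosis path in (a) can be illustrated with a deliberately simplified sketch. The abstract does not say how Fa builds signatures or baselines, so everything here is an assumption. A per-metric mean and standard deviation over a healthy period stands in for Fa's anomaly-based clustering. A "signature" is the set of metrics more than three standard deviations from that baseline. A past diagnosis is reused when the Jaccard similarity to a stored signature clears a threshold. The function names, metric names, and thresholds are all hypothetical, and the active-learning component is not modeled.

```python
from statistics import mean, stdev

# Hypothetical sketch: a z-score baseline stands in for Fa's anomaly-based
# clustering; names and thresholds are illustrative only.


def build_baseline(normal_samples):
    """Per-metric (mean, sample std dev) over samples taken while healthy."""
    metrics = normal_samples[0].keys()
    return {
        m: (mean([s[m] for s in normal_samples]), stdev([s[m] for s in normal_samples]))
        for m in metrics
    }


def z_score(value, mu, sd):
    return abs(value - mu) / sd


def signature(sample, baseline, z_threshold=3.0):
    """Frozenset of metrics whose deviation from the baseline exceeds z_threshold."""
    return frozenset(
        m for m, (mu, sd) in baseline.items()
        if sd > 0 and z_score(sample[m], mu, sd) > z_threshold
    )


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diagnose(sample, baseline, known_problems, min_similarity=0.6):
    """Reuse a past diagnosis for a recurrent problem; otherwise rank deviating metrics."""
    sig = signature(sample, baseline)
    best_cause, best_sim = None, 0.0
    for past_sig, cause in known_problems:
        sim = jaccard(sig, past_sig)
        if sim > best_sim:
            best_cause, best_sim = cause, sim
    if best_cause is not None and best_sim >= min_similarity:
        return ("recurrent", best_cause)
    # Rare or new problem: the most-deviating metrics are root-cause candidates.
    ranked = sorted(sig, key=lambda m: z_score(sample[m], *baseline[m]), reverse=True)
    return ("new", ranked)


healthy = [
    {"cpu": 40, "io_wait": 5, "lock_waits": 10},
    {"cpu": 42, "io_wait": 6, "lock_waits": 11},
    {"cpu": 38, "io_wait": 4, "lock_waits": 9},
    {"cpu": 41, "io_wait": 5, "lock_waits": 10},
]
baseline = build_baseline(healthy)
known = [
    (frozenset({"io_wait"}), "disk contention"),
    (frozenset({"cpu", "lock_waits"}), "lock thrashing"),
]
print(diagnose({"cpu": 41, "io_wait": 30, "lock_waits": 10}, baseline, known))
# ('recurrent', 'disk contention')
print(diagnose({"cpu": 41, "io_wait": 5, "lock_waits": 40}, baseline, known))
# ('new', ['lock_waits'])
```

The second call shows the fallback path. Its signature overlaps the stored "lock thrashing" signature only halfway, so no past diagnosis is reused and the deviating metric is reported as a candidate instead.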
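
For (b), the abstract says only that an adaptive sampling algorithm plans experiments. Here is a minimal sketch, assuming each experiment is a function call that returns a performance number (higher is better). It works in three steps:

1. A Latin hypercube initial design covers the parameter space.
2. Further experiments concentrate around the best setting found so far. This step uses a simple shrinking-radius local search, swapped in for whatever planner Fa actually uses.
3. A one-factor-at-a-time sweep around that setting estimates each parameter's impact.

`fake_benchmark`, the parameter names, and all constants are hypothetical.

```python
import random

# Hypothetical sketch of experiment planning for a tuning query. The local
# search below is a stand-in, not Fa's actual adaptive sampling algorithm.


def latin_hypercube(bounds, n, rng):
    """n settings; each parameter's range is split into n strata, each used exactly once."""
    columns = {}
    for name, (lo, hi) in bounds.items():
        strata = list(range(n))
        rng.shuffle(strata)
        columns[name] = [lo + (s + rng.random()) / n * (hi - lo) for s in strata]
    return [{name: columns[name][i] for name in bounds} for i in range(n)]


def adaptive_tune(run_experiment, bounds, n_initial=10, rounds=20, per_round=5, seed=0):
    """Return the best (setting, performance) found; higher performance is better."""
    rng = random.Random(seed)
    # Phase 1: a space-filling initial design.
    initial = [(run_experiment(s), s) for s in latin_hypercube(bounds, n_initial, rng)]
    best_perf, best = max(initial, key=lambda r: r[0])
    radius = 0.25  # neighborhood size, as a fraction of each parameter's range
    # Phase 2: spend further experiments near the best setting so far and
    # halve the neighborhood after any round that brings no improvement.
    for _ in range(rounds):
        improved = False
        for _ in range(per_round):
            cand = {
                name: min(hi, max(lo, best[name] + rng.uniform(-radius, radius) * (hi - lo)))
                for name, (lo, hi) in bounds.items()
            }
            perf = run_experiment(cand)
            if perf > best_perf:
                best_perf, best, improved = perf, cand, True
        if not improved:
            radius /= 2
    return best, best_perf


def parameter_impact(run_experiment, bounds, anchor, levels=5):
    """One-factor-at-a-time sweep: performance spread when only one parameter moves."""
    impact = {}
    for name, (lo, hi) in bounds.items():
        perfs = []
        for k in range(levels):
            setting = dict(anchor)
            setting[name] = lo + k * (hi - lo) / (levels - 1)
            perfs.append(run_experiment(setting))
        impact[name] = max(perfs) - min(perfs)
    return impact


def fake_benchmark(cfg):
    # Hypothetical response surface: throughput peaks at cache_fraction = 0.7,
    # and flush_interval_s has no effect at all.
    return 1000 - 400 * (cfg["cache_fraction"] - 0.7) ** 2


bounds = {"cache_fraction": (0.0, 1.0), "flush_interval_s": (1.0, 60.0)}
best, best_perf = adaptive_tune(fake_benchmark, bounds)
impact = parameter_impact(fake_benchmark, bounds, best)
print(best, best_perf)
print(impact)  # flush_interval_s has zero impact on this surface
```

The two phases play different roles. The space-filling first phase guards against missing a distant good region. The focused second phase then spends the remaining experiment budget where improvement looks most likely.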
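
For (c), one way to picture plan search for a forecasting query follows. Enumerate every pairing of a data-transformation operator with a prediction operator. Backtest each pairing on the most recent part of the series, and keep the one with the lowest error. The sketch assumes that scheme. The two transforms, the two predictors, the mean-absolute-error criterion, and all names are hypothetical. Fa's actual plan space, including its synopsis-learning operators, is not spelled out in the abstract.

```python
from itertools import product

# Hypothetical operators; Fa's actual operator set is not given in the abstract.


def identity(series):
    return list(series)


def moving_average(series, window=3):
    """Trailing mean; early points average over whatever history exists."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def last_value(history, horizon):
    return history[-1]  # naive forecast: ignores the horizon


def linear_trend(history, horizon, window=8):
    """Least-squares line through the last `window` points, extrapolated `horizon` steps."""
    ys = history[-window:]
    n = len(ys)
    if n < 2:
        return ys[-1]
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys)) / sum(
        (x - x_bar) ** 2 for x in range(n)
    )
    return y_bar + slope * (n - 1 + horizon - x_bar)


TRANSFORMS = {"identity": identity, "moving_average": moving_average}
PREDICTORS = {"last_value": last_value, "linear_trend": linear_trend}


def backtest_error(series, transform, predictor, horizon, n_test):
    """Mean absolute error of forecasting each of the last n_test points."""
    errors = []
    for t in range(len(series) - n_test, len(series)):
        visible = series[: t - horizon + 1]  # data observed `horizon` steps before t
        errors.append(abs(predictor(transform(visible), horizon) - series[t]))
    return sum(errors) / len(errors)


def choose_plan(series, horizon=1, n_test=10):
    """Score every (transform, predictor) pair; return (error, transform, predictor)."""
    return min(
        (backtest_error(series, tf, pf, horizon, n_test), tname, pname)
        for (tname, tf), (pname, pf) in product(TRANSFORMS.items(), PREDICTORS.items())
    )


def forecast(series, horizon=1, n_test=10):
    _, tname, pname = choose_plan(series, horizon, n_test)
    return PREDICTORS[pname](TRANSFORMS[tname](series), horizon)


series = [10 + 2 * i for i in range(30)]  # a clean upward trend
print(choose_plan(series))  # (0.0, 'identity', 'linear_trend')
print(forecast(series))  # 70.0
```

Under this scheme, a continuous forecasting query could re-run `choose_plan` as new observations arrive. The selected plan can then change when the data's behavior changes.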

Bibliographic details

  • Author

    Duan, Songyun

  • Affiliation

    Duke University

  • Degree-granting institution: Duke University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 261
  • Original format: PDF
  • Language: English
