Journal: IEEE Transactions on Parallel and Distributed Systems

PerfCompass: Online Performance Anomaly Fault Localization and Inference in Infrastructure-as-a-Service Clouds



Abstract

Infrastructure-as-a-service clouds are becoming widely adopted. However, resource sharing and multi-tenancy have made performance anomalies a top concern for users. Debugging those anomalies in a timely manner is paramount for minimizing the performance penalty for users. Unfortunately, this debugging often takes a long time due to the inherent complexity and shared nature of cloud infrastructures. When an application experiences a performance anomaly, it is important to distinguish between faults with a global impact and faults with a local impact, because the diagnosis and recovery steps for the two kinds of fault are quite different. In this paper, we present PerfCompass, an online performance anomaly fault debugging tool that can quantify whether a production-run performance anomaly has a global impact or a local impact. PerfCompass can use this information to suggest the root cause as either an external fault (e.g., environment-based) or an internal fault (e.g., software bugs). Furthermore, PerfCompass can identify the top affected system calls to provide useful diagnostic hints for detailed performance debugging. PerfCompass does not require source code or runtime application instrumentation, which makes it practical for production systems. We have tested PerfCompass by running five common open source systems (Apache, MySQL, Tomcat, Hadoop, and Cassandra) inside a virtualized cloud testbed. Our experiments use a range of common infrastructure sharing issues and real software bugs. The results show that PerfCompass accurately classifies 23 out of the 24 tested cases without calibration and achieves 100 percent accuracy with calibration. PerfCompass provides useful diagnostic hints within several minutes and imposes negligible runtime overhead on the production system during normal execution.
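
The abstract does not say how the global-versus-local impact is quantified, so the following Python sketch is only an illustration of the general idea and not the authors' method. It assumes that per-thread mean system call times are available from before and during the anomaly, that a thread counts as "affected" when any system call slows down by a chosen ratio, and that a fault is labeled external when most threads are affected. The names `ThreadProfile`, `classify`, and `top_affected_syscalls`, and both thresholds, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ThreadProfile:
    """Hypothetical per-thread summary: mean system call time in ms."""
    baseline_ms: Dict[str, float]  # measured during normal execution
    anomaly_ms: Dict[str, float]   # measured during the anomaly


def slowdown(profile: ThreadProfile, syscall: str) -> float:
    """Ratio of anomaly time to baseline time (1.0 if no usable baseline)."""
    base = profile.baseline_ms.get(syscall, 0.0)
    cur = profile.anomaly_ms.get(syscall, 0.0)
    return cur / base if base > 0 else 1.0


def is_affected(profile: ThreadProfile, ratio_threshold: float) -> bool:
    """A thread is 'affected' if any system call slowed by the threshold ratio."""
    return any(slowdown(profile, s) >= ratio_threshold for s in profile.anomaly_ms)


def classify(
    profiles: List[ThreadProfile],
    ratio_threshold: float = 2.0,   # assumed value, not from the paper
    global_fraction: float = 0.9,   # assumed value, not from the paper
) -> Tuple[str, float]:
    """Label the fault 'external' (global impact) or 'internal' (local impact)."""
    if not profiles:
        raise ValueError("at least one thread profile is required")
    affected = sum(is_affected(p, ratio_threshold) for p in profiles)
    fraction = affected / len(profiles)
    label = "external" if fraction >= global_fraction else "internal"
    return label, fraction


def top_affected_syscalls(profiles: List[ThreadProfile], k: int = 3) -> List[str]:
    """Rank system calls by their worst slowdown across all threads."""
    worst: Dict[str, float] = {}
    for p in profiles:
        for s in p.anomaly_ms:
            worst[s] = max(worst.get(s, 1.0), slowdown(p, s))
    return sorted(worst, key=lambda s: worst[s], reverse=True)[:k]
```

Under these assumptions, a fault that slows only a few threads would be reported as internal, and the ranked system calls would serve as a starting point for closer inspection. A calibration step such as the one the abstract mentions might correspond to tuning the thresholds per application, though that mapping is a guess.
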

