
Learning-Based Characterizing and Modeling Performance Bottlenecks of Big Data Workloads

Abstract

With the increasing demand for large-scale data analytics, understanding the performance bottlenecks of big data workloads has become critical for optimizing distributed platforms. Existing work has focused on qualitatively characterizing workload behavior and performance, but little effort has been devoted to quantifying performance bottlenecks and building bottleneck models. In this paper, we define a series of bottleneck ratios that quantify bottlenecks in terms of resource utilization. Based on features parsed from the original logs, we then propose a stage-level modeling approach to characterize workload bottlenecks. With these models, bottleneck ratios can be estimated from the original logs alone, without collecting resource-utilization data. To generalize the models across diverse workloads, we propose a workload generator, TrainBench, which flexibly generates workloads with a wide range of stage-level behaviors. In addition, taking hardware performance into account, we extract three key features that improve estimation accuracy. Our bottleneck models perform well for diverse workloads on different clusters.
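
To make the bottleneck-ratio idea concrete, the following is a minimal sketch of one plausible definition, assuming per-stage utilization samples for each resource and a saturation threshold; the paper's exact formula may differ.

from typing import Dict, List

# Hypothetical saturation cutoff; the abstract does not state a specific threshold.
SATURATION_THRESHOLD = 0.9

def bottleneck_ratios(stage_util: Dict[str, List[float]]) -> Dict[str, float]:
    """stage_util maps a resource name ("cpu", "disk", "network") to per-second
    utilization samples in [0.0, 1.0] collected during one stage. The ratio for
    a resource is the fraction of samples in which that resource is saturated."""
    ratios = {}
    for resource, samples in stage_util.items():
        if not samples:
            ratios[resource] = 0.0
            continue
        saturated = sum(1 for u in samples if u >= SATURATION_THRESHOLD)
        ratios[resource] = saturated / len(samples)
    return ratios

# Example: a shuffle-heavy stage whose network is saturated most of the time.
stage = {
    "cpu":     [0.45, 0.52, 0.60, 0.48, 0.50],
    "disk":    [0.30, 0.35, 0.95, 0.40, 0.33],
    "network": [0.92, 0.97, 0.95, 0.91, 0.60],
}
print(bottleneck_ratios(stage))  # {'cpu': 0.0, 'disk': 0.2, 'network': 0.8}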
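
The stage-level modeling step can likewise be sketched as a regression from log-derived features to a bottleneck ratio, so that the ratio is estimated without collecting utilization traces. The feature names and the gradient-boosting learner below are illustrative assumptions rather than the paper's actual choices, and the synthetic data merely stands in for stages labeled with measured ratios.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stage-level features parsed from logs, e.g.
# [input_bytes, shuffle_read_bytes, shuffle_write_bytes, num_tasks, gc_time_ms],
# drawn at random purely for illustration.
X = rng.uniform(size=(500, 5))
# Synthetic target standing in for a measured network bottleneck ratio.
y = np.clip(0.7 * X[:, 1] + 0.2 * X[:, 2] + 0.05 * rng.normal(size=500), 0.0, 1.0)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

# Estimate bottleneck ratios for unseen stages from log features alone.
pred = model.predict(X_test)
print("mean absolute error:", np.abs(pred - y_test).mean())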
