
Learning-Based Characterizing and Modeling Performance Bottlenecks of Big Data Workloads

Abstract

With the increasing demand for large-scale data analytics, understanding the performance bottlenecks of big data workloads has become critical for optimizing distributed platforms. Existing work has focused on qualitatively characterizing workload behavior and performance, but little effort has been devoted to quantifying performance bottlenecks and building bottleneck models. In this paper, we define a series of bottleneck ratios that quantify bottlenecks in terms of resource utilization. Based on features parsed from the original logs, we then propose a stage-level modeling approach to characterize workload bottlenecks. With these models, bottleneck ratios can be estimated from the original logs alone, without collecting resource-utilization data. To generalize the models across diverse workloads, we propose a workload generator, TrainBench, which flexibly generates workloads with a wide range of stage-level behaviors. In addition, taking hardware performance into account, we extract three key features that improve estimation accuracy. Our bottleneck models perform well for diverse workloads on different clusters.
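
To make the bottleneck-ratio idea concrete, the following is a minimal sketch of one plausible definition, assuming per-stage utilization samples for each resource and a saturation threshold; the paper's exact formula may differ.

from typing import Dict, List

# Hypothetical saturation cutoff; the abstract does not state a specific threshold.
SATURATION_THRESHOLD = 0.9

def bottleneck_ratios(stage_util: Dict[str, List[float]]) -> Dict[str, float]:
    """stage_util maps a resource name ("cpu", "disk", "network") to per-second
    utilization samples in [0.0, 1.0] collected during one stage. The ratio for
    a resource is the fraction of samples in which that resource is saturated."""
    ratios = {}
    for resource, samples in stage_util.items():
        if not samples:
            ratios[resource] = 0.0
            continue
        saturated = sum(1 for u in samples if u >= SATURATION_THRESHOLD)
        ratios[resource] = saturated / len(samples)
    return ratios

# Example: a shuffle-heavy stage whose network is saturated most of the time.
stage = {
    "cpu":     [0.45, 0.52, 0.60, 0.48, 0.50],
    "disk":    [0.30, 0.35, 0.95, 0.40, 0.33],
    "network": [0.92, 0.97, 0.95, 0.91, 0.60],
}
print(bottleneck_ratios(stage))  # {'cpu': 0.0, 'disk': 0.2, 'network': 0.8}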
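
The stage-level modeling step can likewise be sketched as a regression from log-derived features to a bottleneck ratio, so that the ratio is estimated without collecting utilization traces. The feature names and the gradient-boosting learner below are illustrative assumptions rather than the paper's actual choices, and the synthetic data merely stands in for stages labeled with measured ratios.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stage-level features parsed from logs, e.g.
# [input_bytes, shuffle_read_bytes, shuffle_write_bytes, num_tasks, gc_time_ms],
# drawn at random purely for illustration.
X = rng.uniform(size=(500, 5))
# Synthetic target standing in for a measured network bottleneck ratio.
y = np.clip(0.7 * X[:, 1] + 0.2 * X[:, 2] + 0.05 * rng.normal(size=500), 0.0, 1.0)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

# Estimate bottleneck ratios for unseen stages from log features alone.
pred = model.predict(X_test)
print("mean absolute error:", np.abs(pred - y_test).mean())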
