IEEE International Conference on Data Engineering Workshops
Gray Box Modeling Methodology for Runtime Prediction of Apache Spark Jobs
Abstract

Nowadays, many data centers facilitate data processing and acquisition by developing multiple Apache Spark jobs that can be executed in private clouds with various parameters. Each job may take several application parameters that influence its execution time; examples include the selected area of interest in a spatiotemporal data processing application or the time range of events in a complex event stream processing application. To predict a job's runtime accurately, these application parameters must be taken into account when constructing its runtime model. Runtime prediction of Spark jobs allows us to schedule them efficiently in order to utilize cloud resources, increase system throughput, reduce job latency, and meet customer requirements such as deadlines and QoS. Prediction is also an important advantage under a pay-as-you-go pricing model. In this paper, we present a gray box modeling methodology for runtime prediction of each individual Apache Spark job in two steps. The first step builds a white box model that predicts the input RDD size of each stage, relying on prior knowledge about the job's behaviour and taking the application parameters into consideration. The second step extracts a black box runtime model of each task by observing its runtime metrics under various allocated resources and varying input RDD sizes. The methodology is validated experimentally on a real-world application, and the predictions match 83-94% of the actual runtime of the tested application.
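The two-step methodology described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact models: the size formula, the linear runtime form, and all parameter names (`area_km2`, `rows_per_km2`, `cores`) are assumptions made here. The white box step maps an application parameter (a spatial area of interest) to a stage's input RDD size; the black box step fits a task runtime model from observed (input size, allocated cores, runtime) samples.

```python
# Hypothetical sketch of the two-step gray box methodology.
# All names and formulas are illustrative assumptions, not the paper's models.

def predict_stage_input_size(area_km2, rows_per_km2=1000, bytes_per_row=120):
    """White box step: estimate a stage's input RDD size in bytes from an
    application parameter (here, a spatial area of interest), using prior
    knowledge about the job's behaviour (assumed data density per km^2)."""
    return area_km2 * rows_per_km2 * bytes_per_row

def fit_task_runtime_model(samples):
    """Black box step: fit runtime ~ a * (input_bytes / cores) + b by
    ordinary least squares over observed (input_bytes, cores, runtime_s)
    samples collected under various resource allocations and input sizes."""
    xs = [size / cores for size, cores, _ in samples]
    ys = [runtime for _, _, runtime in samples]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda size, cores: a * size / cores + b

# Usage: predict the runtime of a configuration never observed directly.
observed = [  # (input_bytes, cores, runtime_s) from monitored runs
    (1e8, 2, 60.0), (2e8, 2, 110.0), (2e8, 4, 62.0), (4e8, 4, 112.0),
]
model = fit_task_runtime_model(observed)
size = predict_stage_input_size(area_km2=500)  # 6e7 bytes
est = model(size, cores=4)                     # estimated runtime in seconds
```

A per-task (rather than per-job) model of this shape is what makes the approach composable: stage-level input sizes from the white box feed the black box task models, and the job estimate is assembled from its stages.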
