IEEE International Conference on Data Engineering Workshops
Gray Box Modeling Methodology for Runtime Prediction of Apache Spark Jobs
Abstract

Nowadays, many data centers facilitate data processing and acquisition by developing multiple Apache Spark jobs that can be executed in private clouds with various parameters. Each job may take several application parameters that influence its execution time; examples include the selected area of interest in a spatiotemporal data processing application or the time range of events in a complex event stream processing application. To predict a job's runtime accurately, these application parameters must be taken into account when constructing its runtime model. Runtime prediction of Spark jobs allows us to schedule them efficiently in order to utilize cloud resources, increase system throughput, reduce job latency, and meet customer requirements such as deadlines and QoS. Prediction is also an important advantage under a pay-as-you-go pricing model. In this paper, we present a gray box modeling methodology for runtime prediction of each individual Apache Spark job in two steps. The first step builds a white box model that predicts the input RDD size of each stage, relying on prior knowledge about the job's behaviour and taking the application parameters into consideration. The second step extracts a black box runtime model of each task by observing its runtime metrics under various allocated resources and varying input RDD sizes. The methodology is validated experimentally on a real-world application, and the predictions match 83-94% of the actual runtime of the tested application.
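The two-step methodology described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact models: the size formula, the linear runtime form, and all parameter names (`area_km2`, `rows_per_km2`, `cores`) are assumptions made here. The white box step maps an application parameter (a spatial area of interest) to a stage's input RDD size; the black box step fits a task runtime model from observed (input size, allocated cores, runtime) samples.

```python
# Hypothetical sketch of the two-step gray box methodology.
# All names and formulas are illustrative assumptions, not the paper's models.

def predict_stage_input_size(area_km2, rows_per_km2=1000, bytes_per_row=120):
    """White box step: estimate a stage's input RDD size in bytes from an
    application parameter (here, a spatial area of interest), using prior
    knowledge about the job's behaviour (assumed data density per km^2)."""
    return area_km2 * rows_per_km2 * bytes_per_row

def fit_task_runtime_model(samples):
    """Black box step: fit runtime ~ a * (input_bytes / cores) + b by
    ordinary least squares over observed (input_bytes, cores, runtime_s)
    samples collected under various resource allocations and input sizes."""
    xs = [size / cores for size, cores, _ in samples]
    ys = [runtime for _, _, runtime in samples]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda size, cores: a * size / cores + b

# Usage: predict the runtime of a configuration never observed directly.
observed = [  # (input_bytes, cores, runtime_s) from monitored runs
    (1e8, 2, 60.0), (2e8, 2, 110.0), (2e8, 4, 62.0), (4e8, 4, 112.0),
]
model = fit_task_runtime_model(observed)
size = predict_stage_input_size(area_km2=500)  # 6e7 bytes
est = model(size, cores=4)                     # estimated runtime in seconds
```

A per-task (rather than per-job) model of this shape is what makes the approach composable: stage-level input sizes from the white box feed the black box task models, and the job estimate is assembled from its stages.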
