Journal: IEEE Transactions on Big Data

Deadline-Aware Cost Optimization for Spark


Abstract

We present OptEx, a closed-form model of job execution on Apache Spark, a popular parallel processing engine. To the best of our knowledge, OptEx is the first work that analytically models job completion time on Spark. The model can be used to estimate the completion time of a given Spark job on a cloud, with respect to the size of the input dataset, the number of iterations, and the number of nodes comprising the underlying cluster. Experimental results demonstrate that OptEx yields a mean relative error of 6 percent in estimating the job completion time. Furthermore, the model can be applied for estimating the cost-optimal cluster composition for running a given Spark job on a cloud under a completion deadline specified in the SLO (i.e., Service Level Objective). We show experimentally that OptEx is able to correctly estimate the required cluster composition for running a given Spark job under a given SLO deadline with an accuracy of 98 percent. We also provide a tool which can classify Spark jobs into job categories based on bisimilarity analysis on lineage graphs collected from the given jobs.
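
To illustrate how a completion-time model of this kind could drive deadline-aware cluster sizing, here is a minimal Python sketch. It is not the OptEx model: the abstract does not state the closed form, so `toy_completion_time`, its coefficients, the per-node-hour pricing, and the `cheapest_cluster` helper are all hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, Optional, Tuple


def toy_completion_time(input_gb: float, iterations: int, nodes: int) -> float:
    """Hypothetical completion-time estimate in seconds.

    NOT the OptEx formula (the abstract does not give it). The shape is an
    assumption: a fixed start-up cost, a coordination cost that grows with
    the number of nodes, and work that is divided evenly across nodes.
    """
    fixed_overhead_s = 30.0        # assumed job start-up cost
    per_node_overhead_s = 2.0      # assumed coordination cost per node
    work_s_per_gb_iter = 12.0      # assumed processing cost per GB per iteration
    parallel_work = work_s_per_gb_iter * input_gb * iterations / nodes
    return fixed_overhead_s + per_node_overhead_s * nodes + parallel_work


def cheapest_cluster(
    model: Callable[[float, int, int], float],
    input_gb: float,
    iterations: int,
    deadline_s: float,
    price_per_node_hour: float,
    max_nodes: int = 64,
) -> Optional[Tuple[int, float, float]]:
    """Return (nodes, cost, estimated_time_s) for the cheapest cluster size
    whose estimated completion time meets the deadline, or None if no size
    up to max_nodes is feasible."""
    best: Optional[Tuple[int, float, float]] = None
    for nodes in range(1, max_nodes + 1):
        estimated_s = model(input_gb, iterations, nodes)
        if estimated_s > deadline_s:
            continue  # this cluster size would miss the SLO deadline
        cost = nodes * price_per_node_hour * estimated_s / 3600.0
        if best is None or cost < best[1]:
            best = (nodes, cost, estimated_s)
    return best


if __name__ == "__main__":
    # Made-up workload: 10 GB input, 5 iterations, 300 s deadline, $0.50 per node-hour.
    print(cheapest_cluster(toy_completion_time, 10.0, 5, 300.0, 0.5))
```

With these made-up coefficients the search selects 3 nodes, the smallest cluster that meets the 300-second deadline. An exhaustive scan over node counts is used here for clarity; a real model with a known closed form might allow the optimum to be derived analytically instead.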
