Conference: IEEE International Symposium on Performance Analysis of Systems and Software

A Model Driven Approach Towards Improving the Performance of Apache Spark Applications


Abstract

Apache Spark applications often execute in multiple stages, where each stage consists of multiple tasks running in parallel. However, prior efforts noted that the execution time of different tasks within a stage can vary significantly (e.g., because of inefficient partitioning of input data), and that tasks can be distributed unevenly across worker nodes (e.g., because of data co-locality). While these problems are well known, it is nontrivial to predict and address them effectively. In this paper, we present an analytical model-driven approach that predicts the likelihood of such problems by executing an application with a limited amount of input data, and that recommends ways to address the identified problems: repartitioning the input data (for the task straggler problem) and/or changing the locality configuration setting (for the skewed task distribution problem). The novelty of our approach lies in automatically predicting the potential problems a priori from limited execution data and in recommending the locality setting and the number of partitions. Our experimental results, using 9 Apache Spark applications on two different clusters, show that our model-driven approach can predict these problems with high accuracy and improve performance by up to 71%.
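The abstract does not give the analytical model itself, so the following is only a minimal sketch of the two symptoms it names, not the authors' method. It assumes that per-task durations and per-node task counts from a small sample run are already available; the function names, the ratio thresholds, and the sample numbers are all hypothetical. The two remedies it prints correspond to real Spark mechanisms (`repartition` on an RDD or DataFrame, and the `spark.locality.wait` configuration property).

```python
from statistics import mean, median


def has_stragglers(task_durations, ratio_threshold=2.0):
    """Flag a stage whose slowest task takes far longer than the median task.

    The max/median ratio and its threshold are illustrative assumptions,
    not the model from the paper.
    """
    med = median(task_durations)
    return med > 0 and max(task_durations) / med > ratio_threshold


def has_skewed_distribution(tasks_per_node, ratio_threshold=1.5):
    """Flag a stage where one worker node runs many more tasks than average."""
    counts = list(tasks_per_node.values())
    avg = mean(counts)
    return avg > 0 and max(counts) / avg > ratio_threshold


def recommend(task_durations, tasks_per_node):
    """Map each detected symptom to the remedy named in the abstract."""
    recommendations = []
    if has_stragglers(task_durations):
        # In Spark this would mean calling repartition(n) on the input data.
        recommendations.append("repartition input data")
    if has_skewed_distribution(tasks_per_node):
        # In Spark this would mean tuning the spark.locality.wait property.
        recommendations.append("adjust spark.locality.wait")
    return recommendations


# Hypothetical measurements from a run on a small input sample.
sample_durations = [1.0, 1.1, 0.9, 1.0, 4.2]          # seconds per task
sample_placement = {"node-a": 9, "node-b": 2, "node-c": 1}  # tasks per node

print(recommend(sample_durations, sample_placement))
```

With the sample numbers above, both checks fire: one task runs about four times longer than the median, and one node holds most of the tasks. A balanced stage would produce an empty list.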
