Conference: IEEE International Symposium on Performance Analysis of Systems and Software

A Model Driven Approach Towards Improving the Performance of Apache Spark Applications

Abstract

Apache Spark applications often execute in multiple stages, where each stage consists of multiple tasks running in parallel. However, prior efforts noted that the execution time of different tasks within a stage can vary significantly for various reasons (e.g., inefficient partitioning of input data), and that tasks can be distributed unevenly across worker nodes for other reasons (e.g., data co-locality). While these problems are well known, it is nontrivial to predict and address them effectively. In this paper, we present an analytical model-driven approach that predicts the likelihood of such problems by executing an application with a limited amount of input data, and recommends ways to address the identified problems: repartitioning the input data (for the task straggler problem) and/or changing the locality configuration setting (for the skewed task distribution problem). The novelty of our approach lies in automatically predicting the potential problems a priori from limited execution data and recommending the locality setting and the number of partitions. Our experimental results using 9 Apache Spark applications on two different clusters show that our model-driven approach can predict these problems with high accuracy and improve performance by up to 71%.
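To make the two problems concrete, the following is a minimal sketch of how task stragglers and skewed task distribution might be flagged from per-task measurements of a small trial run. It is not the analytical model from the paper, which the abstract does not describe. The function names, the max-to-median and busiest-worker ratios, and both thresholds are assumptions chosen purely for illustration.

```python
from collections import Counter
from statistics import median


def straggler_ratio(durations):
    """Ratio of the slowest task's duration to the median duration in one stage."""
    return max(durations) / median(durations)


def distribution_skew(task_hosts, workers):
    """Ratio of the busiest worker's task count to an even share of the tasks."""
    counts = Counter(task_hosts)
    even_share = len(task_hosts) / len(workers)
    return max(counts.values()) / even_share


def recommend(durations, task_hosts, workers,
              straggler_threshold=2.0, skew_threshold=1.5):
    """Return illustrative recommendations for one stage.

    Both thresholds are hypothetical values, not taken from the paper.
    """
    advice = []
    if straggler_ratio(durations) > straggler_threshold:
        # Task straggler problem: one task runs far longer than its peers.
        advice.append("repartition input data")
    if distribution_skew(task_hosts, workers) > skew_threshold:
        # Skewed task distribution: tasks pile up on a subset of workers.
        advice.append("adjust locality setting")
    return advice


if __name__ == "__main__":
    # Hypothetical measurements: four tasks, all scheduled on worker "w1".
    durations = [10, 11, 9, 40]
    hosts = ["w1", "w1", "w1", "w1"]
    print(recommend(durations, hosts, ["w1", "w2"]))
```

In a Spark application, the first recommendation would typically correspond to changing the partition count (for example with `repartition`), and the second to tuning the `spark.locality.wait` configuration property. How the paper derives the concrete values is not stated in the abstract.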