Journal of Parallel and Distributed Computing

Efficient Performance Prediction for Apache Spark

Abstract

Spark is a distributed big data processing framework that followed Hadoop and is more efficient than it. It exposes more than 180 adjustable configuration parameters, and automatically choosing the configuration that makes a Spark application run efficiently is challenging. The key to addressing this challenge is the ability to predict the performance of Spark applications under different configurations. This paper proposes a new approach based on AdaBoost that efficiently and accurately predicts the performance of a given application under a given Spark configuration. In our approach, AdaBoost is used to build a set of stage-level performance models for Spark. To minimize the modeling overhead, we use classic projective sampling, a data mining technique that lets us collect as few training samples as possible while still meeting the accuracy requirements. We evaluate the proposed approach on six typical Spark benchmarks with five input datasets. The experimental results show that our approach achieves lower prediction error and lower cost than the previously proposed approach.
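
The sampling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that projective sampling works by fitting a learning curve to the prediction errors measured at a few small training-set sizes and then projecting the size needed to reach a target error. The power-law curve form and the names `fit_power_law` and `projected_sample_size` are assumptions made here for illustration, and the AdaBoost stage-level models themselves are not shown.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~= a * n**b by least squares in log-log space; return (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)   # intercept, mapped back from log space
    return a, b


def projected_sample_size(sizes, errors, target_error):
    """Project how many training samples the fitted curve needs to hit target_error."""
    a, b = fit_power_law(sizes, errors)
    if b >= 0:
        raise ValueError("fitted error does not decrease with sample size")
    # Solve a * n**b = target_error for n and round up to a whole sample.
    return math.ceil((target_error / a) ** (1.0 / b))


if __name__ == "__main__":
    # Hypothetical measurements: model error observed at three small sample sizes.
    sizes = [10, 20, 40]
    errors = [0.40, 0.20, 0.10]
    print(projected_sample_size(sizes, errors, target_error=0.05))
```

Under this assumption, only the few small training sets needed to fit the curve have to be collected up front, which is where the saving in modeling overhead would come from.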
