Journal: Future Generation Computer Systems

A gray-box performance model for Apache Spark

Abstract

Apache Spark is a powerful open-source data processing platform whose popularity is growing with the need to process massive amounts of data. A performance prediction model not only helps administrators better understand system behavior but is also useful for performance tuning. However, given Spark's complex application processing mechanism, modeling the relationship between system performance and configuration settings is not easy. In this paper, we present a gray-box performance model for Spark applications based on machine learning algorithms. Given a specific Spark application, the size of its input data, and some key system parameters, the model forecasts the application's execution time from historical information. To achieve better accuracy, the model takes basic hardware information and Spark's resource allocation strategy into consideration. In our experiments, the gray-box model outperforms typical black-box approaches in most cases. We believe this model will be helpful for further research on Apache Spark.
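The gray-box idea can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not state which features or learning algorithm the authors use. The sketch assumes a single derived feature, the number of task "waves" (input partitions divided by available executor slots), and fits an ordinary least-squares line to a made-up run history. The 128 MB split size, the `Run` record, and every function name are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass

# Assumed input split size in MB (hypothetical; not taken from the paper).
PARTITION_MB = 128


@dataclass
class Run:
    """One historical or planned run of a fixed Spark application."""
    input_mb: float
    executor_instances: int
    executor_cores: int
    duration_s: float = 0.0  # unknown for runs we want to predict


def task_waves(run: Run) -> int:
    """Gray-box feature: scheduling rounds needed to process all input tasks."""
    tasks = math.ceil(run.input_mb / PARTITION_MB)
    slots = run.executor_instances * run.executor_cores
    return math.ceil(tasks / slots)


def fit(history: list[Run]) -> tuple[float, float]:
    """Ordinary least squares for duration = intercept + slope * waves."""
    xs = [task_waves(r) for r in history]
    ys = [r.duration_s for r in history]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model: tuple[float, float], run: Run) -> float:
    """Forecast the execution time of a planned run from the fitted line."""
    intercept, slope = model
    return intercept + slope * task_waves(run)


# Made-up history for illustration only; durations are synthetic.
HISTORY = [
    Run(1024, 2, 2, 20.0),
    Run(4096, 2, 4, 30.0),
    Run(8192, 4, 2, 50.0),
    Run(2048, 1, 2, 50.0),
]
MODEL = fit(HISTORY)
```

A black-box baseline would instead regress on the raw inputs (data size, executor count, cores per executor) without the derived feature. The derived feature encodes knowledge of how Spark schedules tasks onto executor slots, which is the kind of system knowledge a gray-box model adds.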