Performance Modeling for Spark Using SVM

Abstract

At present, Spark is widely used in many enterprises. Although Spark is much faster than Hadoop for some applications, its configuration parameters can have a great impact on its performance because of the large number of parameters, the interactions between them, and the varied characteristics of applications. Unfortunately, no research has yet been conducted on predicting the performance of Spark from its configuration sets. In this paper, we employ a machine learning method, Support Vector Machine (SVM), to build performance models for Spark. The input configuration sets are collected by running Spark applications beforehand with randomly modified and combined parameter values. In this way, we also determine the range of each property and gain a deeper understanding of how these properties work in Spark. We also use an Artificial Neural Network (ANN) to model the performance of Spark and find that, for three workloads from HiBench, the error rate of the ANN is on average 1.98 times that of the SVM.
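The abstract describes two mechanical steps around the models themselves: generating training inputs by randomly combining configuration values, and comparing models by their error rate. The sketch below illustrates only those two steps under stated assumptions. The property names are real Spark settings, but the selection of properties and their value ranges is hypothetical and not taken from the paper. The error rate is assumed to be the mean relative error, since the abstract does not define it. The SVM and ANN models are not shown.

```python
import random

# Hypothetical search space: real Spark property names, but the chosen
# properties and value ranges are illustrative assumptions, not the paper's.
SEARCH_SPACE = {
    "spark.executor.memory": ["1g", "2g", "4g", "8g"],
    "spark.executor.cores": [1, 2, 4],
    "spark.default.parallelism": [8, 16, 32, 64],
    "spark.shuffle.compress": ["true", "false"],
    "spark.serializer": [
        "org.apache.spark.serializer.JavaSerializer",
        "org.apache.spark.serializer.KryoSerializer",
    ],
}


def sample_configuration(rng: random.Random) -> dict:
    """Draw one configuration by picking a random value for every property."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def to_spark_submit_args(config: dict) -> list[str]:
    """Render a configuration as --conf key=value arguments for spark-submit."""
    args = []
    for name, value in config.items():
        args += ["--conf", f"{name}={value}"]
    return args


def mean_relative_error(predicted: list[float], measured: list[float]) -> float:
    """Assumed error-rate definition: mean of |predicted - measured| / measured."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled configurations are reproducible
    for config in (sample_configuration(rng) for _ in range(3)):
        print(" ".join(to_spark_submit_args(config)))
```

In such a setup, each sampled configuration would be run once to record its execution time. The (configuration, time) pairs then serve as training data for both the SVM and the ANN baseline. Dividing the ANN's mean error by the SVM's gives a ratio of the kind the abstract reports (1.98 on average across the three HiBench workloads).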