
Optimizing Multiple Machine Learning Jobs on MapReduce


Abstract

Recently, MapReduce has been used to parallelize machine learning algorithms. Obtaining the best performance from these algorithms requires tuning their parameters. However, tuning is time-consuming because it requires executing a MapReduce program multiple times with different parameters. These executions can be assigned to a cluster in various ways, and the execution time varies with the assignment. To achieve the shortest execution time, we propose a method for optimizing the assignment of MapReduce jobs to a cluster, assuming a runtime targeted at machine learning. We developed an execution cost model to predict the total execution time of the jobs and obtained the optimal assignment by minimizing the cost model. To evaluate the proposed method, we implemented an experimental MapReduce runtime based on the Message Passing Interface and ran logistic regression in four cases. The results showed that the proposed method correctly predicts the optimal job assignment. We also confirmed that the optimal assignment reduced execution time by up to 77% compared with the worst assignment.
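
The idea of choosing an assignment by minimizing a cost model can be illustrated with a toy example. The abstract does not give the paper's cost model, so everything below is an assumption made for illustration: the cluster is split into equal groups that each run one job at a time, per-iteration compute time scales as `work / nodes`, and per-iteration communication costs `comm * log2(nodes)`. The function names and parameters are hypothetical.

```python
import math


def job_time(nodes, work, comm, iterations):
    """Assumed time for one job on `nodes` nodes (not the paper's model)."""
    compute = work / nodes                  # work is split evenly across nodes
    communicate = comm * math.log2(nodes)   # tree-style reduce; 0 on one node
    return iterations * (compute + communicate)


def total_time(num_jobs, cluster_nodes, concurrent, work, comm, iterations):
    """Assumed total time when `concurrent` jobs run side by side."""
    nodes_per_job = cluster_nodes // concurrent
    waves = math.ceil(num_jobs / concurrent)  # rounds needed to finish all jobs
    return waves * job_time(nodes_per_job, work, comm, iterations)


def best_assignment(num_jobs, cluster_nodes, work, comm, iterations):
    """Return the number of concurrent jobs that minimizes the assumed cost."""
    # Only consider group counts that divide the cluster evenly.
    candidates = [c for c in range(1, cluster_nodes + 1)
                  if cluster_nodes % c == 0]
    return min(
        candidates,
        key=lambda c: total_time(num_jobs, cluster_nodes, c,
                                 work, comm, iterations),
    )


if __name__ == "__main__":
    # Three parameter settings of one program on a hypothetical 8-node cluster.
    for comm in (1, 5):
        c = best_assignment(num_jobs=3, cluster_nodes=8,
                            work=80, comm=comm, iterations=10)
        print(f"comm={comm}: run {c} job(s) concurrently")
```

Under these assumptions, cheap communication favors running the jobs one after another on the whole cluster, while expensive communication favors running several jobs side by side on smaller groups. That is the kind of trade-off a cost model has to capture; the paper's actual model may weigh different terms.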
