
Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

In: International Conference on Very Large Data Bases (VLDB)


Abstract

SystemML aims at declarative, large-scale machine learning (ML) on top of MapReduce, where high-level ML scripts with R-like syntax are compiled to programs of MR jobs. The declarative specification of ML algorithms enables automatic optimization, in contrast to existing large-scale machine learning libraries. SystemML's primary focus is on data parallelism, but many ML algorithms inherently exhibit opportunities for task parallelism as well. A major challenge is how to efficiently combine both types of parallelism for arbitrary ML scripts and workloads. In this paper, we present a systematic approach for combining task and data parallelism for large-scale machine learning on top of MapReduce. We employ a generic Parallel FOR construct (ParFOR) as known from high-performance computing (HPC). Our core contributions are (1) complementary parallelization strategies for exploiting multi-core and cluster parallelism, as well as (2) a novel cost-based optimization framework for automatically creating optimal parallel execution plans. Experiments on a variety of use cases showed that this approach achieves both efficiency and scalability due to automatic adaptation to ad-hoc workloads and unknown data characteristics.
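For illustration, the following is a minimal sketch of a ParFOR script in SystemML's R-like DML, computing pairwise column correlations, a classic task-parallel workload. The file paths and the sd, cov, and matrix builtins are illustrative assumptions, not code taken from the paper:

    # Hypothetical DML sketch: pairwise column correlations via ParFOR.
    # Paths and builtin names (sd, cov) are assumptions for illustration.
    D = read("./input/D");                  # input matrix, one variable per column
    m = ncol(D);
    R = matrix(0, rows=m, cols=m);          # upper-triangular correlation results
    parfor (i in 1:(m-1)) {                 # iterations are independent ...
      X = D[, i];
      sX = sd(X);
      parfor (j in (i+1):m) {               # ... so the optimizer may run them
        Y = D[, j];                         # on local threads or as MR jobs
        R[i, j] = cov(X, Y) / (sX * sd(Y));
      }
    }
    write(R, "./output/R");

Each (i, j) pair can be evaluated independently, which is exactly the structure ParFOR's cost-based optimizer exploits when choosing between multi-core and cluster execution plans.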

