
Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML

In: International Conference on Very Large Data Bases (VLDB)


Abstract

SystemML aims at declarative, large-scale machine learning (ML) on top of MapReduce, where high-level ML scripts with R-like syntax are compiled to programs of MR jobs. The declarative specification of ML algorithms enables automatic optimization, in contrast to existing large-scale machine learning libraries. SystemML's primary focus is on data parallelism, but many ML algorithms inherently exhibit opportunities for task parallelism as well. A major challenge is how to efficiently combine both types of parallelism for arbitrary ML scripts and workloads. In this paper, we present a systematic approach for combining task and data parallelism for large-scale machine learning on top of MapReduce. We employ a generic Parallel FOR construct (ParFOR) as known from high-performance computing (HPC). Our core contributions are (1) complementary parallelization strategies for exploiting multi-core and cluster parallelism, as well as (2) a novel cost-based optimization framework for automatically creating optimal parallel execution plans. Experiments on a variety of use cases showed that this approach achieves both efficiency and scalability due to automatic adaptation to ad-hoc workloads and unknown data characteristics.
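For illustration, the following is a minimal sketch of a ParFOR script in SystemML's R-like DML, computing pairwise column correlations, a classic task-parallel workload. The file paths and the sd, cov, and matrix builtins are illustrative assumptions, not code taken from the paper:

    # Hypothetical DML sketch: pairwise column correlations via ParFOR.
    # Paths and builtin names (sd, cov) are assumptions for illustration.
    D = read("./input/D");                  # input matrix, one variable per column
    m = ncol(D);
    R = matrix(0, rows=m, cols=m);          # upper-triangular correlation results
    parfor (i in 1:(m-1)) {                 # iterations are independent ...
      X = D[, i];
      sX = sd(X);
      parfor (j in (i+1):m) {               # ... so the optimizer may run them
        Y = D[, j];                         # on local threads or as MR jobs
        R[i, j] = cov(X, Y) / (sX * sd(Y));
      }
    }
    write(R, "./output/R");

Each (i, j) pair can be evaluated independently, which is exactly the structure ParFOR's cost-based optimizer exploits when choosing between multi-core and cluster execution plans.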

