Benchmarking Distributed Data Processing Systems for Machine Learning Workloads

Abstract

In recent years, distributed data processing systems have been widely adopted to robustly scale out computations on massive data sets across many compute nodes. These systems are also popular choices for scaling out the training of machine learning models. However, there is a lack of benchmarks that assess how efficiently data processing systems actually execute machine learning algorithms at scale. For example, the learning algorithms chosen in the corresponding systems papers tend to be those that fit the system's paradigm well rather than state-of-the-art methods. Furthermore, experiments in those papers often neglect important considerations, such as covering all aspects of scalability. In this paper, we share our experience in evaluating novel data processing systems and present a core set of experiments for a benchmark of distributed data processing systems for machine learning workloads, a rationale for their necessity, and an experimental evaluation.


Bibliographic record

Source
TPC Technology Conference on Performance Evaluation and Benchmarking | 2019 | 154p | 16 pages
Authors
Christoph Boden; Tilmann Rabl; Sebastian Schelter; Volker Markl
Original format: PDF
Chinese Library Classification: TP3-53

