Conference: TPC Technology Conference on Performance Evaluation and Benchmarking

Benchmarking Distributed Data Processing Systems for Machine Learning Workloads

Abstract

In recent years, distributed data processing systems have been widely adopted to robustly scale out computations on massive data sets across many compute nodes. These systems are also popular choices for scaling out the training of machine learning models. However, there is a lack of benchmarks that assess how efficiently data processing systems actually execute machine learning algorithms at scale. For example, the learning algorithms chosen in the corresponding systems papers tend to be those that fit well onto the system's paradigm rather than state-of-the-art methods. Furthermore, experiments in those papers often neglect important aspects, such as evaluating all aspects of scalability. In this paper, we share our experience in evaluating novel data processing systems and present a core set of experiments for a benchmark of distributed data processing systems on machine learning workloads, a rationale for their necessity, and an experimental evaluation.