
Demystifying the MLPerf Training Benchmark Suite

Abstract

MLPerf, an emerging machine learning benchmark suite, strives to cover a broad range of machine learning applications. We present a study on the characteristics of MLPerf benchmarks and how they differ from previous deep learning benchmarks such as DAWNBench and DeepBench. MLPerf benchmarks are seen to exhibit moderately high memory transactions per second and moderately high compute rates, whereas DAWNBench offers a high-compute benchmark with a low memory transaction rate, and DeepBench provides low-compute-rate benchmarks. We also observe that the various MLPerf benchmarks possess unique features that allow unveiling various bottlenecks in systems. We further observe variation in scaling efficiency across the MLPerf models; the variation exhibited by the different models highlights the importance of smart scheduling strategies for multi-GPU training. Another observation is that a dedicated low-latency interconnect between GPUs in multi-GPU systems is crucial for optimal distributed deep learning training. Furthermore, host CPU utilization increases with the number of GPUs used for training. Corroborating prior work, we also observe and quantify the improvements possible with mixed-precision training using Tensor Cores.
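The mixed-precision result mentioned above refers to running eligible operations in FP16 on Tensor Cores while keeping master weights and loss scaling in FP32. As an illustration only (not the paper's code), the sketch below shows this pattern with PyTorch's automatic mixed precision (torch.cuda.amp); the model, data, and hyperparameters are hypothetical.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Toy model and optimizer; mixed precision itself is handled at the framework level.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
scaler = GradScaler()  # scales the loss to avoid FP16 gradient underflow

def train_step(inputs, targets):
    optimizer.zero_grad()
    # autocast runs eligible ops (e.g. matmuls, convolutions) in FP16,
    # which lets them map onto Tensor Cores on supporting GPUs.
    with autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then applies the optimizer step
    scaler.update()                 # adjusts the loss scale for the next iteration
    return loss.item()

# Hypothetical usage with random data
inputs = torch.randn(64, 1024, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")
print(train_step(inputs, targets))
```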
