Operating Systems Review

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark


Abstract

Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision), and can impact the final model's accuracy on unseen data. Due to a lack of standard evaluation criteria that considers these trade-offs, it is difficult to directly compare these optimizations. To address this problem, we recently introduced DAWNBENCH, a benchmark competition focused on end-to-end training time to achieve near-state-of-the-art accuracy on an unseen dataset, a combined metric called time-to-accuracy (TTA). In this work, we analyze the entries from DAWNBench, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries. We show that TTA has a low coefficient of variation and that models optimized for TTA generalize nearly as well as those trained using standard methods. Additionally, even though DAWNBench entries were able to train ImageNet models in under 3 minutes, we find they still underutilize hardware capabilities such as Tensor Cores. Furthermore, we find that distributed entries can spend more than half of their time on communication. We show similar findings with entries to the MLPerf v0.5 benchmark.
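To make the TTA metric and the coefficient-of-variation claim concrete, the sketch below shows how a harness might measure wall-clock time until a validation-accuracy threshold is first reached, and how stability across repeated runs could be quantified. This is a minimal illustration, not the DAWNBench harness: the train_one_epoch and evaluate callables, the threshold value, and the epoch cap are all hypothetical placeholders.

```python
import statistics
import time
from typing import Callable, List

def time_to_accuracy(train_one_epoch: Callable[[], None],
                     evaluate: Callable[[], float],
                     target_accuracy: float,
                     max_epochs: int = 90) -> float:
    """Wall-clock seconds of end-to-end training until held-out
    validation accuracy first reaches target_accuracy."""
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch()                  # one full pass over the training set
        if evaluate() >= target_accuracy:  # accuracy on unseen validation data
            return time.perf_counter() - start
    raise RuntimeError(
        f"target accuracy {target_accuracy} not reached in {max_epochs} epochs")

def coefficient_of_variation(tta_samples: List[float]) -> float:
    """CV = sample standard deviation / mean. A low CV across repeated
    runs of the same entry indicates TTA is a stable metric."""
    return statistics.stdev(tta_samples) / statistics.mean(tta_samples)
```

Because TTA folds semantics-changing optimizations (e.g., reduced precision, large-batch schedules) and raw speed into a single number, measuring it over several independent runs and reporting the CV is what lets entries with different training procedures be compared directly.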
