Operating Systems Review
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark


Abstract

Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision) and can impact the final model's accuracy on unseen data. Due to a lack of standard evaluation criteria that consider these trade-offs, it is difficult to compare these optimizations directly. To address this problem, we recently introduced DAWNBENCH, a benchmark competition focused on end-to-end training time to achieve near-state-of-the-art accuracy on an unseen dataset, a combined metric called time-to-accuracy (TTA). In this work, we analyze the entries from DAWNBENCH, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries. We show that TTA has a low coefficient of variation and that models optimized for TTA generalize nearly as well as those trained using standard methods. Additionally, even though DAWNBENCH entries were able to train ImageNet models in under 3 minutes, we find they still underutilize hardware capabilities such as Tensor Cores. Furthermore, we find that distributed entries can spend more than half of their time on communication. We show similar findings with entries to the MLPerf v0.5 benchmark.
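The time-to-accuracy metric and the run-to-run coefficient of variation discussed in the abstract can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the helper names, the 93% accuracy target, and the accuracy logs are all made up for the example.

```python
import statistics

def time_to_accuracy(log, target_acc):
    """Return the first elapsed wall-clock time (seconds) at which
    validation accuracy reaches the target, or None if it never does.
    `log` is a list of (elapsed_seconds, val_accuracy) pairs."""
    for elapsed, acc in log:
        if acc >= target_acc:
            return elapsed
    return None

def coefficient_of_variation(samples):
    """CV = sample stdev / mean; a low CV means the metric is
    stable across repeated runs of the same entry."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical validation logs from three repeated runs of one entry.
runs = [
    [(60, 0.880), (120, 0.925), (180, 0.940)],
    [(60, 0.870), (125, 0.930)],
    [(60, 0.890), (118, 0.935)],
]
ttas = [time_to_accuracy(run, target_acc=0.93) for run in runs]
cv = coefficient_of_variation(ttas)
print(ttas)  # TTA in seconds for each run
print(cv)    # run-to-run variability of the TTA metric
```

Because TTA is a threshold-crossing time rather than a throughput number, it rewards optimizations only insofar as they still reach the target accuracy, which is the trade-off the benchmark is designed to capture.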
