IEEE International Parallel and Distributed Processing Symposium Workshops

Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences

Abstract

Deep Learning (DL) models for super-resolution (DLSR) are an emerging trend in response to the growth of ML/DL applications that require high-resolution images. DLSR methods have also shown promise in domains such as medical imaging, surveillance, and microscopy. However, DLSR models are extremely computationally demanding and require unreasonably long training times on modern Volta GPUs. In our experiments, we observed only 10.3 images/second on a single Volta GPU when training EDSR, a state-of-the-art DLSR model for single-image super-resolution. By comparison, a Volta GPU can process 360 images/second when training ResNet-50, a state-of-the-art model for image classification. We therefore believe supercomputers are a good candidate for speeding up DLSR model training. In this paper, we select EDSR as the representative DLSR PyTorch model and introduce Horovod-based distributed EDSR training. However, we observed poor scaling performance for default EDSR training on the Lassen HPC system at Lawrence Livermore National Laboratory. To investigate this performance degradation, we perform exhaustive communication profiling. The profiling insights are then used to optimize CUDA-Aware MPI for DLSR models by ensuring that advanced MPI designs involving CUDA IPC and registration caching are properly applied by DL frameworks. We present a comprehensive scaling study of EDSR with MVAPICH2-GDR and NCCL on up to 512 GPUs on Lassen, and demonstrate a 15.6% improvement in scaling efficiency over default Horovod training, which translates to a 1.26× speedup in training performance.
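The Horovod-based distributed training described above follows Horovod's standard PyTorch pattern: each process drives one GPU, a regular optimizer is wrapped in an allreduce-based DistributedOptimizer, and the initial model state is broadcast from rank 0. The sketch below illustrates that pattern only; EDSRModel, train_loader, and the hyperparameters are hypothetical placeholders standing in for the actual EDSR training setup, not code from the paper.

    import torch
    import horovod.torch as hvd

    # Initialize Horovod and pin each process to its local GPU.
    hvd.init()
    torch.cuda.set_device(hvd.local_rank())

    # EDSRModel is a placeholder for the actual EDSR network definition.
    model = EDSRModel().cuda()

    # Scale the learning rate by the worker count (common Horovod practice).
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4 * hvd.size())

    # Wrap the optimizer so gradients are averaged across all GPUs via
    # allreduce (NCCL or CUDA-Aware MPI, depending on how Horovod was built).
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Start all workers from identical model and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for lr_batch, hr_batch in train_loader:  # placeholder DataLoader
        optimizer.zero_grad()
        sr_batch = model(lr_batch.cuda())
        # EDSR trains with an L1 loss between super-resolved and HR images.
        loss = torch.nn.functional.l1_loss(sr_batch, hr_batch.cuda())
        loss.backward()
        optimizer.step()

Whether the allreduce runs over NCCL or CUDA-Aware MPI is fixed when Horovod is built (e.g. with HOROVOD_GPU_OPERATIONS=NCCL) and by the launcher, e.g. horovodrun -np 512 python train_edsr.py or an mpirun invocation against MVAPICH2-GDR (which typically requires runtime flags such as MV2_USE_CUDA=1); the specific MVAPICH2-GDR tuning used in the paper's optimized runs is not reproduced here.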