Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training



Abstract

Data-parallelism has become an established paradigm for training DNNs that fit inside GPU memory on large-scale HPC systems. However, model-parallelism is required to train out-of-core DNNs. In this paper, we deal with the emerging requirements brought forward by very large DNNs trained on the high-resolution images common in digital pathology. To address these, we propose, design, and implement GEMS: a GPU-Enabled Memory-Aware Model-Parallelism System. We present several design schemes, such as GEMS-MAST, GEMS-MASTER, and GEMS-Hybrid, that offer excellent speedups over state-of-the-art systems like Mesh-TensorFlow and FlexFlow. Furthermore, we combine model-parallelism and data-parallelism to train a 1,000-layer ResNet-1k model using 1,024 Volta V100 GPUs with 97.32% scaling efficiency. For a real-world histopathology whole-slide image (WSI) of 100,000 x 100,000 pixels, we train a custom ResNet-110-v2 on image tiles of size 1024 x 1024 and reduce the training time from seven hours to 28 minutes.
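The abstract's central idea is a hybrid of model-parallelism (splitting one replica of an out-of-core network across several GPUs) and data-parallelism (replicating that split model across many such GPU groups). The sketch below illustrates only this general pattern in PyTorch; it is not the GEMS-MAST, GEMS-MASTER, or GEMS-Hybrid design from the paper, and the class name TwoGPUSplitNet, the layer sizes, the gloo backend, and the torchrun-style launch assumptions are illustrative choices, not details taken from the source.

# Minimal, illustrative sketch (NOT the GEMS implementation): each
# data-parallel rank holds one replica of a toy network that is itself
# split across two GPUs (model-parallelism), and DistributedDataParallel
# averages gradients across the replicas (data-parallelism).
# Assumes a torchrun-style launch (RANK/WORLD_SIZE/LOCAL_RANK set) and
# two GPUs per process; all names here are hypothetical.
import os

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class TwoGPUSplitNet(nn.Module):
    """Toy out-of-core model: early layers on one GPU, late layers on
    another, so no single GPU has to hold all weights and activations."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.part0 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        ).to(dev0)
        self.part1 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2)
        ).to(dev1)

    def forward(self, x):
        x = self.part0(x.to(self.dev0))
        return self.part1(x.to(self.dev1))   # activations hop between GPUs


def main():
    # "gloo" keeps the toy portable; a large-scale run would use a
    # CUDA-aware communication backend instead.
    dist.init_process_group(backend="gloo")
    local_rank = int(os.environ["LOCAL_RANK"])
    dev0 = torch.device(f"cuda:{2 * local_rank}")
    dev1 = torch.device(f"cuda:{2 * local_rank + 1}")

    # DDP over a multi-device module: no device_ids are passed, and the
    # gradients of each split replica are averaged across ranks.
    model = DDP(TwoGPUSplitNet(dev0, dev1))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Stand-in for 1024 x 1024 WSI tiles (kept small so the toy runs fast).
    images = torch.randn(4, 3, 256, 256)
    labels = torch.randint(0, 2, (4,), device=dev1)

    for _ in range(3):
        opt.zero_grad()
        out = model(images)                  # output ends up on dev1
        loss_fn(out, labels).backward()      # gradient all-reduce happens here
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Launched with torchrun using one process per GPU pair, each rank trains on its own shard of image tiles while sharing a consistent set of weights. GEMS's memory-aware schemes go well beyond this sketch; it is included only to make the combined model- and data-parallel training idea concrete.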
