Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training



Abstract

Data-parallelism has become an established paradigm for training DNNs that fit inside GPU memory on large-scale HPC systems. However, model-parallelism is required to train out-of-core DNNs. In this paper, we deal with the emerging requirements brought forward by very large DNNs trained on the high-resolution images common in digital pathology. To address these, we propose, design, and implement GEMS: a GPU-Enabled Memory-Aware Model-Parallelism System. We present several design schemes, such as GEMS-MAST, GEMS-MASTER, and GEMS-Hybrid, that offer excellent speedups over state-of-the-art systems like Mesh-TensorFlow and FlexFlow. Furthermore, we combine model-parallelism and data-parallelism to train a 1,000-layer ResNet-1k model using 1,024 Volta V100 GPUs with 97.32% scaling efficiency. For a real-world histopathology whole-slide image (WSI) of 100,000 x 100,000 pixels, we train a custom ResNet-110-v2 on image tiles of size 1024 x 1024 and reduce the training time from seven hours to 28 minutes.
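The abstract's central idea is a hybrid of model-parallelism (splitting one replica of an out-of-core network across several GPUs) and data-parallelism (replicating that split model across many such GPU groups). The sketch below illustrates only this general pattern in PyTorch; it is not the GEMS-MAST, GEMS-MASTER, or GEMS-Hybrid design from the paper, and the class name TwoGPUSplitNet, the layer sizes, the gloo backend, and the torchrun-style launch assumptions are illustrative choices, not details taken from the source.

# Minimal, illustrative sketch (NOT the GEMS implementation): each
# data-parallel rank holds one replica of a toy network that is itself
# split across two GPUs (model-parallelism), and DistributedDataParallel
# averages gradients across the replicas (data-parallelism).
# Assumes a torchrun-style launch (RANK/WORLD_SIZE/LOCAL_RANK set) and
# two GPUs per process; all names here are hypothetical.
import os

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class TwoGPUSplitNet(nn.Module):
    """Toy out-of-core model: early layers on one GPU, late layers on
    another, so no single GPU has to hold all weights and activations."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.part0 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        ).to(dev0)
        self.part1 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2)
        ).to(dev1)

    def forward(self, x):
        x = self.part0(x.to(self.dev0))
        return self.part1(x.to(self.dev1))   # activations hop between GPUs


def main():
    # "gloo" keeps the toy portable; a large-scale run would use a
    # CUDA-aware communication backend instead.
    dist.init_process_group(backend="gloo")
    local_rank = int(os.environ["LOCAL_RANK"])
    dev0 = torch.device(f"cuda:{2 * local_rank}")
    dev1 = torch.device(f"cuda:{2 * local_rank + 1}")

    # DDP over a multi-device module: no device_ids are passed, and the
    # gradients of each split replica are averaged across ranks.
    model = DDP(TwoGPUSplitNet(dev0, dev1))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Stand-in for 1024 x 1024 WSI tiles (kept small so the toy runs fast).
    images = torch.randn(4, 3, 256, 256)
    labels = torch.randint(0, 2, (4,), device=dev1)

    for _ in range(3):
        opt.zero_grad()
        out = model(images)                  # output ends up on dev1
        loss_fn(out, labels).backward()      # gradient all-reduce happens here
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Launched with torchrun using one process per GPU pair, each rank trains on its own shard of image tiles while sharing a consistent set of weights. GEMS's memory-aware schemes go well beyond this sketch; it is included only to make the combined model- and data-parallel training idea concrete.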
