Journal: Parallel Computing

A computational-graph partitioning method for training memory-constrained DNNs



Abstract

Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements, and limited device memory becomes a bottleneck when training such models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs represented as computational graphs. ParDNN decides a placement of the DNN's underlying computational-graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN: it requires no modification at either the model level or the systems-level implementation of its operation kernels. ParDNN partitions DNNs having billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of 5 very large models while achieving superlinear scaling for both the batch size and training throughput. ParDNN either outperforms or qualitatively improves upon the related work.
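To make the placement problem described above concrete, the sketch below is a minimal, hypothetical illustration (not the authors' algorithm): it greedily assigns the operations of a toy computational graph to devices so that no device's memory budget is exceeded. The op names, memory figures, and the greedy rule are assumptions for illustration only; ParDNN additionally minimizes training time, which this toy heuristic ignores.

```python
# Toy greedy op placement under per-device memory budgets.
# Illustrative only; not ParDNN's partitioning algorithm.
from collections import defaultdict

def place_ops(ops, device_memory):
    """ops: list of (op_name, memory_bytes); device_memory: {device: budget_bytes}."""
    used = defaultdict(int)   # memory already assigned to each device
    placement = {}            # op_name -> device
    for name, mem in sorted(ops, key=lambda x: -x[1]):   # largest ops first
        # Pick the device with the most remaining memory that still fits this op.
        free, dev = max((device_memory[d] - used[d], d) for d in device_memory)
        if mem > free:
            raise MemoryError(f"op {name} ({mem} B) does not fit on any device")
        placement[name] = dev
        used[dev] += mem
    return placement

if __name__ == "__main__":
    # Hypothetical ops with their peak memory footprints in bytes.
    ops = [("conv1", 6_000), ("conv2", 5_000), ("fc1", 3_000), ("loss", 1_000)]
    devices = {"/GPU:0": 8_000, "/GPU:1": 8_000}
    print(place_ops(ops, devices))
```

In a TensorFlow program, a mapping like the one returned here could then be applied by constructing each operation inside the corresponding tf.device(...) context, which is one way a non-intrusive, graph-level placement can be realized without touching model code or kernel implementations.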


