TurboDL: Improving the CNN Training on GPU With Fine-Grained Multi-Streaming Scheduling
IEEE Transactions on Computers
Abstract

Graphics Processing Units (GPUs) have evolved into powerful co-processors for CNN training, and many new features, such as concurrent kernel execution and Hyper-Q technology, have been introduced. Orchestrating concurrency for convolutional neural network (CNN) training on GPUs is challenging, however, since naive concurrency can introduce synchronization overhead and poor resource utilization. Unlike previous research, which mainly focuses on single-layer or coarse-grained optimization, we introduce a critical-path based, asynchronous parallelization mechanism and propose an optimization technique for CNN training that jointly accounts for the global network architecture and GPU resource usage. The proposed methods effectively overlap synchronization and computation across different streams, accelerating the CNN training process. We have integrated our methods into Caffe. The experimental results show that Caffe integrated with our methods achieves a 1.30X performance speedup on average compared with Caffe+cuDNN, and even higher speedups are achieved for deeper, wider, and more complicated networks.
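The core idea of critical-path based multi-stream scheduling can be illustrated with a toy sketch (this is an illustrative reconstruction, not the paper's actual algorithm or code): given a layer DAG with estimated per-layer costs, the longest-cost chain (the critical path) is kept on one stream, while independent branch layers are assigned to another stream so their computation can overlap.

```python
# Toy sketch of critical-path based stream assignment for a layer DAG.
# Assumption: layer names, costs, and the two-stream policy below are
# hypothetical examples, not taken from TurboDL itself.

def critical_path(costs, deps):
    """Longest-path (by cost) finish time for each layer in a DAG.
    costs: {layer: time}; deps: {layer: [predecessor layers]}."""
    finish = {}

    def ft(layer):
        if layer not in finish:
            start = max((ft(p) for p in deps.get(layer, [])), default=0.0)
            finish[layer] = start + costs[layer]
        return finish[layer]

    for layer in costs:
        ft(layer)
    return finish

def assign_streams(costs, deps):
    """Stream 0 gets the critical path; off-path layers go to stream 1
    so their kernels can overlap with critical-path work."""
    finish = critical_path(costs, deps)
    # Walk back from the layer that finishes last, always following the
    # predecessor with the largest finish time: that chain is the critical path.
    path = [max(finish, key=finish.get)]
    while deps.get(path[-1]):
        path.append(max(deps[path[-1]], key=finish.get))
    on_path = set(path)
    return {layer: 0 if layer in on_path else 1 for layer in costs}

# Example: an inception-style branch, conv1 -> {conv2a, conv2b} -> concat.
costs = {"conv1": 4.0, "conv2a": 6.0, "conv2b": 2.0, "concat": 1.0}
deps = {"conv2a": ["conv1"], "conv2b": ["conv1"], "concat": ["conv2a", "conv2b"]}
streams = assign_streams(costs, deps)
# The cheap branch conv2b lands on stream 1 and overlaps with conv2a.
```

In a real implementation the "streams" would be CUDA streams and the off-path kernels would be launched asynchronously, with events synchronizing only at true DAG join points (e.g., the concat layer) rather than after every layer.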
