
Layer-Centric Memory Reuse and Data Migration for Extreme-Scale Deep Learning on Many-Core Architectures



Abstract

Due to the popularity of Deep Neural Network (DNN) models, we have witnessed extreme-scale DNN models whose depth and width continue to grow. However, their extremely high memory requirements make it difficult to run the training process on a single many-core architecture such as a Graphics Processing Unit (GPU), which compels researchers to use model parallelism over multiple GPUs. Model parallelism, however, always brings heavy additional overhead, so running an extreme-scale model on a single GPU is urgently needed. Several challenges remain in reducing the memory footprint of extreme-scale deep learning. To address this problem, we first identify the memory usage characteristics of deep and wide convolutional networks and demonstrate opportunities for memory reuse at both the intra-layer and inter-layer levels. We then present Layrub, a runtime data placement strategy that orchestrates the execution of the training process. It achieves layer-centric reuse to reduce memory consumption for extreme-scale deep learning models that previously could not be run on a single GPU. Experiments show that, compared to the original Caffe, Layrub cuts memory usage by an average of 58.2% and by up to 98.9%, at the moderate cost of 24.1% longer training execution time on average. Results also show that Layrub outperforms popular deep learning systems such as GeePS, vDNN, MXNet, and TensorFlow. More importantly, Layrub can tackle extreme-scale deep learning tasks: for example, it enables an extra-deep ResNet with 1,517 layers to be trained successfully on a single GPU with 12GB of memory, which other existing deep learning systems cannot do.
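
To make the layer-centric idea in the abstract concrete, the sketch below illustrates inter-layer memory reuse with data migration: each layer's activations are moved out of (simulated) GPU memory after the forward step so the device buffer can be reused by later layers, and are migrated back just before the corresponding backward step. This is a minimal, framework-agnostic illustration under assumed names (GPUPool, train_step, the layer shapes), not Layrub's actual Caffe-based implementation.

```python
import numpy as np

class GPUPool:
    """Simulated device memory pool that tracks the peak number of live buffers."""
    def __init__(self):
        self.live = 0
        self.peak = 0

    def alloc(self, shape):
        self.live += 1
        self.peak = max(self.peak, self.live)
        return np.zeros(shape, dtype=np.float32)

    def free(self, buf):
        self.live -= 1

def train_step(layer_shapes, pool, offload=True):
    host_copies = []   # activations parked in host memory between passes
    device_acts = []   # activations kept resident on the "GPU" (baseline)

    # Forward pass: with offloading, each layer's output is copied to host memory
    # and its device buffer is released, so later layers reuse the same space.
    for shape in layer_shapes:
        act = pool.alloc(shape)            # forward compute would fill this buffer
        if offload:
            host_copies.append(act.copy()) # data migration: device -> host
            pool.free(act)
        else:
            device_acts.append(act)

    # Backward pass: migrate each activation back right before its gradient is needed,
    # then release the device buffer again (layer-centric reuse).
    for i in reversed(range(len(layer_shapes))):
        if offload:
            act = pool.alloc(layer_shapes[i])
            act[...] = host_copies[i]      # data migration: host -> device
            pool.free(act)
        else:
            pool.free(device_acts[i])

if __name__ == "__main__":
    shapes = [(32, 64, 28, 28)] * 8        # eight hypothetical conv layers
    for offload in (False, True):
        pool = GPUPool()
        train_step(shapes, pool, offload)
        print(f"offload={offload}: peak live device buffers = {pool.peak}")
```

Running the sketch shows the peak number of resident device buffers dropping from one per layer to a constant, at the cost of extra host-device transfers; this mirrors the trade-off reported in the abstract, where memory savings come with a moderate increase in training time.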

Bibliographic information

  • Source: ACM Transactions on Architecture and Code Optimization
  • Author affiliations

    Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Big Data Technol & Syst Lab, Wuhan 430074, Hubei, Peoples R China;
    Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Big Data Technol & Syst Lab, Wuhan 430074, Hubei, Peoples R China;
    Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Big Data Technol & Syst Lab, Wuhan 430074, Hubei, Peoples R China;
    Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Big Data Technol & Syst Lab, Wuhan 430074, Hubei, Peoples R China;
    Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Big Data Technol & Syst Lab, Wuhan 430074, Hubei, Peoples R China;
    Natl Univ Singapore, Sch Comp, Dept Comp Sci, Singapore 119077, Singapore;
    Huazhong Univ Sci & Technol, Div Data Storage Syst, Wuhan Natl Lab Optoelect, Wuhan 430074, Hubei, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification: Computing technology, computer technology;
  • Keywords

    Data placement; DNN; GPU; memory efficiency;

