IEEE Transactions on Computers

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets


Abstract

Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7× over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7× energy efficiency improvement of NTX over contemporary GPUs at 4.4× less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95 percent parallel and energy efficiency, while providing 2.1× energy savings or 3.1× performance improvement over a GPU-based system.
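The "key computational patterns in gradient-based training" that the abstract refers to can be illustrated with a minimal sketch (not taken from the paper): every training step consists of a forward pass of dense multiply-accumulates, a backward pass computing weight gradients, and an in-place parameter update. All names, shapes, and the single-layer model below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of one gradient-based training loop on a single linear
# layer. Near-memory engines of the kind the abstract describes target
# exactly these dense MAC-heavy forward/backward/update kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))   # mini-batch of 8 inputs
y = rng.standard_normal((8, 2))   # targets
W = rng.standard_normal((4, 2))   # weights of one linear layer
lr = 0.1                          # SGD learning rate

initial_loss = float(np.mean((x @ W - y) ** 2))

for _ in range(100):
    pred = x @ W                  # forward pass: dense multiply-accumulate
    err = pred - y                # error at the output
    grad = x.T @ err / len(x)     # backward pass: gradient w.r.t. weights
    W -= lr * grad                # SGD update, applied in place

loss = float(np.mean((x @ W - y) ** 2))
```

In a convolutional network the same three phases appear per layer, with the matrix products replaced by convolutions and their transposed counterparts for gradient propagation.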
