IEEE Transactions on Parallel and Distributed Systems

Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture



Abstract

Deep convolutional neural networks (CNNs) are widely adopted in intelligent systems with unprecedented accuracy, but at the cost of substantial data movement. Although the emerging processing-in-memory (PIM) architecture seeks to minimize data movement by placing memory near processing elements, memory remains the major bottleneck in the entire system. The selection of hyper-parameters in the training of CNN applications requires hundreds of kilobytes of cache capacity for the concurrent processing of convolutions. How to jointly exploit the computation capability of the PIM architecture and the highly parallel structure of neural networks remains a critical issue. This paper presents Para-Net, which exploits parallelism for deterministic convolutional neural networks on the PIM architecture. Para-Net achieves data-level parallelism for convolutions by fully utilizing the on-chip processing engines (PEs) in PIM. The objective is to capture the characteristics of neural networks and present a hardware-independent design that jointly optimizes the scheduling of both intermediate results and computation tasks. We formulate this data allocation problem as a dynamic programming model and obtain an optimal solution. To demonstrate the viability of the proposed Para-Net, we conduct a set of experiments using a variety of realistic CNN applications, with graph abstractions obtained from the deep learning framework Caffe. Experimental results show that Para-Net significantly reduces processing time and improves cache efficiency compared to representative schemes.
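The abstract states that the data allocation problem is formulated as a dynamic programming model with an optimal solution, but does not reproduce the model itself. As a hedged illustration only (not the paper's actual formulation), the sketch below shows one classic DP of this flavor: splitting a sequence of per-tile convolution workloads into contiguous groups, one group per processing engine, so that the maximum per-PE load (the makespan) is minimized. All workload numbers are hypothetical.

```python
# Illustrative sketch only: a linear-partition dynamic program that assigns
# contiguous convolution workloads to k processing engines (PEs), minimizing
# the heaviest PE's load. The paper's real model also schedules intermediate
# results and cache capacity; none of that is captured here.

def partition_min_makespan(costs, k):
    """Split `costs` (per-task cycle counts) into k contiguous groups,
    minimizing the largest group sum. Returns (makespan, cut indices)."""
    n = len(costs)
    prefix = [0] * (n + 1)
    for i, c in enumerate(costs):
        prefix[i + 1] = prefix[i] + c

    INF = float("inf")
    # dp[j][i]: best achievable makespan for the first i tasks on j PEs.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            # Last PE takes tasks m..i-1; the rest go to j-1 PEs.
            for m in range(j - 1, i):
                cand = max(dp[j - 1][m], prefix[i] - prefix[m])
                if cand < dp[j][i]:
                    dp[j][i] = cand
                    cut[j][i] = m

    # Walk the cut table backwards to recover the partition boundaries.
    cuts, i = [], n
    for j in range(k, 0, -1):
        i = cut[j][i]
        cuts.append(i)
    return dp[k][n], sorted(c for c in cuts if 0 < c < n)

if __name__ == "__main__":
    loads = [9, 4, 6, 7, 3, 8]  # hypothetical per-tile workloads (cycles)
    best, cuts = partition_min_makespan(loads, 3)
    print(best, cuts)  # → 13 [2, 4]
```

With three PEs, the optimal contiguous split of `[9, 4, 6, 7, 3, 8]` is `[9, 4] | [6, 7] | [3, 8]`, giving a makespan of 13 cycles. The O(k·n²) table fill is what makes the allocation provably optimal rather than heuristic, which mirrors the abstract's claim of obtaining an optimal solution via dynamic programming.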
