IEEE Transactions on Parallel and Distributed Systems

Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture



Abstract

Deep convolutional neural networks (CNNs) are widely adopted in intelligent systems with unprecedented accuracy, but at the cost of substantial data movement. Although the emerging processing-in-memory (PIM) architecture seeks to minimize data movement by placing memory near processing elements, memory remains the major bottleneck in the entire system. The selection of hyper-parameters in the training of CNN applications requires hundreds of kilobytes of cache capacity for the concurrent processing of convolutions. How to jointly exploit the computation capability of the PIM architecture and the highly parallel structure of neural networks remains a critical issue. This paper presents Para-Net, which exploits parallelism for deterministic convolutional neural networks on the PIM architecture. Para-Net achieves data-level parallelism for convolutions by fully utilizing the on-chip processing engines (PEs) in PIM. The objective is to capture the characteristics of neural networks and present a hardware-independent design that jointly optimizes the scheduling of both intermediate results and computation tasks. We formulate this data allocation problem as a dynamic programming model and obtain an optimal solution. To demonstrate the viability of the proposed Para-Net, we conduct a set of experiments using a variety of realistic CNN applications, with graph abstractions obtained from the deep learning framework Caffe. Experimental results show that Para-Net significantly reduces processing time and improves cache efficiency compared to representative schemes.
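The abstract states that the data allocation problem is formulated as a dynamic programming model with an optimal solution, but does not reproduce the model itself. As a hedged illustration only (not the paper's actual formulation), the sketch below shows one classic DP of this flavor: splitting a sequence of per-tile convolution workloads into contiguous groups, one group per processing engine, so that the maximum per-PE load (the makespan) is minimized. All workload numbers are hypothetical.

```python
# Illustrative sketch only: a linear-partition dynamic program that assigns
# contiguous convolution workloads to k processing engines (PEs), minimizing
# the heaviest PE's load. The paper's real model also schedules intermediate
# results and cache capacity; none of that is captured here.

def partition_min_makespan(costs, k):
    """Split `costs` (per-task cycle counts) into k contiguous groups,
    minimizing the largest group sum. Returns (makespan, cut indices)."""
    n = len(costs)
    prefix = [0] * (n + 1)
    for i, c in enumerate(costs):
        prefix[i + 1] = prefix[i] + c

    INF = float("inf")
    # dp[j][i]: best achievable makespan for the first i tasks on j PEs.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            # Last PE takes tasks m..i-1; the rest go to j-1 PEs.
            for m in range(j - 1, i):
                cand = max(dp[j - 1][m], prefix[i] - prefix[m])
                if cand < dp[j][i]:
                    dp[j][i] = cand
                    cut[j][i] = m

    # Walk the cut table backwards to recover the partition boundaries.
    cuts, i = [], n
    for j in range(k, 0, -1):
        i = cut[j][i]
        cuts.append(i)
    return dp[k][n], sorted(c for c in cuts if 0 < c < n)

if __name__ == "__main__":
    loads = [9, 4, 6, 7, 3, 8]  # hypothetical per-tile workloads (cycles)
    best, cuts = partition_min_makespan(loads, 3)
    print(best, cuts)  # → 13 [2, 4]
```

With three PEs, the optimal contiguous split of `[9, 4, 6, 7, 3, 8]` is `[9, 4] | [6, 7] | [3, 8]`, giving a makespan of 13 cycles. The O(k·n²) table fill is what makes the allocation provably optimal rather than heuristic, which mirrors the abstract's claim of obtaining an optimal solution via dynamic programming.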
