Deep convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in recognition, detection, and other computer vision tasks. Dedicated CNN hardware would enable mobile devices to meet real-time demands. However, the design of such hardware faces the challenges of high computational complexity and data bandwidth, as well as large differences among the layers of a CNN. In particular, the throughput of the convolutional layers is bounded by the available hardware resources, whereas the throughput of the fully connected layers is bounded by the available data bandwidth. A highly flexible design with efficient hardware is therefore desired. This talk presents our end-to-end CNN accelerator, which uses a shared filter kernel for all layers and an output view strategy for maximum data reuse. The whole CNN architecture is modelled with a tile-based design to optimize hardware resources and I/O data bandwidth for the target network under the given design constraints. The final design is generated according to the desired resources and is reconfigured at run time with layer-optimized parameters to achieve real-time end-to-end CNN acceleration.
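The contrast between compute-bound convolutional layers and bandwidth-bound fully connected layers can be illustrated with a rough operation count. The sketch below is not the accelerator's model. It assumes single-image inference, ignores biases and activation traffic, and uses illustrative layer shapes that do not come from the talk. The function names `conv_layer_cost`, `fc_layer_cost`, and `macs_per_weight` are hypothetical.

```python
def conv_layer_cost(h_out, w_out, c_in, c_out, k):
    """Return (MACs, weight count) for a k x k convolution, ignoring bias."""
    weights = c_in * c_out * k * k
    # Every weight is reused once per output pixel.
    macs = h_out * w_out * weights
    return macs, weights


def fc_layer_cost(n_in, n_out):
    """Return (MACs, weight count) for a fully connected layer, ignoring bias."""
    weights = n_in * n_out
    # With a single input vector, each weight is used exactly once.
    macs = weights
    return macs, weights


def macs_per_weight(macs, weights):
    """Weight reuse: multiply-accumulates performed per weight fetched."""
    return macs / weights


# Illustrative shapes only (assumed, not taken from the talk).
conv_macs, conv_weights = conv_layer_cost(h_out=56, w_out=56, c_in=64, c_out=64, k=3)
fc_macs, fc_weights = fc_layer_cost(n_in=4096, n_out=4096)

print("conv MACs per weight:", macs_per_weight(conv_macs, conv_weights))
print("fc   MACs per weight:", macs_per_weight(fc_macs, fc_weights))
```

Under these assumptions the convolutional layer reuses each weight thousands of times, so its throughput depends mainly on the number of multipliers, while the fully connected layer performs one multiply-accumulate per fetched weight, so its throughput depends mainly on how fast weights can be streamed in.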