The present invention discloses a neural network accelerator and a neural network acceleration method based on structured pruning and low-bit quantization. The neural network accelerator includes a master controller, an activations selection unit, an extensible calculation array, a multifunctional processing element, a DMA, a DRAM, and a buffer. The invention makes full use of data reusability during neural network inference. Through structured pruning and data sharing on the extensible calculation array, it reduces the power consumed in selecting the input activations and weights that take part in effective calculations, and it relieves the high transmission bandwidth pressure between the activations selection unit and the extensible calculation array. By additionally applying low-bit quantization, it reduces the number of weight parameters and their storage bit width, further improving the throughput and energy efficiency of the convolutional neural network accelerator.
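The interplay of structured pruning, activation selection, and low-bit quantization described above can be illustrated with a minimal software sketch. It is not the disclosed hardware or the claimed method: the pruning criterion (summed weight magnitude per position), the uniform symmetric quantizer, the 4-bit width, and every function name below are assumptions chosen for illustration only.

```python
# Illustrative sketch only. The pruning criterion, the quantizer, the bit
# width, and all names here are assumptions, not details from the disclosure.

def prune_structured(weights, keep):
    """Keep the same `keep` weight positions in every filter.

    `weights` is a list of filters, each a list of equal length.
    Positions are ranked by summed absolute weight across all filters
    (an assumed criterion), so all filters share one sparsity pattern.
    """
    n_pos = len(weights[0])
    scores = [sum(abs(f[i]) for f in weights) for i in range(n_pos)]
    ranked = sorted(range(n_pos), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    compact = [[f[i] for i in kept] for f in weights]
    return kept, compact

def quantize_symmetric(compact, bits):
    """Uniform symmetric quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for f in compact for w in f)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(w / scale))) for w in f] for f in compact]
    return q, scale

def select_activations(activations, kept):
    """Gather only the activations that pair with surviving weights."""
    return [activations[i] for i in kept]

def compute_outputs(q_weights, scale, selected):
    """One dot product per filter; every filter reuses the same selection."""
    return [scale * sum(w * a for w, a in zip(f, selected)) for f in q_weights]

if __name__ == "__main__":
    weights = [[0.9, 0.01, -0.5, 0.02],
               [-0.7, 0.03, 0.4, -0.01]]
    activations = [1.0, 2.0, 3.0, 4.0]

    kept, compact = prune_structured(weights, keep=2)
    q_weights, scale = quantize_symmetric(compact, bits=4)
    selected = select_activations(activations, kept)
    print(kept, q_weights, selected)
    print(compute_outputs(q_weights, scale, selected))
```

Because the pruning pattern is shared by all filters, the activations are selected once and reused for every filter. In this sketch that sharing stands in for the reduced selection cost and reduced transfer between the selection unit and the calculation array.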