The present invention relates to a system and a method for quantizing a pre-trained neural network. The method comprises: determining, by a layer/channel bit-width determiner, a minimum quantization noise for each layer or channel of the pre-trained neural network, for each master bit-width value in a predetermined set of master bit-width values; and selecting, by a bit-width selector, the master bit-width value having the minimum quantization noise for the layer or channel. In an embodiment of the present invention, the minimum quantization noise is based on the square of the range of weights for the layer or channel multiplied by a constant raised to the negative of the current master bit-width value.
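The selection step can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `noise_proxy` and `select_bit_width`, the constant `base = 4.0`, and the sample weights are all assumptions made for illustration. The abstract says only that the noise is "based on" the squared weight range scaled by a constant raised to the negative bit-width, so the sketch uses that term alone as the noise estimate.

```python
from typing import Callable, Iterable, Sequence


def noise_proxy(weights: Sequence[float], bits: int, base: float = 4.0) -> float:
    """Squared weight range times base**(-bits).

    `base` is an assumed constant; the abstract does not give its value.
    """
    weight_range = max(weights) - min(weights)
    return weight_range ** 2 * base ** (-bits)


def select_bit_width(
    weights: Sequence[float],
    candidates: Iterable[int],
    noise_fn: Callable[[Sequence[float], int], float] = noise_proxy,
) -> int:
    """Return the candidate bit-width with the smallest noise estimate."""
    return min(candidates, key=lambda b: noise_fn(weights, b))


# Hypothetical weights for one layer (or channel) and a hypothetical
# predetermined set of master bit-width values.
layer_weights = [-0.5, 0.1, 0.25, 1.5]
candidates = [2, 4, 8]

noise_by_bits = {b: noise_proxy(layer_weights, b) for b in candidates}
chosen = select_bit_width(layer_weights, candidates)
```

Taken by itself, this term decreases monotonically as the bit-width grows, so the sketch always picks the largest candidate. The full criterion presumably involves further terms or constraints that the abstract does not describe, which is why `select_bit_width` accepts any noise function through `noise_fn`.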