Deep learning is being used more and more widely on mobile devices, but mobile hardware still lags far behind servers in compute power. The difficulty of deploying deep learning models on mobile therefore lies in keeping the model effective while guaranteeing its runtime efficiency.
During the experimental phase a large model can be chosen for the network structure, because that phase mainly verifies that the method itself is valid. Once that is done and deployment to mobile begins, it is time to streamline the model: generally you either prune the trained large model, or redesign your own network modules with reference to existing lightweight networks such as MobileNetV2 and ShuffleNetV2. Besides pruning, algorithm-level optimization also includes quantization. Quantization approximates the floating-point (high-precision) weights and activation values with lower-precision integers. The advantage of low precision is that, compared with high-precision arithmetic, more data can be processed per unit time, and quantizing the weights further reduces the model's storage footprint.
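To make the idea concrete, here is a minimal sketch of affine (asymmetric) quantization to signed 8-bit integers, written in NumPy. The function names `quantize` and `dequantize` and the min/max-based choice of scale are illustrative assumptions, not the exact scheme from the paper:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine quantization: map a float tensor onto signed integers.

    Illustrative sketch -- scale and zero_point are derived from the
    observed min/max so that the float range covers [qmin, qmax].
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard: constant input
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to (approximate) floats."""
    return scale * (q.astype(np.float32) - zero_point)
```

The round trip quantize-then-dequantize recovers each value to within one quantization step (`scale`), which is exactly the error the network must tolerate at inference time.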
For quantizing an already-trained network, the post-training quantization algorithm of TensorRT has proven quite effective in practice. But if the quantization process can be simulated during training, letting the network learn to correct the errors that quantization introduces, then the resulting quantization parameters should be more accurate, and the model should lose less performance during actual quantized inference.
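Simulating quantization during training is commonly done by inserting a "fake quantization" op into the forward pass: quantize then immediately dequantize, so the network sees the quantization error while all tensors stay in floating point; in the backward pass the rounding is usually treated as an identity (the straight-through estimator). A minimal NumPy sketch of the forward op, with an assumed min/max-based scale rather than the paper's exact parameterization:

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize-then-dequantize in one step.

    The output is still a float tensor, so ordinary training can continue,
    but it already carries the error a real int8 pipeline would introduce.
    In backprop the round() is treated as identity (straight-through
    estimator), so gradients flow through this op unchanged.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return scale * (q - zero_point)
```

Applied to weights and activations during training, this lets the network adapt its parameters to the discretization instead of encountering it for the first time at deployment.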
The content of this article is to present the paper and reproduce some details of its procedure. As usual, the code for the experiments in this article is given first: TrainQuantization.
Training with Simulated Quantization
Method Introduction
Let's first look at the concrete definition of quantization. For quantizing activation values to signed 8-bit integers, the paper gives the following definition.