Generally, a more complex model achieves better performance on the task, but it also leads to some redundancy in the model. This part provides the function of compressing the model, including two parts: model quantization (offline quantization and online quantization training) and model pruning.

Quantization is a technique that reduces this redundancy by converting the full-precision data to fixed-point numbers, so as to reduce the computational complexity of the model and improve its inference performance.
This example uses the [Quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress the PaddleClas models.
Model pruning cuts off the unimportant convolution kernels in the CNN to reduce the number of model parameters, so as to reduce the computational complexity of the model.
It is recommended that you understand the following page before reading this example:

- [The training strategy of PaddleClas models](../../docs/en/tutorials/quick_start_en.md)
Quantization is mostly suitable for the deployment of lightweight models on mobile terminals.
After training a model, if you want to further compress the model size and speed up the prediction, you can use quantization or pruning to compress the model according to the following steps.
1. Install PaddleSlim
2. Prepare trained model
3. Model compression
4. Export inference model
5. Deploy quantization inference model
...
...
* Install from source code to get the latest features.
...
...
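For reference, a typical PaddleSlim installation looks like the following sketch (install the released package from PyPI, or install from source to get the latest features; pin a PaddleSlim version compatible with your PaddlePaddle installation):

```bash
# Option 1: install the released package from PyPI
pip3.7 install paddleslim

# Option 2: install from source to get the latest features
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python3.7 setup.py install
```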
### 2. Download Pretrained Model
PaddleClas provides a series of trained [models](../../docs/en/models/models_intro_en.md).
If the model to be quantized is not in the list, you need to follow the [Regular Training](../../docs/en/tutorials/getting_started_en.md) method to get a trained model.
### 3. Model Compression
Go to the root directory of PaddleClas
```bash
cd PaddleClas
```
The code related to model compression is located in `deploy/slim`.
#### 3.1 Model Quantization
Model quantization includes offline quantization and online quantization training (quantization-aware training).
##### 3.1.1 Online quantization training
Online quantization training is more effective than offline quantization. It is necessary to load a pre-trained model; after the quantization strategy is defined, the model can be quantized.
The code for quantization training is located in `deploy/slim/slim.py`. The training command is as follows:

```bash
python3.7 deploy/slim/slim.py -m train -c ppcls/configs/slim/ResNet50_vd_quantization.yaml -o Global.device=cpu
```
The description of the `yaml` configuration file can be found in this [doc](../../docs/en/tutorials/config_en.md). To get better accuracy, the `pretrained model` is specified in the `yaml` file.
`-m`: the mode supported by `slim.py`, one of `train, eval, infer, export`, which means training the model, evaluating the model, inferring images with the dygraph model, and exporting the inference model for deployment, respectively.
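For illustration, evaluation and export with the same entry point might look like the following sketch; the `Global.pretrained_model` and `Global.save_inference_dir` values are placeholder paths rather than values taken from this document, so adjust them to your own training output:

```bash
# Evaluate the quantization-aware trained model (paths are placeholders)
python3.7 deploy/slim/slim.py -m eval -c ppcls/configs/slim/ResNet50_vd_quantization.yaml \
    -o Global.pretrained_model=./output/ResNet50_vd/best_model

# Export the quantized model as an inference model for deployment
python3.7 deploy/slim/slim.py -m export -c ppcls/configs/slim/ResNet50_vd_quantization.yaml \
    -o Global.pretrained_model=./output/ResNet50_vd/best_model \
    -o Global.save_inference_dir=./inference
```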
##### 3.1.2 Offline quantization

**Attention**: At present, offline quantization must use the `inference model` exported from a trained model as input. The process of exporting an `inference model` from a trained model can be found in this [doc](../../docs/en/inference.md).
Generally speaking, offline quantization causes a greater loss of accuracy than online quantization training.
After getting the `inference model`, we can run the following command to get the offline quantized model.
* The command for quantizing the `MobileNetV3_large_x1_0` model is as follows:
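A minimal sketch of such a command is shown below, assuming an offline-quantization entry point `deploy/slim/quant_post_static.py`, a standard MobileNetV3 training config, and an already exported inference model directory; the script name, config path, and directory are assumptions and may differ in your PaddleClas version:

```bash
# Sketch only: entry point and paths below are assumptions, adjust to your PaddleClas version
python3.7 deploy/slim/quant_post_static.py \
    -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o Global.save_inference_dir=./MobileNetV3_large_x1_0_infer
```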
`Global.save_inference_dir` is the directory storing the `inference model`.
If the command runs successfully, the directory `quant_post_static_model` is generated in `Global.save_inference_dir`, which stores the offline quantized model that can be deployed directly.