Model pruning decreases the number of model parameters by cutting out the unimportant convolutional kernels, thus reducing the amount of computation.
This tutorial explains how to use PaddleSlim, PaddlePaddle's model compression library, for PaddleClas compression, i.e., pruning and quantization. [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) integrates a variety of common and leading model compression functions such as model pruning, quantization (including quantization training and offline quantization), distillation, and neural network search. If you are interested, please follow us and learn more.
To start with, you are recommended to learn [PaddleClas Training](../models_training/classification_en.md) and [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html); see [Model Pruning and Quantization Algorithms](../algorithm_introduction/model_prune_quantization_en.md) for the related pruning and quantization methods.
------
## Contents
- [1. Prepare the Environment](#1)
  - [1.1 Install PaddleSlim](#1.1)
  - [1.2 Prepare the Trained Model](#1.2)
5. Inference and deployment of the quantized model
<aname="1.1"></a>
### 1.1 Install PaddleSlim
- You can install it with pip.
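A minimal sketch of the pip installation (the exact version pin, if any, depends on your PaddlePaddle version):

```bash
# Install PaddleSlim from PyPI; pin a version compatible with your PaddlePaddle install if needed
pip3.7 install paddleslim
```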
- Alternatively, you can install PaddleSlim from source:

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python3.7 setup.py install
```
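As a quick sanity check (not part of the original tutorial), confirm that the package imports cleanly:

```bash
# The import should succeed without errors if PaddleSlim is installed correctly
python3.7 -c "import paddleslim; print('PaddleSlim is installed')"
```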
<aname="1.2"></a>
### 1.2 Prepare the Trained Model
PaddleClas offers a list of trained [models](../models/models_intro_en.md). If the model to be quantized is not in the list, you need to follow the [regular training](../models_training/classification_en.md) method to get the trained model.
<aname="2"></a>
## 2. Quick Start
First, go to the root directory of PaddleClas (`cd PaddleClas`).
Related code for `slim` training has been integrated under `ppcls/engine/`, and the offline quantization code can be found in `deploy/slim/quant_post_static.py`.
<aname="2.1"></a>
### 2.1 Model Quantization
Quantization includes offline quantization and online quantization training. Online quantization training, which is generally more effective, requires loading a pre-trained model and quantizing it after the quantization strategy is defined.
<aname="2.1.1"></a>
#### 2.1.1 Online Quantization Training
Try the following command:
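For instance, a CPU / single-card launch might look like the sketch below; the config path `ppcls/configs/slim/ResNet50_vd_quantization.yaml` is an assumption and may differ in your PaddleClas version:

```bash
# Online quantization training from a slim config; switch Global.device between cpu and gpu
python3.7 tools/train.py \
    -c ppcls/configs/slim/ResNet50_vd_quantization.yaml \
    -o Global.device=cpu
```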
Take the CPU as an example; if you use a GPU, change `cpu` in the command to `gpu`.
The parsing of the `yaml` file is described in the [reference document](../models_training/config_description_en.md). To preserve accuracy, the `yaml` file already specifies the `pretrained model`.
- Launch in single-machine multi-card / multi-machine multi-card mode, as sketched below.
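A sketch of the distributed launch via `paddle.distributed.launch` (the GPU ids and the config path are assumptions):

```bash
# Quantization training on 4 GPUs of a single machine
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
        -c ppcls/configs/slim/ResNet50_vd_quantization.yaml
```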
<a name="2.1.2"></a>
#### 2.1.2 Offline Quantization

**Note**: Currently, offline quantization requires the `inference model` exported from the trained model. See the [tutorial](../inference_deployment/export_model_en.md) for how to export the `inference model`.
Normally, offline quantization loses more accuracy than online quantization training.
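Once the `inference model` has been exported, offline quantization can be run with the script mentioned above; a sketch, assuming the usual PaddleClas `-c`/`-o` convention (the config file and output directory below are placeholders):

```bash
# Post-training (offline) static quantization of an exported inference model
python3.7 deploy/slim/quant_post_static.py \
    -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
    -o Global.save_inference_dir=./deploy/models/class_ResNet50_vd_ImageNet_infer
```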
The `inference model` is stored in `Global.save_inference_dir`.
After successful execution, a `quant_post_static_model` folder is created under `Global.save_inference_dir`. It stores the generated offline quantization model, which can be deployed directly without re-exporting the model.
Once the saved model from online quantization training or pruning is obtained, it can be exported as an inference model for inference deployment. Here we take model pruning as an example:
```bash
python3.7 tools/export.py \
    -o Global.save_inference_dir=./inference
```
<aname="4"></a>
## 4. Deploy the Model
The exported model can be deployed directly for inference; please refer to [inference deployment](../inference_deployment/) for details.
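For example, the Python inference scripts under `deploy/` can consume the exported model directly; a sketch (the script, config, and model paths are assumptions about the PaddleClas `deploy/` directory):

```bash
# Run image classification inference with the exported (pruned/quantized) model
cd deploy
python3.7 python/predict_cls.py \
    -c configs/inference_cls.yaml \
    -o Global.inference_model_dir=../inference
```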
You can also use Paddle-Lite's `opt` tool to convert the inference model into a mobile model for deployment on mobile devices. Please refer to [Mobile Model Deployment](../inference_deployment/paddle_lite_deploy_en.md) for more details.
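For reference, a conversion with the Paddle-Lite `opt` tool usually looks like the sketch below (the file names and output path are assumptions; see the linked tutorial for the authoritative options):

```bash
# Convert the Paddle inference model into a Paddle-Lite model for ARM deployment
paddle_lite_opt \
    --model_file=./inference/inference.pdmodel \
    --param_file=./inference/inference.pdiparams \
    --optimize_out=./inference/mobile_model \
    --valid_targets=arm
```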
After the above improvement, *PACT* preprocessing is inserted between the activation and the OP to be quantized.
For specific algorithm parameters, please refer to [Introduction to Parameters](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/api_cn/dygraph/quanter/qat.rst#qat) in PaddleSlim.
<a name='2'></a>
## 2. FPGM
Model pruning is an essential practice for reducing model size and improving inference efficiency. Previous work on network pruning generally uses the norm of a network filter to measure its importance: **the smaller the norm value, the less important the filter is**, and the better a candidate it is for pruning from the network. **FPGM** argues that this approach relies on the following two points: