You can find `cmake-3.22.0/` folder in the current directory.
- To compile cmake, first set the source path of `cmake` (`root_path`) and installation path (`install_path`). In this example, the source path is `cmake-3.22.0/` in the current directory.
```shell
cd ./cmake-3.22.0
export root_path=$PWD
export install_path=${root_path}/cmake
```
- Then compile under the source path as follows:
```shell
./bootstrap --prefix=${install_path}
make -j
make install
```
- Set environment variables
```shell
export PATH=${install_path}/bin:$PATH
#Check its well functioning
cmake --version
```
cmake is now ready for use.
### 1.2 Compile opencv Library
- First, download the package for source compilation in Linux environment from the official website of opencv. Taking version 3.4.7 as an example, follow the command below to download and unzip it:
You can find`opencv-3.4.7/`folder in the current directory.
- To compile opencv, first set the source path of opencv(`root_path`) and installation path (`install_path`). In this example, the source path is`opencv-3.4.7/`in the current directory.
```
cd ./opencv-3.4.7
export root_path=$PWD
export install_path=${root_path}/opencv3
```
- Then compile under the source path as follows:
```shell
rm-rf build
mkdir build
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path}\
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_IPP=OFF \
-DBUILD_IPP_IW=OFF \
-DWITH_LAPACK=OFF \
-DWITH_EIGEN=OFF \
-DCMAKE_INSTALL_LIBDIR=lib64 \
-DWITH_ZLIB=ON \
-DBUILD_ZLIB=ON \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON \
-DWITH_PNG=ON \
-DBUILD_PNG=ON \
-DWITH_TIFF=ON \
-DBUILD_TIFF=ON
make -j
make install
```
- After `make install` is done, opencv header and library files will be generated in this folder for later compilation of PaddleClas code.
For opencv version 3.4.7, the final file structure under the installation path is shown below. **Note**: The following file structure may vary for different opencv versions.
```
opencv3/
|-- bin
|-- include
|-- lib64
|-- share
```
### 1.3 Download or Compile Paddle Inference Library
- Here we detail 2 ways to obtain Paddle inference library.
#### 1.3.1 Compile the Source of Inference Library
- To obtain the latest features of the inference library, you can clone the latest code from Paddle github and compile the source code of the library.
- Please refer to the website of [Paddle Inference Library](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_ and_install_lib_cn.html#id16) to get Paddle code from github and then compile it to generate the latest inference library. The method to obtain the code using git is as follows.
- Adopt the following method to compile after entering Paddle directory.
```shell
rm-rf build
mkdir build
cd build
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON
make -j
make inference_lib_dist
```
See the official website of [Paddle C++ Inference Library](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16) for more compilation parameters.
- The following files and folders can be found generated under `build/paddle_inference_install_dir/` after compilation.
```
build/paddle_inference_install_dir/
|-- CMakeCache.txt
|-- paddle
|-- third_party
|-- version.txt
```
`paddle` is the Paddle library needed for later C++ inference, and `version.txt` contains the version information of the current inference library.
#### 1.3.2 Direct Download and Installation
- The Linux inference library of different cuda versions are available on the official website of [Paddle Inference Library ](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html), where you can choose the appropriate version. Note that you must select the `develop` version.
For the `develop` version of `https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz`, use the following command to download and unzip it:
Please install `openblas` before `faiss`, the installation command in `ubuntu` system is as follows:
```
apt-get install libopenblas-dev
```
Note that this tutorial installs the cpu version of faiss as an example, please install it as your need by referring to the official documents of [faiss](https://github.com/facebookresearch/faiss).
## 2 Code Compilation
### 2.2 Compile the C++ Inference Demo of PP-ShiTu
The command is as follows, where the address of Paddle C++ inference library, opencv and other dependency libraries need to be replaced with the actual address on your own machine. Also, you need to download and compile `yaml-cpp` and other C++ libraries during the compilation, so please keep the network unblocked.
```shell
sh tools/build.sh
```
Specifically, the contents of `tools/build.sh` are as follows, please modify according to the specific path.
```shell
OPENCV_DIR=${opencv_install_dir}
LIB_DIR=${paddle_inference_dir}
CUDA_LIB_DIR=/usr/local/cuda/lib64
CUDNN_LIB_DIR=/usr/lib/x86_64-linux-gnu/
FAISS_DIR=${faiss_install_dir}
FAISS_WITH_MKL=OFF
BUILD_DIR=build
rm-rf${BUILD_DIR}
mkdir${BUILD_DIR}
cd${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR}\
-DWITH_MKL=ON \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=OFF \
-DOPENCV_DIR=${OPENCV_DIR}\
-DCUDNN_LIB=${CUDNN_LIB_DIR}\
-DCUDA_LIB=${CUDA_LIB_DIR}\
-DFAISS_DIR=${FAISS_DIR}\
-DFAISS_WITH_MKL=${FAISS_WITH_MKL}
make -j
cd ..
```
In the above commands:
-`OPENCV_DIR` is the address of the opencv compilation and installation (in this case, the path of the `opencv-3.4.7/opencv3` folder).
-`LIB_DIR` is the path of the downloaded Paddle inference library (`paddle_inference` folder), or the generated Paddle inference library after compilation (`build/paddle_inference_install_dir` folder).
-`CUDA_LIB_DIR` is path of the cuda library file, which in docker is `/usr/local/cuda/lib64`.
-`CUDNN_LIB_DIR` is the path of the cudnn library file, which in docker is `/usr/lib/x86_64-linux-gnu/` .
-`TENSORRT_DIR` is the path of the tensorrt library file, which in docker is `/usr/local/TensorRT6-cuda10.0-cudnn7/`. TensorRT needs to be used in combination with GPU.
-`FAISS_DIR` is the installation path of faiss.
-`FAISS_WITH_MKL` means whether mkldnn is used during the compilation of faiss. The compilation in this document employs openbals instead of mkldnn, so it is set to `OFF`, otherwise it is `ON`.
A `build` folder will be created in the current path after the compilation, which generates an executable file named `pp_shitu`.
## 3 Run the demo
- Please refer to the [Quick Start of Recognition](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_recognition.md), download the corresponding Lightweight Generic Mainbody Detection Model, Lightweight Generic Recognition Model, and the beverage test data and unzip them.
`id_map.txt` is generated in `IndexProcess.index_dir` directory for convenience of C++ reading.
- Execute the program
```shell
./build/pp_shitu -c inference_drink.yaml
# or
./build/pp_shitu -config inference_drink.yaml
```
The following results can be obtained after searching the image set.
At the same time, it should be noticed that a slight difference may occur during the pre-processing of the image due to the version of opencv, resulting in a minor discrepancy in python and c++ results, such as a few pixels for bbox, 3 decimal places for retrieval results, etc. But it has no impact on the final search label.
You can also use your self-trained models. Please refer to [model export](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/export_model.md) to export ` inference model` for model inference.
Mind modifying the specific parameters in the `yaml` file.
Complex models are conducive to better model performance, but they may also lead to certain redundancy. This section presents ways to streamline the model, including model quantization (quantization training and offline quantization) and model pruning.
Model quantization reduces the full precision to a fixed number of points to lower the redundancy and achieve the purpose of simplifying the model computation and improving model inference performance. Model quantization can reduce the size of model parameters by converting its precision from FP32 to Int8 without losing model precision, followed by accelerated computation, creating a quantized model with more speed advantages when deployed on mobile devices.
Model pruning decreases the number of model parameters by cutting out the unimportant convolutional kernels in the CNN, thus bringing down the computational complexity.
This tutorial explains how to use PaddleSlim, PaddlePaddle's model compression library, for PaddleClas compression, i.e., pruning and quantization. [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) integrates a variety of common and leading model compression functions such as model pruning, quantization (including quantization training and offline quantization), distillation, and neural network search. If you are interested, please follow us and learn more.
To start with, you are recommended to learn [PaddleClas Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md) and [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html), see [Model Pruning and Quantization Algorithms](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/algorithm_introduction/model_prune_quantization.md) for related pruning and quantization methods.
------
## Contents
-[1. Prepare the Environment](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/advanced_tutorials/model_prune_quantization.md#1)
-[1.2 Prepare the Trained Model](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/advanced_tutorials/model_prune_quantization.md#1.2)
PaddleClas offers a list of trained [models](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models/models_intro.md). If the model to be quantized is not in the list, you need to follow the [regular training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md) method to get the trained model.
## 2. Quick Start
Go to PaddleClas root directory
```shell
cd PaddleClas
```
Related code for `slim` training has been integrated under `ppcls/engine/`, and the offline quantization code can be found in `deploy/slim/quant_post_static.py`.
### 2.1 Model Quantization
Quantization training includes offline and online training. Online quantitative training, the more effective one, requires loading a pre-trained model, which can be quantized after defining the strategy.
#### 2.1.1 Online Quantization Training
Try the following command:
- CPU/Single GPU
Take CPU for example, if you use GPU, change the `cpu` to `gpu`.
The parsing of the `yaml` file is described in [reference document](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/config_description.md). For accuracy, the `pretrained model` has already been adopted by the `yaml` file.
- Launch in single-machine multi-card/ multi-machine multi-card mode
**Note**: Currently, the `inference model` exported from the trained model is a must for offline quantization. See the [tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/export_model .md) for general export of the `inference model`.
Normally, offline quantization may lose more accuracy.
After generating the `inference model`, the offline quantization is run as follows:
The `inference model` is stored in`Global.save_inference_dir`.
Successfully executed, the `quant_post_static_model` folder is created in the `Global.save_inference_dir`, where the generated offline quantization models are stored and can be deployed directly without re-exporting the models.
### 2.2 Model Pruning
Trying the following command:
- CPU/Single GPU
Take CPU for example, if you use GPU, change the `cpu` to `gpu`.
- Launch in single-machine single-card/ single-machine multi-card/ multi machine multi-card mode
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
--gpus="0,1,2,3"\
tools/train.py \
-c ppcls/configs/slim/ResNet50_vd_prune.yaml
```
## 3. Export the Model
Having obtained the saved model after online quantization training and pruning, it can be exported as an inference model for inference deployment. Here we take model pruning as an example:
The exported model can be deployed directly using inference, please refer to [inference deployment](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_ deployment).
You can also use PaddleLite's opt tool to convert the inference model to a mobile model for its mobile deployment. Please refer to [Mobile Model Deployment](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_lite_deploy.md) for more details.
## 5. Hyperparameter Training
- For quantization and pruning training, it is recommended to load the pre-trained model obtained from conventional training to accelerate the convergence of quantization training.
- For quantization training, it is recommended to modify the initial learning rate to `1/20~1/10` of the conventional training and the number of training epochs to `1/5~1/2`, while adding Warmup to the learning rate strategy. Please make no other modifications to the configuration information.
- For pruning training, the hyperparameter configuration is recommended to remain the same as the regular training.
Deep learning limits the deployment of corresponding models in some scenarios and devices due to its computational complexity or parameter redundancy, thus requiring model compression, optimization acceleration. Model compression algorithms can effectively decrease parameter redundancy, thus reducing storage footprint, communication bandwidth, and computational complexity, which are conducive to the application and deployment of models of deep learning. Among them, model quantization and pruning enjoy great popularity. In PaddleClas, the following two algorithms should be mainly applied.
- Quantization: PACT
- Pruning: FPGM
See [PaddeSlim](https://github.com/PaddlePaddle/PaddleSlim/) for detailed parameters.
The model quantization comprises two main parts, the quantization of the Weight and the Activation. Simultaneous quantization of the two parts is necessary to maximize the computational efficiency gain. The weight can be distributed as compactly as possible by means of network regularization to reduce outliers and uneven distribution, while there is a lack of effective means for activation.
**PACT (PArameterized Clipping acTivation)** is a new quantization method that minimizes the loss of accuracy, or even achieves great accuracy by removing some outliers before the quantization of activation. The method was proposed when the author found that "the quantized activation differed significantly from the full accuracy results when the weight quantization is adopted". The author also found that the quantization of activation can cause a great error (as a result of RELU, the range of activation is infinite compared to the weight which is basically within 0 to 1), so the activation function **clipped RELU** was introduced. The clipping ceiling, i.e., $α$, is a learnable parameter, which ensures that each layer can learn a different quantization range through training and minimizes the rounding error caused by quantization. The schematic diagram of quantization is shown below. **PACT** solves the problem by continuously trimming the activation range so that the activation distribution is narrowed, thus reducing the quantization mapping loss. It can acquire a more reasonable quantization scale and cut the quantization loss by clipping the activation, thus reducing the outliers in the activation distribution.
It is shown that PACT is about adopting the above quantization as a substitute for the *ReLU* function to clip the part greater than zero with a threshold of $a$. However, the above formula is further improved in *PaddleSlim* as follows:
After the above improvement, *PACT* preprocessing is inserted between the activation and the OP (convolution, full connection, etc.) to be quantized, which not only clips the distribution greater than 0 but also perform the same for the part less than 0, so as to better obtain the range to be quantized and minimize the quantization loss. At the same time, the clipping threshold is a trainable parameter, which can be detected automatically and reasonably by the model during the quantization training, thus further lowering the quantization accuracy loss.
For specific algorithm parameters, please refer to [Introduction to Parameters](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/api_cn/dygraph/quanter/qat.rst#qat) in PaddleSlim.
## FPGM
Model pruning is an essential practice to reduce the model size and improve inference efficiency. In previous articles on network pruning, the norm of the network filter is generally adopted to measure its importance, **the smaller the norm value, the less important the filter is** and the more significant it will be to clip it from the network. **FPGM** believes that the previous approach relies on the following two points:
- The deviation of the filter's norm should be large so that important and unimportant filters can be well separated
- The norm of the unimportant filter should be small enough
Based on this, **FPGM** takes advantage of the geometric center property of the filter. Since filters near the center can be expressed by others, they can be eliminated, thus avoiding the above two pruning conditions. As a result, the pruning is conducted in consideration of the redundancy of information instead of a small norm. The following figure shows how the **FPGM** differs from the previous method, see [paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/He_Filter_Pruning_via_Geometric_Median_) for more details.
For specific algorithm parameters, please refer to [Introduction to Parameters](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/api_cn/dygraph/pruners/fpgm_filter_pruner.rst#fpgmfilterpruner) in PaddleSlim.