This tutorial details the steps to deploy PP-ShiTU on the server side.
## Catalogue
- [1. Prepare the Environment](#1)
  - [1.1 Update cmake](#1.1)
  - [1.2 Compile opencv Library](#1.2)
  - [1.3 Download or Compile Paddle Inference Library](#1.3)
    - [1.3.1 Compile the Source of Inference Library](#1.3.1)
    - [1.3.2 Direct Download and Installation](#1.3.2)
  - [1.4 Install faiss Library](#1.4)
- [2. Code Compilation](#2)
- [3. Run the demo](#3)
- [4. Use Your Own Model](#4)
<a name="1"></a>
## 1. Prepare the Environment
- Linux environment; an Ubuntu Docker image is recommended.
<a name="1.1"></a>
### 1.1 Update cmake
First, upgrade `cmake`, because compiling the dependency libraries requires a newer version.
...
...
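The elided commands above cover the details; as a hedged sketch of a source build (the version number and install prefix below are assumptions — pick whichever release your dependencies require):

```shell
# Sketch: build a newer cmake from source (version and prefix are assumptions).
CMAKE_VERSION=3.22.0
SRC_DIR="cmake-${CMAKE_VERSION}"
if [ ! -d "${SRC_DIR}" ]; then
  wget -q "https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${SRC_DIR}.tar.gz" \
    && tar -xzf "${SRC_DIR}.tar.gz"
fi
# Build in a subshell so the current directory is preserved.
( cd "${SRC_DIR}" 2>/dev/null \
    && ./bootstrap --prefix=/usr/local && make -j"$(nproc)" && make install )
echo "target cmake: ${CMAKE_VERSION}"
```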
```shell
cmake --version
```
cmake is now ready for use.
<a name="1.2"></a>
### 1.2 Compile opencv Library
- First, download the opencv source package for Linux from the official opencv website. Taking version 3.4.7 as an example, download and unzip it with the following commands:
...
...
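The elided steps can be sketched as follows (the mirror URL and the `opencv3` install prefix are assumptions — use the download link from the official opencv site and a prefix of your choice):

```shell
# Sketch: fetch, unpack, and install the opencv 3.4.7 source
# (URL and install prefix are assumptions).
OPENCV_VERSION=3.4.7
SRC_DIR="opencv-${OPENCV_VERSION}"
if [ ! -d "${SRC_DIR}" ]; then
  wget -q "https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.tar.gz" \
       -O "${SRC_DIR}.tar.gz" \
    && tar -xzf "${SRC_DIR}.tar.gz"
fi
# Out-of-tree build installing into ./opencv3 (subshell keeps the cwd).
( cd "${SRC_DIR}" 2>/dev/null \
    && cmake -S . -B build -DCMAKE_INSTALL_PREFIX="$(pwd)/../opencv3" \
    && cmake --build build -j"$(nproc)" && cmake --install build )
echo "opencv install prefix: opencv3"
```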
```
opencv3/
...
|-- share
```
<a name="1.3"></a>
### 1.3 Download or Compile Paddle Inference Library
- Here we detail two ways to obtain the Paddle inference library.
<a name="1.3.1"></a>
#### 1.3.1 Compile the Source of Inference Library
- To obtain the latest features of the inference library, you can clone the latest code from the Paddle GitHub repository and compile the source code of the library yourself.
`paddle` is the Paddle library needed for later C++ inference, and `version.txt` contains the version information of the current inference library.
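Such a source build can be sketched as follows (the cmake flags and the `inference_lib_dist` target are assumptions based on common Paddle build options — check the official compilation guide before relying on them):

```shell
# Sketch: build the Paddle inference library from source
# (flags and target name are assumptions).
if [ ! -d Paddle ]; then
  git clone https://github.com/PaddlePaddle/Paddle.git
fi
( cd Paddle 2>/dev/null \
    && cmake -S . -B build -DWITH_GPU=OFF -DWITH_MKL=ON -DON_INFER=ON -DWITH_PYTHON=OFF \
    && cmake --build build -j"$(nproc)" --target inference_lib_dist )
# The install tree (with paddle/ and version.txt) is expected here:
PADDLE_LIB=Paddle/build/paddle_inference_install_dir
echo "expected inference library dir: ${PADDLE_LIB}"
```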
<a name="1.3.2"></a>
#### 1.3.2 Direct Download and Installation
- Linux inference libraries built against different CUDA versions are available on the official website of the [Paddle Inference Library](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html), where you can choose the appropriate version. Note that you must select the `develop` version.
...
...
```shell
tar -xvf paddle_inference.tgz
```
The subfolder `paddle_inference/` will finally be created in the current folder.
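The download-and-unpack steps can be sketched as follows (the URL below is a placeholder — copy the real link for your CUDA/cuDNN combination from the download page):

```shell
# Sketch: download and unpack a prebuilt inference library
# (the URL is a placeholder, not a real download link).
TARBALL=paddle_inference.tgz
if [ ! -d paddle_inference ]; then
  wget -q "https://example.com/path/to/${TARBALL}" && tar -xf "${TARBALL}"
fi
echo "inference library tarball: ${TARBALL}"
```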
<a name="1.4"></a>
### 1.4 Install faiss Library
Note that this tutorial installs the CPU version of faiss as an example; install the version you need by referring to the official documents of [faiss](https://github.com/facebookresearch/faiss).
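A CPU-only source build can be sketched as follows (the cmake options are assumptions drawn from the faiss install docs — verify them against the version you check out):

```shell
# Sketch: build the CPU version of faiss from source
# (cmake options and install prefix are assumptions).
if [ ! -d faiss ]; then
  git clone https://github.com/facebookresearch/faiss.git
fi
FAISS_INSTALL_PREFIX="$(pwd)/faiss_install"
cmake -S faiss -B faiss/build \
      -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=OFF \
      -DCMAKE_INSTALL_PREFIX="${FAISS_INSTALL_PREFIX}" \
  && cmake --build faiss/build -j"$(nproc)" \
  && cmake --install faiss/build
echo "faiss install prefix: ${FAISS_INSTALL_PREFIX}"
```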
<a name="2"></a>
## 2. Code Compilation
The command is as follows. Replace the paths of the Paddle C++ inference library, opencv, and the other dependency libraries with the actual paths on your own machine. `yaml-cpp` and other C++ libraries are downloaded and compiled during the build, so make sure the network connection is available.
...
...
In the above commands:
A `build` folder will be created in the current path after the compilation, which generates an executable file named `pp_shitu`.
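The configure-and-build sequence can be sketched as follows (the variable names and cmake options are assumptions — match them to the CMakeLists of the demo you compile, and point the paths at the directories produced in section 1):

```shell
# Sketch: configure and build the PP-ShiTu demo
# (option names and paths are assumptions).
OPENCV_DIR=/path/to/opencv3
PADDLE_LIB=/path/to/paddle_inference
FAISS_DIR=/path/to/faiss_install
cmake -S . -B build \
      -DOPENCV_DIR="${OPENCV_DIR}" \
      -DPADDLE_LIB="${PADDLE_LIB}" \
      -DFAISS_DIR="${FAISS_DIR}" \
      -DWITH_MKL=ON -DWITH_GPU=OFF \
  && cmake --build build -j"$(nproc)"
echo "expected binary: build/pp_shitu"
```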
<a name="3"></a>
## 3. Run the demo
- Please refer to the [Quick Start of Recognition](../../docs/en/quick_start/quick_start_recognition_en.md), download the corresponding Lightweight Generic Mainbody Detection Model, Lightweight Generic Recognition Model, and the beverage test data and unzip them.
...
...
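Launching the demo can be sketched as follows (the config-file name and the `-c` flag are assumptions — use the yaml shipped with the beverage test data):

```shell
# Sketch: run the compiled demo (config name and flag are assumptions).
CONFIG=inference_drink.yaml
if [ -x build/pp_shitu ] && [ -f "${CONFIG}" ]; then
  ./build/pp_shitu -c "${CONFIG}"
fi
echo "demo config: ${CONFIG}"
```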
<a name="4"></a>
## 4. Use Your Own Model
You can also use your self-trained models. Please refer to [model export](../../docs/en/inference_deployment/export_model_en.md) to export the `inference model` for model inference.
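The export step can be sketched as follows (the script path, config file, and weight path are assumptions — substitute the ones from your own training run):

```shell
# Sketch: export a trained model to an inference model
# (config and weight paths are assumptions).
TRAIN_CONFIG=ppcls/configs/my_model.yaml
PRETRAINED=output/my_model/best_model
if [ -f tools/export_model.py ]; then
  python tools/export_model.py -c "${TRAIN_CONFIG}" \
      -o Global.pretrained_model="${PRETRAINED}" \
      -o Global.save_inference_dir=./inference
fi
echo "target inference dir: ./inference"
```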
Model quantization comprises two main parts: the quantization of the weights and the quantization of the activations.
**PACT (PArameterized Clipping acTivation)** is a quantization method that minimizes, or even eliminates, the loss of accuracy by removing outliers before the activations are quantized. It was proposed after the authors observed that the quantized activations differed significantly from the full-precision results once weight quantization was adopted, and that activation quantization can introduce large errors (because of ReLU, the range of the activations is unbounded, whereas the weights lie roughly within 0 to 1). To address this, the **clipped ReLU** activation function was introduced: its clipping ceiling $\alpha$ is a learnable parameter, which lets each layer learn its own quantization range during training and minimizes the rounding error caused by quantization. By continuously trimming the activation range, PACT narrows the activation distribution and reduces its outliers, yielding a more reasonable quantization scale and cutting the quantization mapping loss. The schematic diagram of the quantization is shown below.
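The clipped-ReLU activation that PACT introduces can be written as follows (this is the standard formulation from the original PACT paper, with $\alpha$ the learnable clipping ceiling):

$$
y = \mathrm{PACT}(x) = 0.5\left(|x| - |x - \alpha| + \alpha\right) =
\begin{cases}
0, & x < 0 \\
x, & 0 \le x < \alpha \\
\alpha, & x \ge \alpha
\end{cases}
$$

Unlike plain ReLU, values above $\alpha$ are saturated, so the quantization range $[0, \alpha]$ stays finite and trainable.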
As shown, PACT adopts this clipped activation as a substitute for the *ReLU* function, truncating the positive part at the threshold $\alpha$. *PaddleSlim* further improves the formula as follows:
@@ -47,6 +47,6 @@ Model pruning is an essential practice to reduce the model size and improve infe
Based on this, **FPGM** takes advantage of the geometric-center property of the filters. Since filters near the center can be expressed by the others, they can be eliminated, thus avoiding the above two pruning conditions. As a result, the pruning is conducted in consideration of the redundancy of information instead of a small norm. The following figure shows how **FPGM** differs from the previous method; see the [paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/He_Filter_Pruning_via_Geometric_Median_) for more details.
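As a notational sketch (the symbols here are assumed, not taken from the paper verbatim): given the $n$ filters $F_1, \dots, F_n$ of a layer, FPGM scores each filter by its total distance to all the others,

$$
\mathrm{score}(F_i) = \sum_{j=1}^{n} \left\| F_i - F_j \right\|_2 ,
$$

and prunes the filters with the smallest scores — those closest to the geometric median and hence most redundantly expressible by the remaining filters.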
For specific algorithm parameters, please refer to [Introduction to Parameters](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/api_cn/dygraph/pruners/fpgm_filter_pruner.rst#fpgmfilterpruner) in PaddleSlim.