## Introduction

Generally, a more complex model achieves better performance on a task, but it also carries some redundancy.
Quantization is a technique that reduces this redundancy by converting full-precision data to fixed-point numbers,
which lowers the computational complexity of the model and improves its inference performance.
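
To make the idea concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It is only an illustration under assumed conventions (a single abs-max scale per tensor, codes clamped to [-127, 127]); the helper names `quantize_int8` and `dequantize` are hypothetical and are not part of PaddleSlim.

```python
def quantize_int8(values):
    """Map floats to int8 codes with a single abs-max scale (symmetric scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]  # made-up example values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print(codes)  # integer codes that replace the float weights
print(max(abs(w - r) for w, r in zip(weights, restored)))  # worst-case rounding error
```

Each weight is replaced by a small integer plus one shared scale, and the rounding error per weight is at most half a quantization step.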

This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress the OCR model.

We recommend reading the following pages before working through this example:
- [The training strategy of OCR model](../../../doc/doc_en/quickstart_en.md)
- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)

## Quick Start
Quantization is best suited to deploying lightweight models on mobile devices.
After training, if you want to further reduce the model size and speed up prediction, you can quantize the model with the following steps.

1. Install PaddleSlim
2. Prepare trained model
3. Quantization-Aware Training
4. Export inference model
5. Deploy quantization inference model


### 1. Install PaddleSlim

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```
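
After installing, it can help to confirm that the package is visible to the Python interpreter you will train with. The snippet below is a small sketch using only the standard library; the helper name `is_installed` is hypothetical, and it assumes the package is importable as `paddleslim`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("paddleslim"):
    print("PaddleSlim is importable")
else:
    print("PaddleSlim not found; re-run the installation step")
```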


### 2. Download Pretrained Model
PaddleOCR provides a series of trained [models](../../../doc/doc_en/models_list_en.md).
If the model to be quantized is not in the list, follow the [Regular Training](../../../doc/doc_en/quickstart_en.md) method to obtain a trained model.


### 3. Quantization-Aware Training
Quantization training includes offline quantization training and online quantization training.
Online quantization training is more effective. It requires loading a pretrained model.
Once the quantization strategy is defined, the model can be quantized.
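
The general idea behind quantization-aware training can be sketched with a toy one-weight model. This is not PaddleSlim's implementation: it assumes the commonly used straight-through trick, where the forward pass sees a rounded weight while the gradient updates an underlying float weight. The names `fake_quant` and `train_qat`, the scale, and the data are all hypothetical.

```python
def fake_quant(w, scale):
    """Round w to the nearest int8 grid point but keep it as a float."""
    code = max(-127, min(127, round(w / scale)))
    return code * scale


def train_qat(samples, w_init, scale, lr=0.05, epochs=200):
    """Fit y = w * x while the forward pass always sees the quantized weight."""
    w = w_init  # float "shadow" weight, e.g. taken from a pretrained model
    for _ in range(epochs):
        for x, y in samples:
            pred = fake_quant(w, scale) * x  # forward pass uses the quantized weight
            grad = 2 * (pred - y) * x        # straight-through: rounding treated as identity
            w -= lr * grad                   # update the float weight
    return fake_quant(w, scale)


# Made-up data generated from y = 0.37 * x; 0.37 is not on the 0.05 grid.
samples = [(1.0, 0.37), (2.0, 0.74), (-1.0, -0.37)]
w_q = train_qat(samples, w_init=0.0, scale=0.05)
print(w_q)  # a grid point next to 0.37
```

Because the loss is always computed with the rounded weight, training settles on a grid point adjacent to the ideal value instead of a value that would shift when rounded afterwards.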

The code for quantization training is located in `deploy/slim/quantization/quant.py`. For example, to train a detection model, the training commands are as follows:
```bash
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model'   Global.save_model_dir=./output/quant_model

# download provided model
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar
tar xf ch_ppocr_mobile_v1.1_det_train.tar
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v1.1_det_train/best_accuracy   Global.save_model_dir=./output/quant_model
```


### 4. Export inference model

After quantization-aware training and fine-tuning, we can export the model as an inference model for deployment:

```bash
python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
```

### 5. Deploy
The parameters of the quantized model exported by the steps above are still stored as FP32, but their values are restricted to the int8 range.
The exported model can be converted with the `opt` tool of PaddleLite.
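
The sketch below illustrates what "stored as FP32 but with an int8 range" means in practice: every float weight is an integer multiple of a scale, so it can be turned into an integer code without loss. It does not describe how the `opt` tool works internally; the function `to_int8_codes`, the scale, and the weight values are hypothetical.

```python
def to_int8_codes(weights, scale, tol=1e-6):
    """Convert float weights that already sit on an int8 grid into integer codes.

    Raises ValueError if any weight is off-grid or out of the int8 range.
    """
    codes = []
    for w in weights:
        code = round(w / scale)
        if abs(code * scale - w) > tol or not -128 <= code <= 127:
            raise ValueError(f"{w} is not an int8 multiple of scale {scale}")
        codes.append(code)
    return codes


scale = 0.02  # hypothetical per-tensor scale saved with the model
exported = [0.5, -1.28, 0.0, 2.54]  # stored as floats, but each equals k * scale
codes = to_int8_codes(exported, scale)
print(codes)  # integer codes recovered without rounding loss
```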

For deployment of the quantized model, please refer to [Mobile terminal model deployment](../lite/readme_en.md).