# Model compression tutorial (Quantization)

## Introduction

Generally, a more complex model achieves better performance on a task, but it also carries some redundancy. Quantization reduces this redundancy by converting full-precision data to fixed-point numbers, which lowers the computational complexity of the model and improves inference performance.

This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress the OCR model.

We recommend reading the following pages before working through this example:

- [The training strategy of OCR model](../../../doc/doc_en/quickstart_en.md)
- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)

## Quick Start

Quantization is best suited to deploying lightweight models on mobile devices. After training, if you want to further reduce the model size and speed up prediction, you can quantize the model with the following steps:

1. Install PaddleSlim
2. Prepare trained model
3. Quantization-Aware Training
4. Export inference model
5. Deploy quantization inference model

### 1. Install PaddleSlim

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```

### 2. Download Pretrained Model

PaddleOCR provides a series of trained [models](../../../doc/doc_en/models_list_en.md). If the model to be quantized is not in the list, follow the [Regular Training](../../../doc/doc_en/quickstart_en.md) method to obtain a trained model.

### 3. Quant-Aware Training

Quantization training includes offline quantization training and online quantization training. Online quantization training is more effective and requires loading a pre-trained model. Once the quantization strategy is defined, the model can be quantized.

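
To make concrete what quantization-aware training simulates, here is a minimal sketch of "fake quantization": a tensor is mapped to 8-bit integer codes and immediately mapped back to floats, so training sees the rounding error. It assumes a symmetric, per-tensor abs-max scheme. The function name `fake_quantize` and the sample weights are hypothetical, and this is not PaddleSlim's implementation.

```python
def fake_quantize(values, bits=8):
    """Quantize floats to signed integer codes with an abs-max scale,
    then dequantize them back to floats (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    abs_max = max(abs(v) for v in values)
    scale = abs_max / qmax if abs_max > 0 else 1.0  # one scale per tensor
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Dequantize: floats that lie exactly on the integer grid.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


weights = [0.82, -0.31, 0.05, -1.27, 0.64]  # hypothetical FP32 weights
codes, approx, scale = fake_quantize(weights)
max_error = max(abs(w - a) for w, a in zip(weights, approx))

print("scale:", scale)
print("integer codes:", codes)
print("max rounding error:", max_error)
```

The rounding error of each value is bounded by half the scale, which is why fine-tuning with this simulation in the loop lets the model adapt to the precision it will have after deployment.
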
The code for quantization training is located in `slim/quantization/quant.py`. For example, to train a detection model, run:

```bash
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights='your trained model' Global.save_model_dir=./output/quant_model

# download provided model
wget https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar
tar xf ch_ppocr_mobile_v1.1_det_train.tar
python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_model_dir=./output/quant_model
```

### 4. Export inference model

After quantization training and fine-tuning, we can export the model as an inference model for deployment:

```bash
python deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=output/quant_model/best_accuracy Global.save_model_dir=./output/quant_inference_model
```

### 5. Deploy

The parameters of the quantized model exported in the steps above are still stored as FP32, but their values fall within the int8 range. The exported model can be converted with the `opt` tool of PaddleLite.

For quantized model deployment, please refer to [Mobile terminal model deployment](../lite/readme_en.md)
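
The following sketch illustrates why a model whose parameters are "stored as FP32 but valued in the int8 range" can shrink on conversion: each float is an integer code times a scale, so a converter can keep the codes as one byte each. The scale and codes below are hypothetical, and this only illustrates the idea; it is not how the PaddleLite `opt` tool is implemented.

```python
from array import array

scale = 0.01                    # hypothetical per-tensor scale
codes = [82, -31, 5, -127, 64]  # hypothetical integer codes in [-127, 127]

# Exported model: 32-bit floats restricted to the int8 grid.
fp32_weights = array("f", [c * scale for c in codes])

# Converted model: the same information as 8-bit codes plus one scale.
int8_weights = array("b", codes)

fp32_bytes = fp32_weights.itemsize * len(fp32_weights)
int8_bytes = int8_weights.itemsize * len(int8_weights)

# The integer codes can be recovered from the FP32 values without loss.
recovered = [round(w / scale) for w in fp32_weights]

print("FP32 storage (bytes):", fp32_bytes)
print("int8 storage (bytes):", int8_bytes)
print("codes recovered exactly:", recovered == codes)
```
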