Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.
Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.
This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
PaddleSlim (GitHub: https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry.
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry.
It is recommended that you could understand following pages before reading this example,:
\-[The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
PaddleOCR also provides a series of (models)[../../../doc/doc_en/models_list_en.md]. Developers can choose their own models or use their own models according to their needs.
[Download link of Detection pretrain model]()
### 3. Pruning sensitivity analysis
## Pruning sensitivity analysis
After the pre-training model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sensitivities_0.data. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
After the pre-training model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sensitivities_0.data. After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
When pruning, the previous sensitivity analysis file would determines the pruning ratio of each network layer. In the specific implementation, in order to retain as many low-level features extracted from the image as possible, we skipped the 4 convolutional layers close to the input in the backbone. Similarly, in order to reduce the model performance loss caused by pruning, we selected some of the less redundant and more sensitive [network layer](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) through the sensitivity table obtained from the previous sensitivity analysis.And choose to skip these network layers in the subsequent pruning process. After pruning, the model need a finetune process to recover the performance and the training strategy of finetune is similar to the strategy of training original OCR detection model.
When pruning, the previous sensitivity analysis file would determines the pruning ratio of each network layer. In the specific implementation, in order to retain as many low-level features extracted from the image as possible, we skipped the 4 convolutional layers close to the input in the backbone. Similarly, in order to reduce the model performance loss caused by pruning, we selected some of the less redundant and more sensitive [network layer](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) through the sensitivity table obtained from the previous sensitivity analysis.And choose to skip these network layers in the subsequent pruning process. After pruning, the model need a finetune process to recover the performance and the training strategy of finetune is similar to the strategy of training original OCR detection model.
\> PaddleSlim 1.2.0 or higher version should be installed before runing this example.
# Model compress tutorial (Quantization)
Compress results:
<table>
<thead>
<tr>
<th>ID</th>
<th>Task</th>
<th>Model</th>
<th>Compress Strategy</th>
<th>Criterion(Chinese dataset)</th>
<th>Inference Time(ms)</th>
<th>Inference Time(Total model)(ms)</th>
<th>Acceleration Ratio</th>
<th>Model Size(MB)</th>
<th>Commpress Ratio</th>
<th>Download Link</th>
</tr>
</thead>
<tbody>
<tr>
<tdrowspan="2">0</td>
<td>Detection</td>
<td>MobileNetV3_DB</td>
<td>None</td>
<td>61.7</td>
<td>224</td>
<tdrowspan="2">375</td>
<tdrowspan="2">-</td>
<tdrowspan="2">8.6</td>
<tdrowspan="2">-</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>MobileNetV3_CRNN</td>
<td>None</td>
<td>62.0</td>
<td>9.52</td>
<td></td>
</tr>
<tr>
<tdrowspan="2">1</td>
<td>Detection</td>
<td>SlimTextDet</td>
<td>PACT Quant Aware Training</td>
<td>62.1</td>
<td>195</td>
<tdrowspan="2">348</td>
<tdrowspan="2">8%</td>
<tdrowspan="2">2.8</td>
<tdrowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<tdrowspan="2">2</td>
<td>Detection</td>
<td>SlimTextDet_quat_pruning</td>
<td>Pruning+PACT Quant Aware Training</td>
<td>60.86</td>
<td>142</td>
<tdrowspan="2">288</td>
<tdrowspan="2">30%</td>
<tdrowspan="2">2.8</td>
<tdrowspan="2">67.82%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PPACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
<tr>
<tdrowspan="2">3</td>
<td>Detection</td>
<td>SlimTextDet_pruning</td>
<td>Pruning</td>
<td>61.57</td>
<td>138</td>
<tdrowspan="2">295</td>
<tdrowspan="2">27%</td>
<tdrowspan="2">2.9</td>
<tdrowspan="2">66.28%</td>
<td></td>
</tr>
<tr>
<td>Recognition</td>
<td>SlimTextRec</td>
<td>PACT Quant Aware Training</td>
<td>61.48</td>
<td>8.6</td>
<td></td>
</tr>
</tbody>
</table>
## Overview
Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Quantization is a technique that reduces this redundancyby reducing the full precision data to a fixed number, so as to reduce model calculation complexity and improve model inference performance.
This example uses PaddleSlim provided [APIs of Quantization](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) to compress the OCR model.
## Introduction
PaddleSlim (GitHub: https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry.
It is recommended that you could understand following pages before reading this example,:
Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model.
Quantization is a technique that reduces this redundancy by reducing the full precision data to a fixed number,
so as to reduce model calculation complexity and improve model inference performance.
This example uses PaddleSlim provided [APIs of Quantization](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) to compress the OCR model.
-[The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
It is recommended that you could understand following pages before reading this example:
-[The training strategy of OCR model](../../../doc/doc_en/quickstart_en.md)
Quantization is mostly suitable for the deployment of lightweight models on mobile terminals.
After training, if you want to further compress the model size and accelerate the prediction, you can use quantization methods to compress the model according to the following steps.
PaddleOCR provides a series of trained [models](../../../doc/doc_en/models_list_en.md).
If the model to be quantified is not in the list, you need to follow the [Regular Training](../../../doc/doc_en/quickstart_en.md) method to get the trained model.
[Download link of Detection pretrain model]()
[Download link of recognization pretrain model]()
### 3. Quant-Aware Training
Quantization training includes offline quantization training and online quantization training.
Online quantization training is more effective. It is necessary to load the pre-training model.
After the quantization strategy is defined, the model can be quantified.
The code for quantization training is located in `slim/quantization/quant/py`. For example, to train a detection model, the training instructions are as follows:
After loading the pre training model, the model can be quantified after defining the quantization strategy. For specific details of quantization method, see:[Model Quantization](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/quantization_api.html)