README_en.md 7.5 KB
Newer Older
Y
yukavio 已提交
1 2 3 4
\> PaddleSlim develop version should be installed before runing this example.



Y
yukavio 已提交
5
# Model compress tutorial (Pruning)
Y
yukavio 已提交
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117

Compress results:
<table>
<thead>
  <tr>
    <th>ID</th>
    <th>Task</th>
    <th>Model</th>
    <th>Compress Strategy<sup><a href="#quant">[3]</a><a href="#prune">[4]</a><sup></th>
    <th>Criterion(Chinese dataset)</th>
    <th>Inference Time<sup><a href="#latency">[1]</a></sup>(ms)</th>
    <th>Inference Time(Total model)<sup><a href="#rec">[2]</a></sup>(ms)</th>
    <th>Acceleration Ratio</th>
    <th>Model Size(MB)</th>
    <th>Commpress Ratio</th>
    <th>Download Link</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="2">0</td>
    <td>Detection</td>
    <td>MobileNetV3_DB</td>
    <td>None</td>
    <td>61.7</td>
    <td>224</td>
    <td rowspan="2">375</td>
    <td rowspan="2">-</td>
    <td rowspan="2">8.6</td>
    <td rowspan="2">-</td>
    <td></td>
  </tr>
  <tr>
    <td>Recognition</td>
    <td>MobileNetV3_CRNN</td>
    <td>None</td>
    <td>62.0</td>
    <td>9.52</td>
    <td></td>
  </tr>
  <tr>
    <td rowspan="2">1</td>
    <td>Detection</td>
    <td>SlimTextDet</td>
    <td>PACT Quant Aware Training</td>
    <td>62.1</td>
    <td>195</td>
    <td rowspan="2">348</td>
    <td rowspan="2">8%</td>
    <td rowspan="2">2.8</td>
    <td rowspan="2">67.82%</td>
    <td></td>
  </tr>
  <tr>
    <td>Recognition</td>
    <td>SlimTextRec</td>
    <td>PACT Quant Aware Training</td>
    <td>61.48</td>
    <td>8.6</td>
    <td></td>
  </tr>
  <tr>
    <td rowspan="2">2</td>
    <td>Detection</td>
    <td>SlimTextDet_quat_pruning</td>
    <td>Pruning+PACT Quant Aware Training</td>
    <td>60.86</td>
    <td>142</td>
    <td rowspan="2">288</td>
    <td rowspan="2">30%</td>
    <td rowspan="2">2.8</td>
    <td rowspan="2">67.82%</td>
    <td></td>
  </tr>
  <tr>
    <td>Recognition</td>
    <td>SlimTextRec</td>
    <td>PPACT Quant Aware Training</td>
    <td>61.48</td>
    <td>8.6</td>
    <td></td>
  </tr>
  <tr>
    <td rowspan="2">3</td>
    <td>Detection</td>
    <td>SlimTextDet_pruning</td>
    <td>Pruning</td>
    <td>61.57</td>
    <td>138</td>
    <td rowspan="2">295</td>
    <td rowspan="2">27%</td>
    <td rowspan="2">2.9</td>
    <td rowspan="2">66.28%</td>
    <td></td>
  </tr>
  <tr>
    <td>Recognition</td>
    <td>SlimTextRec</td>
    <td>PACT Quant Aware Training</td>
    <td>61.48</td>
    <td>8.6</td>
    <td></td>
  </tr>
</tbody>
</table>


## Overview

Generally, a more complex model would achive better performance in the task, but it also leads to some redundancy in the model. Model Pruning is a technique that reduces this redundancy by removing the sub-models in the neural network model, so as to reduce model calculation complexity and improve model inference performance.

This example uses PaddleSlim provided[APIs of Pruning](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) to compress the OCR model.
Y
yukavio 已提交
118
PaddleSlim (GitHub: https://github.com/PaddlePaddle/PaddleSlim), an open source library which integrates model pruning, quantization (including quantization training and offline quantization), distillation, neural network architecture search, and many other commonly used and leading model compression technique in the industry.
Y
yukavio 已提交
119 120 121 122 123 124 125 126 127 128 129 130 131

It is recommended that you could understand following pages before reading this example,:



\- [The training strategy of OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)

\- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/)



## Install PaddleSlim

Y
yukavio 已提交
132
```bash
Y
yukavio 已提交
133 134 135 136 137 138 139

git clone https://github.com/PaddlePaddle/PaddleSlim.git

cd Paddleslim

python setup.py install

Y
yukavio 已提交
140
```
Y
yukavio 已提交
141 142 143 144 145 146 147 148 149


## Download Pretrain Model

[Download link of Detection pretrain model]()


## Pruning sensitivity analysis

Y
yukavio 已提交
150 151 152 153 154 155 156 157 158 159 160 161 162 163
  After the pre-training model is loaded, sensitivity analysis is performed on each network layer of the model to understand the redundancy of each network layer, and save a sensitivity file which named: sensitivities_0.data.  After that, user could load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determining the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see:[Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
  The data format of sensitivity file:
      sensitivities_0.data(Dict){
              'layer_weight_name_0': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
              'layer_weight_name_1': sens_of_each_ratio(Dict){'pruning_ratio_0': acc_loss, 'pruning_ratio_1': acc_loss}
          }

      example:
          {
              'conv10_expand_weights': {0.1: 0.006509952684312718, 0.2: 0.01827734339798862, 0.3: 0.014528405644659832, 0.6: 0.06536008804270439, 0.8: 0.11798612250664964, 0.7: 0.12391408417493704, 0.4: 0.030615754498018757, 0.5: 0.047105205602406594}
              'conv10_linear_weights': {0.1: 0.05113190831455035, 0.2: 0.07705573833558801, 0.3: 0.12096721757739311, 0.6: 0.5135061352930738, 0.8: 0.7908166677143281, 0.7: 0.7272187676899062, 0.4: 0.1819252083008504, 0.5: 0.3728054727792405}
          }
  The function would return a dict after loading the sensitivity file. The keys of the dict are name of parameters in each layer. And the value of key is the information about pruning sensitivity of correspoding layer. In example, pruning 10% filter of the layer corresponding to conv10_expand_weights would lead to 0.65% degradation of model performance. The details could be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)

Y
yukavio 已提交
164 165 166

Enter the PaddleOCR root directory,perform sensitivity analysis on the model with the following command:

Y
yukavio 已提交
167
```bash
Y
yukavio 已提交
168 169 170

python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1

Y
yukavio 已提交
171
```
Y
yukavio 已提交
172 173 174 175 176 177 178



## Model pruning and Fine-tune

  When pruning, the previous sensitivity analysis file would determines the pruning ratio of each network layer. In the specific implementation, in order to retain as many low-level features extracted from the image as possible, we skipped the 4 convolutional layers close to the input in the backbone. Similarly, in order to reduce the model performance loss caused by pruning, we selected some of the less redundant and more sensitive [network layer](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) through the sensitivity table obtained from the previous sensitivity analysis.And choose to skip these network layers in the subsequent pruning process. After pruning, the model need a finetune process to recover the performance and the training strategy of finetune is similar to the strategy of training original OCR detection model.

Y
yukavio 已提交
179
```bash
Y
yukavio 已提交
180 181 182

python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1

Y
yukavio 已提交
183
```
Y
yukavio 已提交
184 185 186 187 188 189 190 191





## Export inference model

After getting the model after pruning and finetuning we, can export it as inference_model for predictive deployment:
Y
yukavio 已提交
192

Y
yukavio 已提交
193
```bash
Y
yukavio 已提交
194 195 196

python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model

Y
yukavio 已提交
197
```