Merge pull request #751 from yukavio/develop

Improve the Chinese document of pruning and add English doc of pruning

f1c4f413 · Double_V · GitHub · aafa88db · 9958fdde · aafa88db
4 changed files
--- a/deploy/slim/prune/README.md
+++ b/deploy/slim/prune/README.md
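-> Install the develop version of PaddleSlim before running this example.
-# Model Pruning and Compression Tutorial
-## Overview
-This example uses the [pruning API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) provided by PaddleSlim to compress the OCR model.
-Before reading this example, we recommend that you review the following:
- [Standard training method for OCR models](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)
-## Install PaddleSlim
-Install PaddleSlim by following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).
-## Sensitivity Analysis Training
-From the PaddleOCR root directory, run sensitivity analysis on the model with the following command: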
-```bash
-python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
-```
-## Model Pruning and Fine-tuning
-```bash
-python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
-```
-## Evaluate and Export
-After obtaining the model saved by pruning training, we can export it as inference_model for inference deployment:
-```bash
-python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
-```
--- a/deploy/slim/prune/README_ch.md
+++ b/deploy/slim/prune/README_ch.md
+> Install the develop version of PaddleSlim before running this example.
+# Model Pruning and Compression Tutorial
+Compression results:
+<table>
+<thead>
+  <tr>
+    <th>ID</th>
+    <th>Task</th>
+    <th>Model</th>
+    <th>Compression Strategy<sup><a href="#quant">[3]</a><a href="#prune">[4]</a></sup></th>
+    <th>Accuracy (self-built Chinese dataset)</th>
+    <th>Latency<sup><a href="#latency">[1]</a></sup> (ms)</th>
+    <th>Overall Latency<sup><a href="#rec">[2]</a></sup> (ms)</th>
+    <th>Acceleration Ratio</th>
+    <th>Overall Model Size (MB)</th>
+    <th>Compression Ratio</th>
+    <th>Download Link</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td rowspan="2">0</td>
+    <td>Detection</td>
+    <td>MobileNetV3_DB</td>
+    <td>None</td>
+    <td>61.7</td>
+    <td>224</td>
+    <td rowspan="2">375</td>
+    <td rowspan="2">-</td>
+    <td rowspan="2">8.6</td>
+    <td rowspan="2">-</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>MobileNetV3_CRNN</td>
+    <td>None</td>
+    <td>62.0</td>
+    <td>9.52</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">1</td>
+    <td>Detection</td>
+    <td>SlimTextDet</td>
+    <td>PACT Quant Aware Training</td>
+    <td>62.1</td>
+    <td>195</td>
+    <td rowspan="2">348</td>
+    <td rowspan="2">8%</td>
+    <td rowspan="2">2.8</td>
+    <td rowspan="2">67.82%</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>SlimTextRec</td>
+    <td>PACT Quant Aware Training</td>
+    <td>61.48</td>
+    <td>8.6</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">2</td>
+    <td>Detection</td>
+    <td>SlimTextDet_quat_pruning</td>
+    <td>Pruning+PACT Quant Aware Training</td>
+    <td>60.86</td>
+    <td>142</td>
+    <td rowspan="2">288</td>
+    <td rowspan="2">30%</td>
+    <td rowspan="2">2.8</td>
+    <td rowspan="2">67.82%</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>SlimTextRec</td>
+    <td>PACT Quant Aware Training</td>
+    <td>61.48</td>
+    <td>8.6</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">3</td>
+    <td>Detection</td>
+    <td>SlimTextDet_pruning</td>
+    <td>Pruning</td>
+    <td>61.57</td>
+    <td>138</td>
+    <td rowspan="2">295</td>
+    <td rowspan="2">27%</td>
+    <td rowspan="2">2.9</td>
+    <td rowspan="2">66.28%</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>SlimTextRec</td>
+    <td>PACT Quant Aware Training</td>
+    <td>61.48</td>
+    <td>8.6</td>
+    <td></td>
+  </tr>
+</tbody>
+</table>
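+## Overview
+A complex model helps improve performance, but it also introduces some redundancy. Model pruning reduces this redundancy by removing sub-models from the network, which lowers the model's computational complexity and improves its inference performance.
+This example uses the [pruning API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) provided by PaddleSlim to compress the OCR model.
+Before reading this example, we recommend that you review the following:
+- [Standard training method for OCR models](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
+- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)
+## Install PaddleSlim
+```bash
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
+python setup.py install
+```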
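+## Get the Pretrained Model
+[Download link for the detection pretrained model]()
+## Sensitivity Analysis Training
+  After the pretrained model is loaded, sensitivity analysis is run on each network layer of the existing model to measure each layer's redundancy and thereby decide its pruning ratio. For details of sensitivity analysis, see [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
+From the PaddleOCR root directory, run sensitivity analysis on the model with the following command:
+```bash
+python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
+```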
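+## Model Pruning and Fine-tuning
+  During pruning, the sensitivity analysis file produced earlier determines the pruning ratio of each network layer. In this implementation, to retain as many of the low-level features extracted from the image as possible, we skip the 4 convolutional layers closest to the input in the backbone. Similarly, to reduce the performance loss caused by pruning, we use the sensitivity table from the earlier analysis to pick out some [network layers](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) that have little redundancy and are sensitive to pruning, and we avoid those layers in the subsequent pruning. Fine-tuning after pruning reuses the original training strategy of the OCR detection model.
+```bash
+python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
+```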
+## Export the Model
+After obtaining the model saved by pruning training, we can export it as inference_model for inference deployment:
+```bash
+python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
+```
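
Note on the results table: the README does not say how the acceleration ratio and compression ratio columns are derived. The listed values appear consistent with the arithmetic sketched below. This is an assumption inferred from the numbers in the table, not a formula stated in the source, and the function names are hypothetical.

```python
def acceleration_ratio(baseline_ms: float, new_ms: float) -> float:
    """Assumed definition: relative speedup over the baseline latency."""
    return baseline_ms / new_ms - 1.0


def compression_ratio(baseline_mb: float, new_mb: float) -> float:
    """Assumed definition: fraction of the baseline model size removed."""
    return 1.0 - new_mb / baseline_mb


if __name__ == "__main__":
    # Overall latency (ms) and overall model size (MB) from the table.
    baseline_ms, baseline_mb = 375, 8.6
    rows = {1: (348, 2.8), 2: (288, 2.8), 3: (295, 2.9)}
    for row_id, (latency_ms, size_mb) in rows.items():
        speedup = acceleration_ratio(baseline_ms, latency_ms)
        shrink = compression_ratio(baseline_mb, size_mb)
        print(f"row {row_id}: acceleration {speedup:.1%}, compression {shrink:.2%}")
```

Under these assumed definitions, rows 1 to 3 round to the listed 8%, 30%, and 27% acceleration, and row 3 reproduces the listed 66.28% compression. The 2.8 MB rows give about 67.4% instead of the listed 67.82%, presumably because the sizes in the table are rounded.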
--- a/deploy/slim/prune/README_en.md
+++ b/deploy/slim/prune/README_en.md
+> The develop version of PaddleSlim must be installed before running this example.
+# Model Compression Tutorial (Pruning)
+Compression results:
+<table>
+<thead>
+  <tr>
+    <th>ID</th>
+    <th>Task</th>
+    <th>Model</th>
+    <th>Compression Strategy<sup><a href="#quant">[3]</a><a href="#prune">[4]</a></sup></th>
+    <th>Accuracy (self-built Chinese dataset)</th>
+    <th>Inference Time<sup><a href="#latency">[1]</a></sup> (ms)</th>
+    <th>Inference Time (Total Model)<sup><a href="#rec">[2]</a></sup> (ms)</th>
+    <th>Acceleration Ratio</th>
+    <th>Model Size (MB)</th>
+    <th>Compression Ratio</th>
+    <th>Download Link</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td rowspan="2">0</td>
+    <td>Detection</td>
+    <td>MobileNetV3_DB</td>
+    <td>None</td>
+    <td>61.7</td>
+    <td>224</td>
+    <td rowspan="2">375</td>
+    <td rowspan="2">-</td>
+    <td rowspan="2">8.6</td>
+    <td rowspan="2">-</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>MobileNetV3_CRNN</td>
+    <td>None</td>
+    <td>62.0</td>
+    <td>9.52</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">1</td>
+    <td>Detection</td>
+    <td>SlimTextDet</td>
+    <td>PACT Quant Aware Training</td>
+    <td>62.1</td>
+    <td>195</td>
+    <td rowspan="2">348</td>
+    <td rowspan="2">8%</td>
+    <td rowspan="2">2.8</td>
+    <td rowspan="2">67.82%</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>SlimTextRec</td>
+    <td>PACT Quant Aware Training</td>
+    <td>61.48</td>
+    <td>8.6</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">2</td>
+    <td>Detection</td>
+    <td>SlimTextDet_quat_pruning</td>
+    <td>Pruning+PACT Quant Aware Training</td>
+    <td>60.86</td>
+    <td>142</td>
+    <td rowspan="2">288</td>
+    <td rowspan="2">30%</td>
+    <td rowspan="2">2.8</td>
+    <td rowspan="2">67.82%</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>SlimTextRec</td>
+    <td>PACT Quant Aware Training</td>
+    <td>61.48</td>
+    <td>8.6</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">3</td>
+    <td>Detection</td>
+    <td>SlimTextDet_pruning</td>
+    <td>Pruning</td>
+    <td>61.57</td>
+    <td>138</td>
+    <td rowspan="2">295</td>
+    <td rowspan="2">27%</td>
+    <td rowspan="2">2.9</td>
+    <td rowspan="2">66.28%</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Recognition</td>
+    <td>SlimTextRec</td>
+    <td>PACT Quant Aware Training</td>
+    <td>61.48</td>
+    <td>8.6</td>
+    <td></td>
+  </tr>
+</tbody>
+</table>
+## Overview
+Generally, a more complex model achieves better task performance, but it also introduces some redundancy. Model pruning reduces this redundancy by removing sub-models from the neural network, which lowers computational complexity and improves inference performance.
+This example uses the [pruning APIs](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) provided by PaddleSlim to compress the OCR model.
+Before reading this example, it is recommended that you review the following pages:
+- [Training strategy of the OCR model](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/detection.md)
+- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)
+## Install PaddleSlim
+```bash
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
+python setup.py install
+```
+## Download the Pretrained Model
+[Download link for the detection pretrained model]()
+## Pruning sensitivity analysis
+  After the pretrained model is loaded, sensitivity analysis is performed on each network layer to measure its redundancy, which determines the pruning ratio for that layer. For details of sensitivity analysis, see [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)
+From the PaddleOCR root directory, run sensitivity analysis on the model with the following command:
+```bash
+python deploy/slim/prune/sensitivity_anal.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
+```
+## Model pruning and fine-tuning
+  During pruning, the sensitivity analysis file produced in the previous step determines the pruning ratio of each network layer. In this implementation, to retain as many of the low-level features extracted from the image as possible, we skip the 4 convolutional layers closest to the input in the backbone. Similarly, to reduce the performance loss caused by pruning, we use the sensitivity table from the previous analysis to select some less redundant, more pruning-sensitive [network layers](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/pruning_and_finetune.py#L41) and skip them in the subsequent pruning process. After pruning, the model needs fine-tuning to recover its performance; the fine-tuning strategy is similar to the one used to train the original OCR detection model.
+```bash
+python deploy/slim/prune/pruning_and_finetune.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./deploy/slim/prune/pretrain_models/det_mv3_db/best_accuracy Global.test_batch_size_per_card=1
+```
+## Export inference model
+After pruning and fine-tuning, we can export the model as inference_model for inference deployment:
+```bash
+python deploy/slim/prune/export_prune_model.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./output/det_db/best_accuracy Global.test_batch_size_per_card=1 Global.save_inference_dir=inference_model
+```
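
Note on the pruning step: README_en.md says that a sensitivity table decides the pruning ratio of each layer and that some sensitive layers are skipped. The idea can be illustrated with a minimal pure-Python sketch. This is not PaddleSlim's API and not the logic in pruning_and_finetune.py. The table layout, the parameter names, and the `max_loss` threshold are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable

# Hypothetical sensitivity table layout:
# parameter name -> {pruning ratio: relative accuracy loss at that ratio}
Sensitivities = Dict[str, Dict[float, float]]


def choose_prune_ratios(
    sensitivities: Sensitivities,
    max_loss: float,
    skip_params: Iterable[str] = (),
) -> Dict[str, float]:
    """For each layer, pick the largest ratio whose measured loss is within max_loss."""
    skipped = set(skip_params)
    ratios: Dict[str, float] = {}
    for name, losses in sensitivities.items():
        if name in skipped:
            continue  # e.g. layers near the input, or layers known to be sensitive
        acceptable = [ratio for ratio, loss in losses.items() if loss <= max_loss]
        if acceptable:
            ratios[name] = max(acceptable)
        # Layers with no acceptable ratio are left unpruned.
    return ratios


if __name__ == "__main__":
    # Made-up numbers, not measurements from the OCR model.
    table = {
        "conv1_weights": {0.1: 0.001, 0.2: 0.002},
        "conv5_weights": {0.1: 0.005, 0.3: 0.015, 0.5: 0.080},
        "conv9_weights": {0.1: 0.050},
    }
    print(choose_prune_ratios(table, max_loss=0.02, skip_params=["conv1_weights"]))
```

In this sketch the skip list plays the role of the layers the README says are avoided, and a layer whose every measured ratio exceeds the threshold is simply left out of the result.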
--- a/deploy/slim/quantization/README.md
+++ b/deploy/slim/quantization/README.md
@@ -25,7 +25,7 @@ python deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o Global
-## Evaluate and Export
+## Export the Model
 After obtaining the model saved by quantization training, we can export it as inference_model for inference deployment: