Unverified commit 15c0b8b4, authored by Guanghua Yu, committed by GitHub

cherry pick some pr (#1295)

* fixed-docs (#1283)

* [documentation] fix typos (#1287)

* add YOLOv7 ACT example (#1291)
Co-authored-by: leiqing <54695910+leiqing1@users.noreply.github.com>
Co-authored-by: minghaoBD <79566150+minghaoBD@users.noreply.github.com>
Parent: dc427410
@@ -20,7 +20,7 @@ PaddleSlim is a toolkit focused on deep learning model compression, providing **low…
- Code-non-intrusive compression: users only need to provide the inference model files and data to run offline quantization (PTQ), quantization-aware training (QAT), sparse training, and other compression tasks.
- Automatic strategy selection based on the task and the target deployment environment: automatically searches for a suitable offline quantization method and for the best combination of compression strategies.
- Released auto-compression examples for [NLP](example/auto_compression/nlp), [semantic segmentation](example/auto_compression/semantic_segmentation), and [object detection](example/auto_compression/detection).
- Released `X2Paddle` model auto-compression recipes: [YOLOv5](example/auto_compression/pytorch_yolov5), [YOLOv6](example/auto_compression/pytorch_yolov6), [YOLOv7](example/auto_compression/pytorch_yolov7), [HuggingFace](example/auto_compression/pytorch_huggingface), [MobileNet](example/auto_compression/tensorflow_mobilenet).
- Upgraded quantization features
# ACT (Auto Compression Toolkit)
------------------------------------------------------------------------------------------
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/Paddle?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSlim?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSlim?color=3af"></a>
<a href="https://pypi.org/project/PaddleSlim/"><img src="https://img.shields.io/pypi/dm/PaddleSlim?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSlim?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSlim?color=ccf"></a>
</p>
<h4 align="center">
  <a href=#features> Features </a> |
  <a href=#benchmark> Benchmark </a> |
  <a href=#environment-setup> Installation </a> |
  <a href=#quick-start> Quick Start </a> |
  <a href=#advanced-usage> Advanced Usage </a> |
  <a href=#community> Community </a>
</h4>
## **Introduction**

PaddleSlim's new Auto Compression Toolkit (ACT) automatically compresses inference models in a source-free way, and the compressed models can be deployed directly.

## **News** 📢

* 🎉 2022.7.6 [**PaddleSlim v2.3.0**](https://github.com/PaddlePaddle/PaddleSlim/releases/tag/v2.3.0) released! Positive results have already been verified on more than 20 models across image classification, object detection, image segmentation, NLP, and more.
* 🔥 2022.7.14, 20:30 (UTC+8): PaddleSlim auto-compression user meetup, discussing model compression pain points with developers. Scan the QR code below to register and receive the meeting link.
<div align="center">
<img src="https://user-images.githubusercontent.com/54695910/178181077-57a3a631-f495-4821-878d-ef5e74981718.jpg" width = "150" height = "150" />
</div>
## **Features**

- <a href=#解耦训练代码> **🚀 Decoupled training code** </a>: developers can compress an exported inference model directly, without reading or modifying the model's source code;
- <a href=#全流程自动优化> **🎛️ End-to-end automatic optimization** </a>: a simple configuration is enough to start compression, and ACT automatically searches for the best resulting inference model;
- <a href=#支持丰富压缩算法> **📦 Rich compression algorithms** </a>: ACT provides quantization-aware training, distillation, structured pruning, unstructured pruning, several offline (post-training) quantization methods, hyperparameter search, and more, which can be combined freely.

### **Core ideas of ACT**

Compared with traditional manual compression, the "auto" in auto compression shows up in four aspects: decoupled training code, hyperparameter search for offline quantization, algorithm …
<p align="center">
<img src="https://user-images.githubusercontent.com/23690325/178102488-9f09e991-bfd6-4827-8641-849d9c3fa83c.png" align="middle" width="800" />
</p>
### **Compression results at a glance**

Compared with traditional manual model compression, ACT:

- cuts the amount of code by more than 50%;
- matches manual compression in accuracy, and on the PP-YOLOE model even surpasses it;
- delivers the same inference-performance gains as manual compression: inference runs 1.4x to 7.1x faster than before compression.
<p align="center">
<img src="https://user-images.githubusercontent.com/23690325/178102623-6de25af1-eec8-4825-bb15-4dad5bee7c9c.png" align="middle" width="800" />
</p>
### **Benchmark**

| Task | Model | Accuracy before<br/>(%) | Accuracy after<br/>(%) | Latency before<br/>(ms) | Latency after<br/>(ms) | Speedup | Chip |
| ------------------------------- | ---------------------------- | ---------------------- | ---------------------- | ---------------- | ---------------- | ---------- | ----------------- |
| [Image classification](./image_classification) | MobileNetV1 | 70.90 | 70.57 | 33.15 | 13.64 | **2.43** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | ShuffleNetV2_x1_0 | 68.65 | 68.32 | 10.43 | 5.51 | **1.89** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | SqueezeNet1_0_infer | 59.60 | 59.45 | 35.98 | 16.96 | **2.12** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | PPLCNetV2_base | 76.86 | 76.43 | 36.50 | 15.79 | **2.31** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | ResNet50_vd | 79.12 | 78.74 | 3.19 | 0.92 | **3.47** | NVIDIA Tesla T4 |
| [Image classification](./image_classification) | PPHGNet_tiny | 79.59 | 79.20 | 2.82 | 0.98 | **2.88** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | PP-HumanSeg-Lite | 92.87 | 92.35 | 56.36 | 37.71 | **1.49** | SDM710 |
| [Semantic segmentation](./semantic_segmentation) | PP-LiteSeg | 77.04 | 76.93 | 1.43 | 1.16 | **1.23** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | HRNet | 78.97 | 78.90 | 8.19 | 5.81 | **1.41** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 |
| NLP | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 |
| NLP | ERNIE 3.0-Medium | 73.09 | 72.40 | 29.25 (FP16) | 19.61 | **1.49** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolov5) | YOLOv5s<br/>(PyTorch) | 37.40 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolov6) | YOLOv6s<br/>(PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolov7) | YOLOv7<br/>(PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 |
| [Object detection](./detection) | PP-YOLOE-l | 50.9 | 50.6 | 11.2 | 6.7 | **1.67** | NVIDIA Tesla V100 |
| [Image classification](./image_classification) | MobileNetV1<br/>(TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDM865 (Snapdragon 865) |

- Note: accuracy is Top-1 Acc for image classification, mAP (0.5:0.95) for object detection, and IoU for image segmentation.
- More PaddlePaddle model examples and benchmarks: [image classification](./image_classification), [object detection](./detection), [semantic segmentation](./semantic_segmentation), [NLP](./nlp).
- More examples and benchmarks for models from other frameworks: [YOLOv5 (PyTorch)](./pytorch_yolov5), [YOLOv6 (PyTorch)](./pytorch_yolov6), [YOLOv7 (PyTorch)](./pytorch_yolov7), [HuggingFace (PyTorch)](./pytorch_huggingface), [MobileNet (TensorFlow)](./tensorflow_mobilenet).
## **Environment Setup**

- Install PaddlePaddle >= 2.3.1 (see the [PaddlePaddle installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)):

```shell
# CPU
pip install paddlepaddle --upgrade
# GPU
pip install paddlepaddle-gpu --upgrade
```

- Install PaddleSlim >= 2.3.0:

```shell
pip install paddleslim
```

## **Quick Start**

- **1. Prepare the model and dataset**
```shell
# Download the MobileNet inference model
# ...
# Download the demo ILSVRC2012 dataset
wget https://sys-p0.bj.bcebos.com/slim_ci/ILSVRC2012_data_demo.tar.gz
tar -xf ILSVRC2012_data_demo.tar.gz
```
- **2. Run auto compression**
```python
# Import dependencies
# ...
ac = AutoCompression(...)
ac.compress()
```
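
The collapsed block above only shows the entry point. Below is a minimal, self-contained sketch of such a call; the model paths, the input name `inputs`, and the exact config keys are assumptions for illustration, not the verbatim collapsed code.

```python
import numpy as np
import paddle
from paddleslim.auto_compression import AutoCompression

paddle.enable_static()

class CalibData(paddle.io.Dataset):
    # Toy calibration samples just to keep the sketch runnable; use the ILSVRC2012 demo data in practice.
    def __len__(self):
        return 32
    def __getitem__(self, idx):
        return np.random.rand(3, 224, 224).astype('float32')

# 'inputs' is assumed to be the input name of the exported MobileNetV1 inference model.
image = paddle.static.data(name='inputs', shape=[None, 3, 224, 224], dtype='float32')
train_loader = paddle.io.DataLoader(CalibData(), feed_list=[image], batch_size=8, return_list=False)

ac = AutoCompression(
    model_dir='./MobileNetV1_infer',           # assumed directory of the downloaded inference model
    model_filename='inference.pdmodel',
    params_filename='inference.pdiparams',
    save_dir='MobileNetV1_quant',
    config={'QuantPost': {}, 'HyperParameterOptimization': {}},  # offline quantization + hyperparameter search
    train_dataloader=train_loader,
    eval_dataloader=train_loader)
ac.compress()
```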
- **3. Test accuracy**

Test the accuracy of the model before compression:
```shell
CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py
### Eval Top1: 0.7171724759615384
```

Test the accuracy of the quantized model:
```shell
CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py --model_dir='MobileNetV1_quant'
### Eval Top1: 0.7166466346153846
```

The quantized model is almost lossless compared with the original model. Because the quantization parameters are chosen by hyperparameter search, the accuracy of the resulting quantized model fluctuates slightly from run to run.
- **4. Test inference speed**

Measuring the speed of the quantized model requires inference-library support, so make sure the installed PaddlePaddle build includes TensorRT. The example results below were obtained on a Tesla V100 with CUDA 10.2 and Python 3.7.

Use the following command to check the local CUDA version, then download the PaddlePaddle package matching your CUDA and Python versions from the [download page](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python):
```shell
cat /usr/local/cuda/version.txt ### CUDA Version 10.2.89
### 10.2.89 is the CUDA version; use it to pick the TensorRT-enabled PaddlePaddle package to install.
```

Install the downloaded wheel:
```
### The package fetched by wget below is the PaddlePaddle build for Python 3.7 and CUDA 10.2. If your environment differs from the example environment, download the package matching your own machine; otherwise the example code will fail.
wget https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl
pip install paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl --force-reinstall
```

Test the speed of the FP32 model:
```
python ./image_classification/infer.py
### using tensorrt FP32 batch size: 1 time(ms): 0.6140608787536621
```

Test the speed of the FP16 model:
```
python ./image_classification/infer.py --use_fp16=True
### using tensorrt FP16 batch size: 1 time(ms): 0.5795984268188477
```

Test the speed of the INT8 model:
```
python ./image_classification/infer.py --model_dir=./MobileNetV1_quant/ --use_int8=True
### using tensorrt INT8 batch size: 1 time(ms): 0.5213963985443115
```
**Tips:**

- The dataset fed to the DataLoader is the dataset used by the model being compressed, and the DataLoader must inherit from `paddle.io.DataLoader`. You can reuse the DataLoader from the corresponding model suite, or build your own following [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader); a minimal sketch is given after this list.
- Compression algorithms defined in the auto-compression Config (quantization, distillation, pruning, ...) are executed in combination; typical strategies are quantization + distillation, pruning + distillation, and so on. The configuration chosen in this example is offline quantization with hyperparameter search.
- If the parameters of the model to be compressed are stored in separate files, first merge them into a single binary file with the [convert.py](./convert.py) script.
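
A minimal sketch of a custom DataLoader, following the pattern used by the detection examples in this repo where each sample is a dict keyed by the model's input name (the input name `inputs` and the image size are placeholders):

```python
import numpy as np
import paddle

class SingleInputDataset(paddle.io.Dataset):
    """Yields dicts keyed by the inference model's input name, as the ACT examples do."""
    def __init__(self, num_samples=128):
        self.num_samples = num_samples
    def __len__(self):
        return self.num_samples
    def __getitem__(self, idx):
        # Replace the random array with real image loading and preprocessing.
        return {'inputs': np.random.rand(3, 224, 224).astype('float32')}

train_dataloader = paddle.io.DataLoader(SingleInputDataset(), batch_size=32, shuffle=True)

for batch in train_dataloader:
    print(batch['inputs'].shape)  # [32, 3, 224, 224]
    break
```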
## Examples
#### [Image classification](./image_classification)

#### [Object detection](./detection)

#### [Semantic segmentation](./semantic_segmentation)

#### [NLP](./nlp)

#### X2Paddle

- [PyTorch YOLOv5](./pytorch_yolov5)
- [HuggingFace](./pytorch_huggingface)
- [TensorFlow MobileNet](./tensorflow_mobilenet)
#### Coming soon

- [ ] More auto-compression examples
## Advanced Usage

- ACT handles common inference models automatically. For more specialized needs, see the [ACT hyperparameter configuration tutorial](./hyperparameter_tutorial.md) to configure the compression strategy yourself.

## Community

- Scan the QR code with WeChat and fill in the questionnaire to join the technical discussion group.
<div align="center">
<img src="https://user-images.githubusercontent.com/54695910/178181077-57a3a631-f495-4821-878d-ef5e74981718.jpg" width = "150" height = "150" />
</div>
- If you find any problem with the ACT toolkit or have suggestions, feel free to open an issue via [GitHub Issues](https://github.com/PaddlePaddle/PaddleSlim/issues). Contributions of more high-quality models are also welcome to help grow the open-source ecosystem.
## License
This project is released under the [Apache-2.0 license](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/LICENSE).
@@ -22,7 +22,7 @@
| Model | Strategy | Input size | mAP<sup>val</sup><br>0.5:0.95 | FP32 latency<br>(ms) | FP16 latency<br>(ms) | INT8 latency<br>(ms) | Config | Inference model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv6s | Base model | 640*640 | 42.4 | 9.06ms | 2.90ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar) |
| YOLOv6s | KL offline quantization | 640*640 | 30.3 | - | - | 1.83ms | - | - |
| YOLOv6s | Quantization-aware training with distillation | 640*640 | **41.3** | - | - | **1.83ms** | [config](./configs/yolov6s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.tar) |
@@ -83,7 +83,7 @@ pip install x2paddle sympy onnx
x2paddle --framework=onnx --model=yolov6s.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov6s_infer
```
This produces the YOLOv6s inference model (`model.pdmodel` and `model.pdiparams`). For a quick try, you can directly download the YOLOv6s [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar) from the table above.

The inference model consists of two files, `model.pdmodel` and `model.pdiparams`: the `.pdmodel` file is the model graph and the `.pdiparams` file holds the weights.
# YOLOv7 Auto Compression Example

Contents:

- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Auto compression workflow](#3-auto-compression-workflow)
- [3.1 Prepare the environment](#31-prepare-the-environment)
- [3.2 Prepare the dataset](#32-prepare-the-dataset)
- [3.3 Prepare the inference model](#33-prepare-the-inference-model)
- [3.4 Run auto compression and export the model](#34-run-auto-compression-and-export-the-model)
- [3.5 Test model accuracy](#35-test-model-accuracy)
- [4. Deployment](#4-deployment)
- [5. FAQ](#5-faq)

## 1. Introduction

The model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models with a single command. With X2Paddle, inference models from these frameworks can easily use PaddleSlim's auto compression.

This example takes the [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object detection model, converts it from the PyTorch framework to a Paddle model, and then compresses it automatically with ACT. The compression strategy used here is quantization-aware training.
## 2. Benchmark

| Model | Strategy | Input size | mAP<sup>val</sup><br>0.5:0.95 | FP32 latency<br>(ms) | FP16 latency<br>(ms) | INT8 latency<br>(ms) | Config | Inference model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv7 | Base model | 640*640 | 51.1 | 26.84ms | 7.44ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) |
| YOLOv7 | KL offline quantization | 640*640 | 50.2 | - | - | 4.55ms | - | - |
| YOLOv7 | Quantization-aware training with distillation | 640*640 | **50.8** | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) |

Notes:
- All mAP values are evaluated on the COCO val2017 dataset.
- Latency is measured for the YOLOv7 model on a Tesla T4 GPU with TensorRT 8.4.1 and batch_size=1; the test script is [cpp_infer](./cpp_infer).
## 3. Auto compression workflow

#### 3.1 Prepare the environment

- PaddlePaddle >= 2.3 (see the [PaddlePaddle installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim > 2.3
- PaddleDet >= 2.4
- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
- opencv-python

(1) Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install paddleslim:
```shell
pip install paddleslim
```
(3) Install paddledet:
```shell
pip install paddledet
```
Note: PaddleDet is installed only so that the DataLoader components from PaddleDetection can be reused directly; a minimal sketch of that reuse follows.
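
The sketch below mirrors what eval.py and run.py in this example do with the reader config shipped alongside (the config path is this example's reader config):

```python
from ppdet.core.workspace import load_config, create

# Load the PaddleDetection-style reader config used by this example.
reader_cfg = load_config('./configs/yolov7_reader.yaml')

# Build the evaluation DataLoader from the config; each batch is a dict of fields
# such as 'image' and 'scale_factor'.
val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
                                  reader_cfg['worker_num'],
                                  return_list=True)

for data in val_loader:
    print(list(data.keys()))
    break
```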
(4) Install X2Paddle 1.3.6 or later:
```shell
pip install x2paddle sympy onnx
```
#### 3.2 Prepare the dataset

By default this example runs auto compression on COCO data and relies on PaddleDetection's data-reading module. For a custom COCO-format dataset or data in other formats, follow the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).

If the dataset is already prepared, simply set the `dataset_dir` field of `EvalDataset` in [./configs/yolov7_reader.yml](./configs/yolov7_reader.yml) to your own dataset path.
#### 3.3 Prepare the inference model

(1) Prepare the ONNX model:

The ONNX model can be exported with the export script from [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7):
```shell
git clone https://github.com/WongKinYiu/yolov7.git
# Switch to the u5 branch so that the exported ONNX model's post-processing matches YOLOv5
git checkout u5
# After downloading the yolov7.pt weights, run:
python export.py --weights yolov7.pt --include onnx
```
You can also directly download the prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx).
(2) Convert the model:
```
x2paddle --framework=onnx --model=yolov7.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov7_infer
```
This produces the YOLOv7 inference model (`model.pdmodel` and `model.pdiparams`). For a quick try, you can directly download the YOLOv7 [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) from the table above.

The inference model consists of two files, `model.pdmodel` and `model.pdiparams`: the `.pdmodel` file is the model graph and the `.pdiparams` file holds the weights.
#### 3.4 Run auto compression and export the model

The quantization-with-distillation auto compression example is launched through the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model. Configure the model path, distillation, quantization, and training parameters in the config file; once configured, the model can be quantized and distilled. Run it as follows:

- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 Test model accuracy

Set the `model_dir` field in [yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml) to the path where the model is stored, then use the eval.py script to obtain the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov7_qat_dis.yaml
```
## 4. Deployment

#### Paddle-TensorRT C++ deployment

Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build following the [C++ TensorRT benchmark guide](./cpp_infer/README.md), then run the test:
```shell
# Build
bash compile.sh
# Run
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
#### Paddle-TensorRT Python deployment

First install a TensorRT-enabled [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python).

Then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py); a condensed sketch of the inference configuration it builds follows the command:
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
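
The sketch below shows the inference-config part of that script (the model path assumes the compressed model was saved to `output/`):

```python
from paddle.inference import Config, create_predictor

# Point at the quantized inference model produced by auto compression.
config = Config('output/model.pdmodel', 'output/model.pdiparams')
config.enable_use_gpu(200, 0)                      # initial GPU memory (MB), GPU id
config.switch_ir_optim(True)
config.enable_tensorrt_engine(
    workspace_size=1 << 25,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=Config.Precision.Int8,          # run the quantized model with TensorRT INT8
    use_static=False,
    use_calib_mode=False)
config.enable_memory_optim()
config.switch_use_feed_fetch_ops(False)
predictor = create_predictor(config)
```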
## 5. FAQ

- To test the accuracy of the offline (post-training) quantized model, run:
```shell
python post_quant.py --config_path=./configs/yolov7_qat_dis.yaml
```
Global:
reader_config: configs/yolov7_reader.yaml
input_list: {'image': 'x2paddle_images'}
Evaluation: True
model_dir: ./yolov7_infer
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
Quantization:
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 8000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 8000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 0.00004
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
- Permute: {}
batch_size: 1
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
option(WITH_ROCM "Compile demo with rocm." OFF)
option(WITH_ONNXRUNTIME "Compile demo with ONNXRuntime" OFF)
option(WITH_ARM "Compile demo with ARM" OFF)
option(WITH_MIPS "Compile demo with MIPS" OFF)
option(WITH_SW "Compile demo with SW" OFF)
option(WITH_XPU "Compile demo with xpu" OFF)
option(WITH_NPU "Compile demo with npu" OFF)
if(NOT WITH_STATIC_LIB)
add_definitions("-DPADDLE_WITH_SHARED_LIB")
else()
# PD_INFER_DECL is mainly used to set the dllimport/dllexport attribute in dynamic library mode.
# Set it to empty in static library mode to avoid compilation issues.
add_definitions("/DPD_INFER_DECL=")
endif()
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}/")
set(PADDLE_LIB_THIRD_PARTY_PATH "${PADDLE_LIB}/third_party/install/")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/include")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib")
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
option(MSVC_STATIC_CRT "use static C Runtime library by default" ON)
if (MSVC_STATIC_CRT)
if (WITH_MKL)
set(FLAG_OPENMP "/openmp")
endif()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
safe_set_static_flag()
if (WITH_STATIC_LIB)
add_definitions(-DSTATIC_LIB)
endif()
endif()
else()
if(WITH_MKL)
set(FLAG_OPENMP "-fopenmp")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 ${FLAG_OPENMP}")
endif()
if(WITH_GPU)
if(NOT WIN32)
include_directories("/usr/local/cuda/include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
endif()
else()
include_directories("C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
endif()
endif(NOT WIN32)
endif()
if (USE_TENSORRT AND WITH_GPU)
set(TENSORRT_ROOT "" CACHE STRING "The root directory of TensorRT library")
if("${TENSORRT_ROOT}" STREQUAL "")
message(FATAL_ERROR "The TENSORRT_ROOT is empty, you must assign it a value with CMake command. Such as: -DTENSORRT_ROOT=TENSORRT_ROOT_PATH ")
endif()
set(TENSORRT_INCLUDE_DIR ${TENSORRT_ROOT}/include)
set(TENSORRT_LIB_DIR ${TENSORRT_ROOT}/lib)
file(READ ${TENSORRT_INCLUDE_DIR}/NvInfer.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
file(READ ${TENSORRT_INCLUDE_DIR}/NvInferVersion.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
endif()
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
message(SEND_ERROR "Failed to detect TensorRT version.")
endif()
string(REGEX REPLACE "define NV_TENSORRT_MAJOR +([0-9]+)" "\\1"
TENSORRT_MAJOR_VERSION "${TENSORRT_MAJOR_VERSION}")
message(STATUS "Current TensorRT header is ${TENSORRT_INCLUDE_DIR}/NvInfer.h. "
"Current TensorRT version is v${TENSORRT_MAJOR_VERSION}. ")
include_directories("${TENSORRT_INCLUDE_DIR}")
link_directories("${TENSORRT_LIB_DIR}")
endif()
if(WITH_MKL)
set(MATH_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mklml")
include_directories("${MATH_LIB_PATH}/include")
if(WIN32)
set(MATH_LIB ${MATH_LIB_PATH}/lib/mklml${CMAKE_STATIC_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5md${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${MATH_LIB_PATH}/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(MKLDNN_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif(WIN32)
endif()
elseif((NOT WITH_MIPS) AND (NOT WITH_SW))
set(OPENBLAS_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}openblas")
include_directories("${OPENBLAS_LIB_PATH}/include/openblas")
if(WIN32)
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/openblas${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_STATIC_LIB)
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
if(WIN32)
set(DEPS ${PADDLE_LIB}/paddle/lib/paddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (WITH_ONNXRUNTIME)
if(WIN32)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.lib paddle2onnx)
elseif(APPLE)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.1.10.0.dylib paddle2onnx)
else()
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.so.1.10.0 paddle2onnx)
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf xxhash cryptopp
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash cryptopp-static ${EXTERNAL_LIB})
set(DEPS ${DEPS} shlwapi.lib)
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
if(USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX} )
endif()
endif()
if(WITH_ROCM AND NOT WIN32)
set(DEPS ${DEPS} ${ROCM_LIB}/libamdhip64${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_XPU AND NOT WIN32)
set(XPU_INSTALL_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}xpu")
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpuapi${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpurt${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_NPU AND NOT WIN32)
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libgraph${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libge_runner${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libacl_op_compiler${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
target_link_libraries(${DEMO_NAME} ${DEPS})
if(WIN32)
if(USE_TENSORRT)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE})
endif()
endif()
if(WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/mklml.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/libiomp5md.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MKLDNN_PATH}/lib/mkldnn.dll ${CMAKE_BINARY_DIR}/Release
)
else()
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${OPENBLAS_LIB_PATH}/lib/openblas.dll ${CMAKE_BINARY_DIR}/Release
)
endif()
if(WITH_ONNXRUNTIME)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib/paddle2onnx.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
if(NOT WITH_STATIC_LIB)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy "${PADDLE_LIB}/paddle/lib/paddle_inference.dll" ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
endif()
# YOLOv7 TensorRT Benchmark (Linux)

## Environment Setup

- CUDA and cuDNN: make sure CUDA and cuDNN are installed and note their installation paths in advance.
- TensorRT: download [TensorRT 8.4.1.5](https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.4.1/tars/tensorrt-8.4.1.5.linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz) or another version from the NVIDIA website.
- Paddle Inference C++ library: to build the develop branch, see the [build guide](https://www.paddlepaddle.org.cn/inference/user_guides/source_compile.html). After the build finishes, a `paddle_inference_install_dir` folder is generated under the build directory; this is the C++ inference library we need.

## Build the executable

- (1) Update the dependency paths in `compile.sh`, mainly the following:
```shell
# Path to the Paddle Inference library
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
# cuDNN path
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
# CUDA path
CUDA_LIB=/usr/local/cuda/lib64
# TensorRT path: the absolute path of the extracted TensorRT package, containing the `lib` and `include` folders
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
```
## Run the tests
- FP32
```
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp32
```
- FP16
```
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp16
```
- INT8
```
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
## Performance comparison
| Inference engine | Model | FP32 latency<br>(ms) | FP16 latency<br>(ms) | INT8 latency<br>(ms) |
| :--------: | :--------: |:-------- |:--------: | :---------------------: |
| Paddle TensorRT | YOLOv7 | 26.84ms | 7.44ms | 4.55ms |
| TensorRT | YOLOv7 | 28.25ms | 7.23ms | 4.67ms |
Environment:
- Tesla T4, TensorRT 8.4.1, CUDA 11.2
- batch_size=1
#!/bin/bash
set +x
set -e
work_path=$(dirname $(readlink -f $0))
mkdir -p build
cd build
rm -rf *
DEMO_NAME=trt_run
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
WITH_ROCM=OFF
ROCM_LIB=/opt/rocm/lib
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DWITH_ROCM=${WITH_ROCM} \
-DROCM_LIB=${ROCM_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
#include <chrono>
#include <iostream>
#include <memory>
#include <numeric>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <cuda_runtime.h>
#include "paddle/include/paddle_inference_api.h"
#include "paddle/include/experimental/phi/common/float16.h"
using paddle_infer::Config;
using paddle_infer::Predictor;
using paddle_infer::CreatePredictor;
using paddle_infer::PrecisionType;
using phi::dtype::float16;
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(gpu_id, 0, "GPU card ID num.");
DEFINE_int32(trt_min_subgraph_size, 3, "tensorrt min_subgraph_size");
DEFINE_int32(warmup, 50, "warmup");
DEFINE_int32(repeats, 1000, "repeats");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::shared_ptr<Predictor> InitPredictor() {
Config config;
std::string model_path;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
model_path = FLAGS_model_dir.substr(0, FLAGS_model_dir.find_last_of("/"));
} else {
config.SetModel(FLAGS_model_file, FLAGS_params_file);
model_path = FLAGS_model_file.substr(0, FLAGS_model_file.find_last_of("/"));
}
// enable tune
std::cout << "model_path: " << model_path << std::endl;
config.EnableUseGpu(256, FLAGS_gpu_id);
if (FLAGS_run_mode == "trt_fp32") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kFloat32, false, false);
} else if (FLAGS_run_mode == "trt_fp16") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kHalf, false, false);
} else if (FLAGS_run_mode == "trt_int8") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kInt8, false, false);
}
config.EnableMemoryOptim();
config.SwitchIrOptim(true);
return CreatePredictor(config);
}
template <typename type>
void run(Predictor *predictor, const std::vector<type> &input,
const std::vector<int> &input_shape, type* out_data, std::vector<int> out_shape) {
// prepare input
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
for (int i = 0; i < FLAGS_warmup; ++i)
CHECK(predictor->Run());
auto st = time();
for (int i = 0; i < FLAGS_repeats; ++i) {
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
CHECK(predictor->Run());
auto output_names = predictor->GetOutputNames();
auto output_t = predictor->GetOutputHandle(output_names[0]);
std::vector<int> output_shape = output_t->shape();
output_t -> ShareExternalData<type>(out_data, out_shape, paddle_infer::PlaceType::kGPU);
}
LOG(INFO) << "[" << FLAGS_run_mode << " bs-" << FLAGS_batch_size << " ] run avg time is " << time_diff(st, time()) / FLAGS_repeats
<< " ms";
}
int main(int argc, char *argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = InitPredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 640, 640};
// float16
using dtype = float16;
std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
dtype *out_data;
int out_data_size = FLAGS_batch_size * 25200 * 85;
cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
std::vector<int> out_shape{ FLAGS_batch_size, 1, 25200, 85};
run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape);
return 0;
}
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from post_process import YOLOv7PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval():
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
global_config["model_dir"],
exe,
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"])
print('Loaded model from: {}'.format(global_config["model_dir"]))
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in global_config['input_list']:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(val_program,
feed=data_input,
fetch_list=fetch_targets,
return_numpy=False)
res = {}
postprocess = YOLOv7PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
metric.reset()
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
dataset = reader_cfg['EvalDataset']
global val_loader
val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
reader_cfg['worker_num'],
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
eval()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import argparse
import time
from paddle.inference import Config
from paddle.inference import create_predictor
from post_process import YOLOv7PostProcess
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def generate_scale(im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(img_path, target_shape):
img = cv2.imread(img_path)
# Resize
im_scale_y, im_scale_x = generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, [2, 0, 1]) / 255
img = np.expand_dims(img, 0)
scale_factor = np.array([[im_scale_y, im_scale_x]])
return img.astype(np.float32), scale_factor
def get_color_map_list(num_classes):
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(image_file, results, class_label, threshold=0.5):
srcimg = cv2.imread(image_file, 1)
for i in range(len(results)):
color_list = get_color_map_list(len(class_label))
clsid2color = {}
classid, conf = int(results[i, 0]), results[i, 1]
if conf < threshold:
continue
xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int(
results[i, 4]), int(results[i, 5])
if classid not in clsid2color:
clsid2color[classid] = color_list[classid]
color = tuple(clsid2color[classid])
cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
print(class_label[classid] + ': ' + str(round(conf, 3)))
cv2.putText(
srcimg,
class_label[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8, (0, 255, 0),
thickness=2)
return srcimg
def load_predictor(model_dir,
run_mode='paddle',
batch_size=1,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
enable_mkldnn_bfloat16=False,
delete_shuffle_pass=False):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT.
Used by action model.
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
if enable_mkldnn_bfloat16:
config.enable_mkldnn_bfloat16()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
if delete_shuffle_pass:
config.delete_pass("shuffle_channel_detect_pass")
predictor = create_predictor(config)
return predictor
def predict_image(predictor,
image_file,
image_shape=[640, 640],
warmup=1,
repeats=1,
threshold=0.5,
arch='YOLOv5'):
img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {}
if arch == 'YOLOv5':
inputs['x2paddle_images'] = img
input_names = predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
predictor.run()
np_boxes = None
predict_time = 0.
time_min = float("inf")
time_max = float('-inf')
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
time_avg = predict_time / repeats
print('Inference time(ms): min={}, max={}, avg={}'.format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
postprocess = YOLOv7PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np_boxes, scale_factor)
res_img = draw_box(
image_file, res['bbox'], CLASS_LABEL, threshold=threshold)
cv2.imwrite('result.jpg', res_img)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--image_file', type=str, default=None, help="image path")
parser.add_argument(
'--model_path', type=str, help="inference model filepath")
parser.add_argument(
'--benchmark',
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
'--run_mode',
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
'--device',
type=str,
default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
)
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args()
predictor = load_predictor(
args.model_path, run_mode=args.run_mode, device=args.device)
warmup, repeats = 1, 1
if args.benchmark:
warmup, repeats = 50, 100
predict_image(
predictor,
args.image_file,
image_shape=[args.img_shape, args.img_shape],
warmup=warmup,
repeats=repeats)
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
def box_area(boxes):
"""
Args:
boxes(np.ndarray): [N, 4]
return: [N]
"""
return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
def box_iou(box1, box2):
"""
Args:
box1(np.ndarray): [N, 4]
box2(np.ndarray): [M, 4]
return: [N, M]
"""
area1 = box_area(box1)
area2 = box_area(box2)
lt = np.maximum(box1[:, np.newaxis, :2], box2[:, :2])
rb = np.minimum(box1[:, np.newaxis, 2:], box2[:, 2:])
wh = rb - lt
wh = np.maximum(0, wh)
inter = wh[:, :, 0] * wh[:, :, 1]
iou = inter / (area1[:, np.newaxis] + area2 - inter)
return iou
def nms(boxes, scores, iou_threshold):
"""
Non Max Suppression numpy implementation.
args:
boxes(np.ndarray): [N, 4]
scores(np.ndarray): [N, 1]
iou_threshold(float): Threshold of IoU.
"""
idxs = scores.argsort()
keep = []
while idxs.size > 0:
max_score_index = idxs[-1]
max_score_box = boxes[max_score_index][None, :]
keep.append(max_score_index)
if idxs.size == 1:
break
idxs = idxs[:-1]
other_boxes = boxes[idxs]
ious = box_iou(max_score_box, other_boxes)
idxs = idxs[ious[0] <= iou_threshold]
keep = np.array(keep)
return keep
class YOLOv7PostProcess(object):
"""
    Post process of YOLOv7 network.
args:
score_threshold(float): Threshold to filter out bounding boxes with low
confidence score. If not provided, consider all boxes.
nms_threshold(float): The threshold to be used in NMS.
multi_label(bool): Whether keep multi label in boxes.
keep_top_k(int): Number of total bboxes to be kept per image after NMS
step. -1 means keeping all bboxes after NMS step.
"""
def __init__(self,
score_threshold=0.25,
nms_threshold=0.5,
multi_label=False,
keep_top_k=300):
self.score_threshold = score_threshold
self.nms_threshold = nms_threshold
self.multi_label = multi_label
self.keep_top_k = keep_top_k
def _xywh2xyxy(self, x):
# Convert from [x, y, w, h] to [x1, y1, x2, y2]
y = np.copy(x)
y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x
y[:, 1] = x[:, 1] - x[:, 3] / 2 # top left y
y[:, 2] = x[:, 0] + x[:, 2] / 2 # bottom right x
y[:, 3] = x[:, 1] + x[:, 3] / 2 # bottom right y
return y
def _non_max_suppression(self, prediction):
max_wh = 4096 # (pixels) minimum and maximum box width and height
nms_top_k = 30000
cand_boxes = prediction[..., 4] > self.score_threshold # candidates
output = [np.zeros((0, 6))] * prediction.shape[0]
for batch_id, boxes in enumerate(prediction):
# Apply constraints
boxes = boxes[cand_boxes[batch_id]]
if not boxes.shape[0]:
continue
# Compute conf (conf = obj_conf * cls_conf)
boxes[:, 5:] *= boxes[:, 4:5]
# Box (center x, center y, width, height) to (x1, y1, x2, y2)
convert_box = self._xywh2xyxy(boxes[:, :4])
# Detections matrix nx6 (xyxy, conf, cls)
if self.multi_label:
i, j = (boxes[:, 5:] > self.score_threshold).nonzero()
boxes = np.concatenate(
(convert_box[i], boxes[i, j + 5, None],
j[:, None].astype(np.float32)),
axis=1)
else:
conf = np.max(boxes[:, 5:], axis=1)
j = np.argmax(boxes[:, 5:], axis=1)
re = np.array(conf.reshape(-1) > self.score_threshold)
conf = conf.reshape(-1, 1)
j = j.reshape(-1, 1)
boxes = np.concatenate((convert_box, conf, j), axis=1)[re]
num_box = boxes.shape[0]
if not num_box:
continue
elif num_box > nms_top_k:
boxes = boxes[boxes[:, 4].argsort()[::-1][:nms_top_k]]
# Batched NMS
c = boxes[:, 5:6] * max_wh
clean_boxes, scores = boxes[:, :4] + c, boxes[:, 4]
keep = nms(clean_boxes, scores, self.nms_threshold)
# limit detection box num
if keep.shape[0] > self.keep_top_k:
keep = keep[:self.keep_top_k]
output[batch_id] = boxes[keep]
return output
def __call__(self, outs, scale_factor):
preds = self._non_max_suppression(outs)
bboxs, box_nums = [], []
for i, pred in enumerate(preds):
if len(pred.shape) > 2:
pred = np.squeeze(pred)
if len(pred.shape) == 1:
pred = pred[np.newaxis, :]
pred_bboxes = pred[:, :4]
scale_factor = np.tile(scale_factor[i][::-1], (1, 2))
pred_bboxes /= scale_factor
bbox = np.concatenate(
[
pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis],
pred_bboxes
],
axis=-1)
bboxs.append(bbox)
box_num = bbox.shape[0]
box_nums.append(box_num)
bboxs = np.concatenate(bboxs, axis=0)
box_nums = np.array(box_nums)
return {'bbox': bboxs, 'bbox_num': box_nums}
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.quant import quant_post_static
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='ptq_out',
help="directory to save compressed model.")
    parser.add_argument(
        '--devices',
        type=str,
        default='gpu',
        help="device used for compression, one of ['cpu', 'gpu', 'xpu', 'npu'].")
    parser.add_argument(
        '--algo',
        type=str,
        default='KL',
        help="post-training quantization algorithm, e.g. 'KL'.")
return parser
def reader_wrapper(reader, input_list):
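    # Wrap a PaddleDetection dataloader into a generator of feed dicts.
    # input_list may be a list of input names to pass through unchanged, or a
    # dict that maps reader field names to the inference model's input names.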
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
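    # Offline (post-training) quantization: feed batch_nums calibration batches
    # of batch_size samples through the inference model, compute activation
    # scales with the chosen algo, and save the quantized model to FLAGS.save_dir.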
quant_post_static(
executor=exe,
model_dir=global_config["model_dir"],
quantize_model_path=FLAGS.save_dir,
data_loader=train_loader,
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
batch_size=32,
batch_nums=10,
algo=FLAGS.algo,
hist_percent=0.999,
is_full_quantize=False,
bias_correction=False,
onnx_format=False)
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
from post_process import YOLOv7PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='output',
help="directory to save compressed model.")
    parser.add_argument(
        '--devices',
        type=str,
        default='gpu',
        help="device used for compression, one of ['cpu', 'gpu', 'xpu', 'npu'].")
    parser.add_argument(
        '--eval',
        # argparse's type=bool would treat any non-empty string (including
        # "False") as True, so parse the flag value explicitly.
        type=lambda s: str(s).lower() in ('true', '1'),
        default=False,
        help="whether to run evaluation.")
return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
    # Convert every field returned by the dataloader to a numpy array; for VOC
    # metrics, list-valued fields are additionally converted element by element.
    data_all = {k: np.array(v) for k, v in data.items()}
    if isinstance(metric, VOCMetric):
        for k, v in data_all.items():
            if not isinstance(v[0], np.ndarray):
                data_all[k] = np.array([np.array(t) for t in v])
    return data_all
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
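    # Evaluation callback handed to AutoCompression: run the (compressed)
    # inference program over val_loader, apply YOLOv7 post-processing, and return
    # the mAP so ACT can track accuracy during compression.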
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in test_feed_names:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(compiled_test_program,
feed=data_input,
fetch_list=test_fetch_list,
return_numpy=False)
        postprocess = YOLOv7PostProcess(
            score_threshold=0.001, nms_threshold=0.65, multi_label=True)
        res = postprocess(np.array(outs[0]), data_all['scale_factor'])
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
return map_res['bbox'][0]
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader
_eval_batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
val_loader = create('EvalReader')(dataset,
reader_cfg['worker_num'],
batch_sampler=_eval_batch_sampler,
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
else:
eval_func = None
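    # Build the auto-compression task from the loaded config and the wrapped
    # dataloader; ac.compress() runs the compression strategies defined in the
    # config file and writes the compressed model to FLAGS.save_dir.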
ac = AutoCompression(
model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir,
config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func)
ac.compress()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
......@@ -14,12 +14,10 @@
## 1. Introduction
This example uses the semantic segmentation model [PP-HumanSeg-Lite](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib/PP-HumanSeg#portrait-segmentation) to show how to apply automatic compression to a PaddleSeg inference (deployment) model. The auto-compression strategies used in this example are unstructured sparsity with distillation, and quantization with distillation.
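For orientation, the sketch below shows how the ACT Python API used throughout this repository is typically driven for such an inference model. It is only a minimal sketch: the config path, model directory, file names, input name `x`, and input shape are illustrative assumptions rather than the exact values used by this example; use the example's own scripts and configs for real runs.

```python
# Minimal ACT sketch. The config path, model directory, file names, input name
# ('x') and input shape below are illustrative assumptions only.
import numpy as np
import paddle
from paddleslim.auto_compression import AutoCompression
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config


def train_dataloader():
    # Placeholder calibration/training data generator; replace it with a real
    # dataloader over the segmentation training set.
    for _ in range(8):
        yield {'x': np.random.rand(1, 3, 398, 224).astype('float32')}


if __name__ == '__main__':
    paddle.enable_static()
    all_config = load_slim_config('./configs/pp_humanseg/pp_humanseg_qat.yaml')  # hypothetical config path
    ac = AutoCompression(
        model_dir='./inference_model',      # exported PaddleSeg inference model
        model_filename='model.pdmodel',
        params_filename='model.pdiparams',
        save_dir='./output',
        config=all_config,
        train_dataloader=train_dataloader,
        eval_callback=None)                 # optionally pass an eval function to track accuracy
    ac.compress()
```

After `ac.compress()` finishes, the compressed inference model in `save_dir` can be deployed directly.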
## 2. Benchmark
- [PP-HumanSeg-Lite](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib/PP-HumanSeg#portrait-segmentation)
| Model | Strategy | Total IoU | ARM CPU latency (ms)<br>thread=1 | Nvidia GPU latency (ms) | Config | Inference Model |
|:-----:|:-----:|:----------:|:---------:| :------:|:------:|:------:|
| PP-HumanSeg-Lite | Baseline | 92.87 | 56.363 | - | - | [model](https://paddleseg.bj.bcebos.com/dygraph/ppseg/ppseg_lite_portrait_398x224_with_softmax.tar.gz) |
......@@ -34,7 +32,7 @@
| Deeplabv3-ResNet50 | Baseline | 79.90 | - | 12.766 | - | [model](https://paddleseg.bj.bcebos.com/tipc/easyedge/RES-paddle2-Deeplabv3-ResNet50.zip) |
| Deeplabv3-ResNet50 | Quantization-aware training | 78.89 | - | 8.839 | [config](./configs/deeplabv3/deeplabv3_qat.yaml) | - |
- ARM CPU test environment: Qualcomm Snapdragon 710 (`SDM710 2*A75(2.2GHz) 6*A55(1.7GHz)`)
- Nvidia GPU test environment:
......@@ -65,6 +63,11 @@ pip install paddlepaddle-gpu
pip install paddleslim
```
Get the PaddleSlim example code:
```shell
git clone https://github.com/PaddlePaddle/PaddleSlim.git
```
Install PaddleSeg:
```shell
......@@ -77,15 +80,16 @@ pip install paddleseg
Developers can download an open-source dataset (such as [AISegment](https://github.com/aisegmentcn/matting_human_datasets)) or use a custom semantic segmentation dataset. Refer to the [PaddleSeg data preparation documentation](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/data/marker/marker_cn.md) to check and align the data format.
This example uses the open-source AISegment dataset to demonstrate how to run automatic compression on PP-HumanSeg-Lite. The example dataset is only meant for quickly walking through the auto-compression pipeline and will not reproduce the compression results in the benchmark table.
Download the portrait segmentation demo data with the following commands:
```shell
cd PaddleSlim/example/auto_compression/semantic_segmentation
python ./data/download_data.py mini_humanseg
### The downloaded data will be placed in ./data/humanseg/
```
**Note:**
- Datasets used in the PP-HumanSeg-Lite compression process
- Datasets: AISegment + PP-HumanSeg14K + an internal in-house dataset. AISegment is an open-source dataset available from [this link](https://github.com/aisegmentcn/matting_human_datasets); PP-HumanSeg14K is a dataset built by the PaddleSeg team, available through the [official channel](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/contrib/PP-HumanSeg/paper.md#pp-humanseg14k-a-large-scale-teleconferencing-video-dataset); the internal dataset is not publicly available.
......