diff --git a/README.md b/README.md
index 0e2ce0254c666e9b63ae9c21617734ddb6e0ad40..5289fb2e4d53ef57ee39b86a1a4c61025a1d101f 100755
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ PaddleSlim是一个专注于深度学习模型压缩的工具库,提供**低
- 支持代码无感知压缩:用户只需提供推理模型文件和数据,既可进行离线量化(PTQ)、量化训练(QAT)、稀疏训练等压缩任务。
- 支持自动策略选择,根据任务特点和部署环境特性:自动搜索合适的离线量化方法,自动搜索最佳的压缩策略组合方式。
- 发布[自然语言处理](example/auto_compression/nlp)、[图像语义分割](example/auto_compression/semantic_segmentation)、[图像目标检测](example/auto_compression/detection)三个方向的自动化压缩示例。
- - 发布`X2Paddle`模型自动化压缩方案:[YOLOv5](example/auto_compression/pytorch_yolov5)、[YOLOv6](example/auto_compression/pytorch_yolov6)、[HuggingFace](example/auto_compression/pytorch_huggingface)、[MobileNet](example/auto_compression/tensorflow_mobilenet)。
+ - 发布`X2Paddle`模型自动化压缩方案:[YOLOv5](example/auto_compression/pytorch_yolov5)、[YOLOv6](example/auto_compression/pytorch_yolov6)、[YOLOv7](example/auto_compression/pytorch_yolov7)、[HuggingFace](example/auto_compression/pytorch_huggingface)、[MobileNet](example/auto_compression/tensorflow_mobilenet)。
- 升级量化功能
diff --git a/example/auto_compression/README.md b/example/auto_compression/README.md
index c9c9a91ddc545eab3ed57295e5365c3cd81917ef..e907908bcf4e8fb05f0787cc6256407dc1ae18d4 100644
--- a/example/auto_compression/README.md
+++ b/example/auto_compression/README.md
@@ -1,165 +1,250 @@
-# 自动化压缩工具ACT(Auto Compression Toolkit)
-
-## 简介
-PaddleSlim推出全新自动化压缩工具(ACT),旨在通过Source-Free的方式,自动对预测模型进行压缩,压缩后模型可直接部署应用。ACT自动化压缩工具主要特性如下:
-- **『更便捷』**:开发者无需了解或修改模型源码,直接使用导出的预测模型进行压缩;
-- **『更智能』**:开发者简单配置即可启动压缩,ACT工具会自动优化得到最好预测模型;
-- **『更丰富』**:ACT中提供了量化训练、蒸馏、结构化剪枝、非结构化剪枝、多种离线量化方法及超参搜索等等,可任意搭配使用。
-
-
-## 环境准备
-
-- 安装PaddlePaddle >= 2.3 (从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装)
-- 安装PaddleSlim >=2.3
-
-(1)安装paddlepaddle:
-```shell
-# CPU
-pip install paddlepaddle
-# GPU
-pip install paddlepaddle-gpu
-```
-
-(2)安装paddleslim:
-```shell
-pip install paddleslim
-```
-
-## 快速上手
-
-- 1.准备模型及数据集
-
-```shell
-# 下载MobileNet预测模型
-wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar
-tar -xf MobileNetV1_infer.tar
-# 下载ImageNet小型数据集
-wget https://sys-p0.bj.bcebos.com/slim_ci/ILSVRC2012_data_demo.tar.gz
-tar -xf ILSVRC2012_data_demo.tar.gz
-```
-
-- 2.运行
-
-```python
-# 导入依赖包
-import paddle
-from PIL import Image
-from paddle.vision.datasets import DatasetFolder
-from paddle.vision.transforms import transforms
-from paddleslim.auto_compression import AutoCompression
-paddle.enable_static()
-# 定义DataSet
-class ImageNetDataset(DatasetFolder):
- def __init__(self, path, image_size=224):
- super(ImageNetDataset, self).__init__(path)
- normalize = transforms.Normalize(
- mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
- self.transform = transforms.Compose([
- transforms.Resize(256),
- transforms.CenterCrop(image_size), transforms.Transpose(),
- normalize
- ])
-
- def __getitem__(self, idx):
- img_path, _ = self.samples[idx]
- return self.transform(Image.open(img_path).convert('RGB'))
-
- def __len__(self):
- return len(self.samples)
-
-# 定义DataLoader
-train_dataset = ImageNetDataset("./ILSVRC2012_data_demo/ILSVRC2012/train/")
-image = paddle.static.data(
- name='inputs', shape=[None] + [3, 224, 224], dtype='float32')
-train_loader = paddle.io.DataLoader(train_dataset, feed_list=[image], batch_size=32, return_list=False)
-# 开始自动压缩
-ac = AutoCompression(
- model_dir="./MobileNetV1_infer",
- model_filename="inference.pdmodel",
- params_filename="inference.pdiparams",
- save_dir="MobileNetV1_quant",
- config={'Quantization': {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}},
- train_dataloader=train_loader,
- eval_dataloader=train_loader)
-ac.compress()
-```
-
-- 3.测试精度
-
-测试压缩前模型的精度:
-```shell
-CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py
-### Eval Top1: 0.7171724759615384
-```
-
-测试量化模型的精度:
-```shell
-CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py --model_dir='MobileNetV1_quant'
-### Eval Top1: 0.7166466346153846
-```
-
-量化后模型的精度相比量化前的模型几乎精度无损,由于是使用的超参搜索的方法来选择的量化参数,所以每次运行得到的量化模型精度会有些许波动。
-
-- 4.推理速度测试
-量化模型速度的测试依赖推理库的支持,所以确保安装的是带有TensorRT的PaddlePaddle。以下示例和展示的测试结果是基于Tesla V100、CUDA 10.2、python3.7得到的。
-
-使用以下指令查看本地cuda版本,并且在[下载链接](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)中下载对应cuda版本和对应python版本的paddlepaddle安装包。
-```shell
-cat /usr/local/cuda/version.txt ### CUDA Version 10.2.89
-### 10.2.89 为cuda版本号,可以根据这个版本号选择需要安装的带有TensorRT的PaddlePaddle安装包。
-```
-
-安装下载的whl包:
-```
-### 这里通过wget下载到的是python3.7、cuda10.2的PaddlePaddle安装包,若您的环境和示例环境不同,请依赖您自己机器的环境下载对应的安装包,否则运行示例代码会报错。
-wget https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl
-pip install paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl --force-reinstall
-```
-
-测试FP32模型的速度
-```
-python ./image_classification/infer.py
-### using tensorrt FP32 batch size: 1 time(ms): 0.6140608787536621
-```
-
-测试FP16模型的速度
-```
-python ./image_classification/infer.py --use_fp16=True
-### using tensorrt FP16 batch size: 1 time(ms): 0.5795984268188477
-```
-
-测试INT8模型的速度
-```
-python ./image_classification/infer.py --model_dir=./MobileNetV1_quant/ --use_int8=True
-### using tensorrt INT8 batch size: 1 time(ms): 0.5213963985443115
-```
-
-**提示:**
-- DataLoader传入的数据集是待压缩模型所用的数据集,DataLoader继承自`paddle.io.DataLoader`。可以直接使用模型套件中的DataLoader,或者根据[paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader)自定义所需要的DataLoader。
-- 自动化压缩Config中定义量化、蒸馏、剪枝等压缩算法会合并执行,压缩策略有:量化+蒸馏,剪枝+蒸馏等等。示例中选择的配置为离线量化超参搜索。
-- 如果要压缩的模型参数是存储在各自分离的文件中,需要先通过[convert.py](./convert.py) 脚本将其保存成一个单独的二进制文件。
-
-## 应用示例
-
-#### [图像分类](./image_classification)
-
-#### [目标检测](./detection)
-
-#### [语义分割](./semantic_segmentation)
-
-#### [NLP](./nlp)
-
-#### X2Paddle
-
-- [PyTorch YOLOv5](./pytorch_yolov5)
-- [HuggingFace](./pytorch_huggingface)
-- [TensorFlow MobileNet](./tensorflow_mobilenet)
-
-#### 即将发布
-- [ ] 更多自动化压缩应用示例
-
-## 其他
-
-- ACT可以自动处理常见的预测模型,如果有更特殊的改造需求,可以参考[ACT超参配置教程](./hyperparameter_tutorial.md)来进行单独配置压缩策略。
-
-- 如果你发现任何关于ACT自动化压缩工具的问题或者是建议, 欢迎通过[GitHub Issues](https://github.com/PaddlePaddle/PaddleSlim/issues)给我们提issues。同时欢迎贡献更多优秀模型,共建开源生态。
+# Auto Compression Toolkit (ACT)
+
+------------------------------------------------------------------------------------------
+
+## **Introduction**
+
+PaddleSlim introduces the Auto Compression Toolkit (ACT), which automatically compresses inference models in a source-free way; the compressed models can be deployed directly.
+
+## **News** 📢
+
+* 🎉 2022.7.6 [**PaddleSlim v2.3.0**](https://github.com/PaddlePaddle/PaddleSlim/releases/tag/v2.3.0) is released! Gains have already been verified on more than 20 models covering image classification, object detection, image segmentation, NLP, and more.
+* 🔥 2022.7.14, 20:30: PaddleSlim auto compression early-user meetup. Come discuss model compression pain points with the developers; scan the QR code to sign up, join the group, and get the meeting link.
+
+
+
+
+
+## **Features**
+
+- **🚀 Decoupled training code**: developers compress the exported inference model directly, with no need to read or modify the training source code;
+- **🎛️ Fully automatic optimization**: compression starts from a simple configuration, and ACT automatically searches for the best compressed model;
+- **📦 Rich compression algorithms**: ACT provides quantization-aware training, distillation, structured pruning, unstructured pruning, several post-training quantization methods, hyperparameter search, and more, which can be combined freely.
+
+### **How ACT works**
+
+Compared with traditional manual compression, the "auto" in automatic compression shows up in four ways: decoupled training code, hyperparameter search for post-training quantization, automatic combination of compression algorithms, and hardware-aware strategy selection.
+
+
+
+
+
+### **Compression results at a glance**
+
+Compared with traditional manual compression, ACT:
+
+- cuts the amount of code by more than 50%;
+- matches hand-tuned compression in accuracy, and on the PP-YOLOE model even surpasses it;
+- delivers the same inference speedups as manual compression: 1.4x to 7.1x faster than before compression.
+
+
+
+
+
+### **Benchmark**
+
+
+
+
+
+| Model type | Model name | Accuracy before (Top1 Acc, %) | Accuracy after (Top1 Acc, %) | Latency before (ms) | Latency after (ms) | Speedup | Hardware |
+| ------------------------------- | ---------------------------- | ---------------------- | ---------------------- | ---------------- | ---------------- | ---------- | ----------------- |
+| [Image classification](./image_classification) | MobileNetV1 | 70.90 | 70.57 | 33.15 | 13.64 | **2.43** | SDM865 (Snapdragon 865) |
+| [Image classification](./image_classification) | ShuffleNetV2_x1_0 | 68.65 | 68.32 | 10.43 | 5.51 | **1.89** | SDM865 (Snapdragon 865) |
+| [Image classification](./image_classification) | SqueezeNet1_0_infer | 59.60 | 59.45 | 35.98 | 16.96 | **2.12** | SDM865 (Snapdragon 865) |
+| [Image classification](./image_classification) | PPLCNetV2_base | 76.86 | 76.43 | 36.50 | 15.79 | **2.31** | SDM865 (Snapdragon 865) |
+| [Image classification](./image_classification) | ResNet50_vd | 79.12 | 78.74 | 3.19 | 0.92 | **3.47** | NVIDIA Tesla T4 |
+| [Image classification](./image_classification) | PPHGNet_tiny | 79.59 | 79.20 | 2.82 | 0.98 | **2.88** | NVIDIA Tesla T4 |
+| [Semantic segmentation](./semantic_segmentation) | PP-HumanSeg-Lite | 92.87 | 92.35 | 56.36 | 37.71 | **1.49** | SDM710 |
+| [Semantic segmentation](./semantic_segmentation) | PP-LiteSeg | 77.04 | 76.93 | 1.43 | 1.16 | **1.23** | NVIDIA Tesla T4 |
+| [Semantic segmentation](./semantic_segmentation) | HRNet | 78.97 | 78.90 | 8.19 | 5.81 | **1.41** | NVIDIA Tesla T4 |
+| [Semantic segmentation](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 |
+| NLP | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 |
+| NLP | ERNIE 3.0-Medium | 73.09 | 72.40 | 29.25 (FP16) | 19.61 | **1.49** | NVIDIA Tesla T4 |
+| [Object detection](./pytorch_yolov5) | YOLOv5s (PyTorch) | 37.40 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 |
+| [Object detection](./pytorch_yolov6) | YOLOv6s (PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
+| [Object detection](./pytorch_yolov7) | YOLOv7 (PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 |
+| [Object detection](./detection) | PP-YOLOE-l | 50.9 | 50.6 | 11.2 | 6.7 | **1.67** | NVIDIA Tesla V100 |
+| [Image classification](./image_classification) | MobileNetV1 (TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDM865 (Snapdragon 865) |
+
+- Notes: object detection accuracy is reported as mAP (0.5:0.95); image segmentation accuracy is reported as IoU.
+- More PaddlePaddle model examples and benchmarks: [image classification](./image_classification), [object detection](./detection), [semantic segmentation](./semantic_segmentation), [NLP](./nlp).
+- More examples and benchmarks for models from other frameworks: [YOLOv5 (PyTorch)](./pytorch_yolov5), [YOLOv6 (PyTorch)](./pytorch_yolov6), [YOLOv7 (PyTorch)](./pytorch_yolov7), [HuggingFace (PyTorch)](./pytorch_huggingface), [MobileNet (TensorFlow)](./tensorflow_mobilenet).
+
+## **Environment setup**
+
+- Install PaddlePaddle >= 2.3.1 (see the [PaddlePaddle installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)):
+
+ ```shell
+ # CPU
+ pip install paddlepaddle --upgrade
+ # GPU
+ pip install paddlepaddle-gpu --upgrade
+ ```
+
+- Install PaddleSlim >= 2.3.0:
+
+ ```shell
+ pip install paddleslim
+ ```
+
+## **Quick start**
+
+- **1. Prepare the model and dataset**
+
+```shell
+# Download the MobileNetV1 inference model
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar
+tar -xf MobileNetV1_infer.tar
+# Download a small ImageNet demo dataset
+wget https://sys-p0.bj.bcebos.com/slim_ci/ILSVRC2012_data_demo.tar.gz
+tar -xf ILSVRC2012_data_demo.tar.gz
+```
+
+- **2. Run auto compression**
+
+```python
+# Import dependencies
+import paddle
+from PIL import Image
+from paddle.vision.datasets import DatasetFolder
+from paddle.vision.transforms import transforms
+from paddleslim.auto_compression import AutoCompression
+paddle.enable_static()
+# Define the dataset
+class ImageNetDataset(DatasetFolder):
+ def __init__(self, path, image_size=224):
+ super(ImageNetDataset, self).__init__(path)
+ normalize = transforms.Normalize(
+ mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
+ self.transform = transforms.Compose([
+ transforms.Resize(256),
+ transforms.CenterCrop(image_size), transforms.Transpose(),
+ normalize
+ ])
+
+ def __getitem__(self, idx):
+ img_path, _ = self.samples[idx]
+ return self.transform(Image.open(img_path).convert('RGB'))
+
+ def __len__(self):
+ return len(self.samples)
+
+# Define the DataLoader
+train_dataset = ImageNetDataset("./ILSVRC2012_data_demo/ILSVRC2012/train/")
+image = paddle.static.data(
+ name='inputs', shape=[None] + [3, 224, 224], dtype='float32')
+train_loader = paddle.io.DataLoader(train_dataset, feed_list=[image], batch_size=32, return_list=False)
+# Launch auto compression
+ac = AutoCompression(
+ model_dir="./MobileNetV1_infer",
+ model_filename="inference.pdmodel",
+ params_filename="inference.pdiparams",
+ save_dir="MobileNetV1_quant",
+ config={'Quantization': {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}},
+ train_dataloader=train_loader,
+ eval_dataloader=train_loader)
+ac.compress()
+```
+
+- **3. Accuracy evaluation**
+
+  - Evaluate the accuracy of the model before compression:
+
+ ```shell
+ CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py
+ ### Eval Top1: 0.7171724759615384
+ ```
+
+  - Evaluate the accuracy of the quantized model:
+
+ ```shell
+ CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py --model_dir='MobileNetV1_quant'
+ ### Eval Top1: 0.7166466346153846
+ ```
+
+  - The accuracy of the quantized model is nearly the same as before quantization. Because the quantization parameters are selected by hyperparameter search, the accuracy of the resulting quantized model can fluctuate slightly from run to run.
+
+- **4. Inference speed test**
+
+  - Measuring the speed of the quantized model requires inference-library support, so make sure the installed PaddlePaddle build includes TensorRT. The examples and results below were obtained on a Tesla V100 with CUDA 10.2 and Python 3.7.
+
+  - Check the local CUDA version with the command below, then download the PaddlePaddle package matching your CUDA and Python versions from the [download page](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).
+
+ ```shell
+ cat /usr/local/cuda/version.txt ### CUDA Version 10.2.89
+  ### 10.2.89 is the CUDA version; use it to pick the TensorRT-enabled PaddlePaddle package to install.
+ ```
+
+  - Install the downloaded wheel. (The wheel fetched below is built for Python 3.7 and CUDA 10.2; if your environment differs, download the package that matches your machine, otherwise the example will fail at runtime.)
+
+ ```
+ wget https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl
+ pip install paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl --force-reinstall
+ ```
+
+  - Test the speed of the FP32 model
+
+ ```
+ python ./image_classification/infer.py
+ ### using tensorrt FP32 batch size: 1 time(ms): 0.6140608787536621
+ ```
+
+  - Test the speed of the FP16 model
+
+ ```
+ python ./image_classification/infer.py --use_fp16=True
+ ### using tensorrt FP16 batch size: 1 time(ms): 0.5795984268188477
+ ```
+
+  - Test the speed of the INT8 model
+
+ ```
+ python ./image_classification/infer.py --model_dir=./MobileNetV1_quant/ --use_int8=True
+ ### using tensorrt INT8 batch size: 1 time(ms): 0.5213963985443115
+ ```
+
+  - **Tips:**
+
+    - The dataset passed to the DataLoader is the dataset used by the model being compressed, and the DataLoader inherits from `paddle.io.DataLoader`. You can reuse the DataLoader from a model suite, or build your own following the [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader) documentation.
+    - Compression algorithms defined in the ACT config (quantization, distillation, pruning, ...) are executed jointly; typical strategy combinations are quantization + distillation, pruning + distillation, and so on (see the sketch after these tips). The configuration used in this example is post-training quantization with hyperparameter search.
+    - If the parameters of the model to be compressed are stored in separate files, first merge them into a single binary file with the [convert.py](./convert.py) script.
+
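+As a minimal sketch of such a combination (the section names mirror the YAML configs used by the detection examples in this repository; the hyperparameter values and the `MobileNetV1_qat` output directory are illustrative placeholders, not tuned recommendations), a quantization-aware-training-plus-distillation run could be configured like this:
+
+```python
+# Hypothetical combined strategy: quantization-aware training + distillation.
+# Reuses the model files and train_loader from the quick-start example above.
+qat_distill_config = {
+    "Quantization": {
+        "activation_quantize_type": "moving_average_abs_max",
+        "quantize_op_types": ["conv2d", "depthwise_conv2d"],
+    },
+    "Distillation": {"alpha": 1.0, "loss": "soft_label"},
+    "TrainConfig": {
+        "train_iter": 3000,          # placeholder training schedule
+        "eval_iter": 1000,
+        "learning_rate": {
+            "type": "CosineAnnealingDecay",
+            "learning_rate": 0.00003,
+            "T_max": 3000,
+        },
+        "optimizer_builder": {"optimizer": {"type": "SGD"}, "weight_decay": 0.00004},
+    },
+}
+ac = AutoCompression(
+    model_dir="./MobileNetV1_infer",
+    model_filename="inference.pdmodel",
+    params_filename="inference.pdiparams",
+    save_dir="MobileNetV1_qat",  # placeholder output directory
+    config=qat_distill_config,
+    train_dataloader=train_loader,
+    eval_dataloader=train_loader)
+ac.compress()
+```
+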
+## Advanced usage
+
+- ACT handles common inference models automatically. For more specialized needs, see the [ACT hyperparameter configuration tutorial](./hyperparameter_tutorial.md) to configure each compression strategy individually.
+
+## Community
+
+- Scan the QR code on WeChat and fill in the questionnaire to join the technical discussion group.
+
+
+
+
+
+- If you find any problem with the ACT toolkit, or have suggestions, please file an issue on [GitHub Issues](https://github.com/PaddlePaddle/PaddleSlim/issues). Contributions of more models are also welcome as we build the open-source ecosystem together.
+
+## License
+
+This project is released under the [Apache-2.0 license](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/LICENSE).
diff --git a/example/auto_compression/pytorch_yolov6/README.md b/example/auto_compression/pytorch_yolov6/README.md
index 41a61b5463be9b92d8a733b89de64b36d9e84c4e..7cdb546476ff8bc3898644d6cfbcde2a1d77a80f 100644
--- a/example/auto_compression/pytorch_yolov6/README.md
+++ b/example/auto_compression/pytorch_yolov6/README.md
@@ -22,7 +22,7 @@
| 模型 | 策略 | 输入尺寸 | mAPval 0.5:0.95 | 预测时延FP32 (ms) | 预测时延FP16 (ms) | 预测时延INT8 (ms) | 配置文件 | Inference模型 |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
-| YOLOv6s | Base模型 | 640*640 | 42.4 | 9.06ms | 2.90ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/detection/yolov6s_infer.tar) |
+| YOLOv6s | Base模型 | 640*640 | 42.4 | 9.06ms | 2.90ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar) |
| YOLOv6s | KL离线量化 | 640*640 | 30.3 | - | - | 1.83ms | - | - |
| YOLOv6s | 量化蒸馏训练 | 640*640 | **41.3** | - | - | **1.83ms** | [config](./configs/yolov6s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.tar) |
@@ -83,7 +83,7 @@ pip install x2paddle sympy onnx
x2paddle --framework=onnx --model=yolov6s.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov6s_infer
```
-即可得到YOLOv6s模型的预测模型(`model.pdmodel` 和 `model.pdiparams`)。如想快速体验,可直接下载上方表格中YOLOv6s的[Paddle预测模型](https://bj.bcebos.com/v1/paddle-slim-models/detection/yolov6s_infer.tar)。
+即可得到YOLOv6s模型的预测模型(`model.pdmodel` 和 `model.pdiparams`)。如想快速体验,可直接下载上方表格中YOLOv6s的[Paddle预测模型](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar)。
预测模型的格式为:`model.pdmodel` 和 `model.pdiparams`两个,带`pdmodel`的是模型文件,带`pdiparams`后缀的是权重文件。
diff --git a/example/auto_compression/pytorch_yolov7/README.md b/example/auto_compression/pytorch_yolov7/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..7dddf99bf5ce2ecaac0e4e62c1a9a9c4509a6ce9
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/README.md
@@ -0,0 +1,152 @@
+# YOLOv7 Auto Compression Example
+
+Contents:
+- [1. Introduction](#1-introduction)
+- [2. Benchmark](#2-benchmark)
+- [3. Auto compression workflow](#3-auto-compression-workflow)
+  - [3.1 Environment setup](#31-environment-setup)
+  - [3.2 Prepare the dataset](#32-prepare-the-dataset)
+  - [3.3 Prepare the inference model](#33-prepare-the-inference-model)
+  - [3.4 Run auto compression and export the model](#34-run-auto-compression-and-export-the-model)
+  - [3.5 Evaluate model accuracy](#35-evaluate-model-accuracy)
+- [4. Deployment](#4-deployment)
+- [5. FAQ](#5-faq)
+
+## 1. Introduction
+
+The PaddlePaddle model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models with a single command. With X2Paddle, inference models from these frameworks can easily use PaddleSlim's auto compression.
+
+This example takes the [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object detection model, converts it from PyTorch to a PaddlePaddle inference model, and then compresses it with ACT. The compression strategy used here is quantization-aware training with distillation.
+
+## 2. Benchmark
+
+| Model | Strategy | Input size | mAP (val, 0.5:0.95) | FP32 latency (ms) | FP16 latency (ms) | INT8 latency (ms) | Config | Inference model |
+| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
+| YOLOv7 | Baseline | 640*640 | 51.1 | 26.84ms | 7.44ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) |
+| YOLOv7 | Post-training quantization (KL) | 640*640 | 50.2 | - | - | 4.55ms | - | - |
+| YOLOv7 | Quantization-aware training + distillation | 640*640 | **50.8** | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) |
+
+Notes:
+- mAP is evaluated on the COCO val2017 dataset.
+- YOLOv7 latency is measured on an NVIDIA Tesla T4 GPU with TensorRT 8.4.1 and batch_size=1, using the [cpp_infer](./cpp_infer) benchmark scripts.
+
+## 3. Auto compression workflow
+
+#### 3.1 Environment setup
+- PaddlePaddle >= 2.3 (install from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim > 2.3
+- PaddleDet >= 2.4
+- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
+- opencv-python
+
+(1) Install PaddlePaddle:
+```shell
+# CPU
+pip install paddlepaddle
+# GPU
+pip install paddlepaddle-gpu
+```
+
+(2) Install PaddleSlim:
+```shell
+pip install paddleslim
+```
+
+(3) Install PaddleDet:
+```shell
+pip install paddledet
+```
+
+Note: PaddleDet is installed only so that the DataLoader components from PaddleDetection can be reused directly.
+
+(4) Install X2Paddle 1.3.6 or later:
+```shell
+pip install x2paddle sympy onnx
+```
+
+#### 3.2 Prepare the dataset
+
+By default this example runs auto compression on the COCO dataset and relies on the data-reading modules from PaddleDetection (see the loader sketch below). For a custom dataset in COCO or another format, follow the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
+
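+As a sketch of what reusing the PaddleDetection reader means in practice (this mirrors the logic of [eval.py](./eval.py) in this directory), the evaluation DataLoader is built from the reader config roughly like this:
+
+```python
+from ppdet.core.workspace import load_config, create
+
+# Parse the reader configuration shipped with this example.
+reader_cfg = load_config('./configs/yolov7_reader.yaml')
+# Build the evaluation DataLoader from the EvalDataset/EvalReader sections.
+val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
+                                  reader_cfg['worker_num'],
+                                  return_list=True)
+```
+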
+If the dataset is already prepared, simply set the `dataset_dir` field of `EvalDataset` in [./configs/yolov7_reader.yaml](./configs/yolov7_reader.yaml) to your dataset path, as sketched below.
+
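+For reference, the relevant section of `configs/yolov7_reader.yaml` is sketched here; the `dataset_dir` value is a placeholder to replace with your own path:
+
+```yaml
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: /path/to/your/coco    # placeholder: point this at your dataset root
+```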
+
+#### 3.3 Prepare the inference model
+
+(1) Prepare the ONNX model:
+
+Export the ONNX model with the export script from [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7), as follows:
+```shell
+git clone https://github.com/WongKinYiu/yolov7.git
+# Switch to the u5 branch so that the exported ONNX post-processing matches YOLOv5
+git checkout u5
+# After downloading the yolov7.pt weights, run:
+python export.py --weights yolov7.pt --include onnx
+```
+
+You can also skip these steps and download the prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) directly.
+
+
+(2) Convert the model:
+```
+x2paddle --framework=onnx --model=yolov7.onnx --save_dir=pd_model
+cp -r pd_model/inference_model/ yolov7_infer
+```
+This produces the YOLOv7 inference model (`model.pdmodel` and `model.pdiparams`). For a quick start, you can instead download the converted YOLOv7 [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) from the table above.
+
+
+The inference model consists of two files: `model.pdmodel` (the model graph) and `model.pdiparams` (the weights). A quick way to verify the conversion is sketched below.
+
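+As an optional sanity check (a sketch, not part of the original pipeline), the converted model can be loaded with the same API that eval.py uses, and its input names printed:
+
+```python
+import paddle
+paddle.enable_static()
+
+exe = paddle.static.Executor(paddle.CPUPlace())
+# "yolov7_infer" is the directory produced by the x2paddle step above.
+program, feed_names, fetch_targets = paddle.static.load_inference_model(
+    "yolov7_infer", exe,
+    model_filename="model.pdmodel",
+    params_filename="model.pdiparams")
+print(feed_names)  # should contain 'x2paddle_images', the input name used by these configs
+```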
+
+#### 3.4 Run auto compression and export the model
+
+Quantization-aware training with distillation is launched by the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` API to compress the model. Set the model path and the distillation, quantization, and training parameters in the config file, then start compression with:
+
+- Single-GPU training:
+```
+export CUDA_VISIBLE_DEVICES=0
+python run.py --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+```
+
+- Multi-GPU training:
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
+ --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+```
+
+#### 3.5 Evaluate model accuracy
+
+Set the `model_dir` field in [yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml) to the directory of the model you want to evaluate, then compute its mAP with eval.py:
+```
+export CUDA_VISIBLE_DEVICES=0
+python eval.py --config_path=./configs/yolov7_qat_dis.yaml
+```
+
+
+## 4. Deployment
+
+#### Paddle-TensorRT C++ deployment
+
+Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build the demo following the [C++ TensorRT Benchmark guide](./cpp_infer/README.md), then run:
+```shell
+# Build
+bash compile.sh
+# Run
+./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
+```
+
+#### Paddle-TensorRT Python deployment
+
+First install a [PaddlePaddle package built with TensorRT](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python).
+
+Then run the deployment script [paddle_trt_infer.py](./paddle_trt_infer.py):
+```shell
+python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
+```
+
+## 5. FAQ
+
+- To evaluate the accuracy of the model produced by post-training (offline) quantization, run:
+```shell
+python post_quant.py --config_path=./configs/yolov7_qat_dis.yaml
+```
diff --git a/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml b/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..6607e3615d30509645b34217681899c89d5eecef
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
@@ -0,0 +1,30 @@
+
+Global:
+ reader_config: configs/yolov7_reader.yaml
+ input_list: {'image': 'x2paddle_images'}
+ Evaluation: True
+ model_dir: ./yolov7_infer
+ model_filename: model.pdmodel
+ params_filename: model.pdiparams
+
+Distillation:
+ alpha: 1.0
+ loss: soft_label
+
+Quantization:
+ activation_quantize_type: 'moving_average_abs_max'
+ quantize_op_types:
+ - conv2d
+ - depthwise_conv2d
+
+TrainConfig:
+ train_iter: 8000
+ eval_iter: 1000
+ learning_rate:
+ type: CosineAnnealingDecay
+ learning_rate: 0.00003
+ T_max: 8000
+ optimizer_builder:
+ optimizer:
+ type: SGD
+ weight_decay: 0.00004
diff --git a/example/auto_compression/pytorch_yolov7/configs/yolov7_reader.yaml b/example/auto_compression/pytorch_yolov7/configs/yolov7_reader.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..cb87c3f8fde8bd0149189cd6a9f3437e7f4548e2
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/configs/yolov7_reader.yaml
@@ -0,0 +1,27 @@
+metric: COCO
+num_classes: 80
+
+# Dataset configuration
+TrainDataset:
+ !COCODataSet
+ image_dir: train2017
+ anno_path: annotations/instances_train2017.json
+ dataset_dir: dataset/coco/
+
+EvalDataset:
+ !COCODataSet
+ image_dir: val2017
+ anno_path: annotations/instances_val2017.json
+ dataset_dir: dataset/coco/
+
+worker_num: 0
+
+# preprocess reader in test
+EvalReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {target_size: [640, 640], keep_ratio: True}
+ - Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
+ - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+ - Permute: {}
+ batch_size: 1
diff --git a/example/auto_compression/pytorch_yolov7/cpp_infer/CMakeLists.txt b/example/auto_compression/pytorch_yolov7/cpp_infer/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d5307c657212bc3815353a55904672b6fa5679b4
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/cpp_infer/CMakeLists.txt
@@ -0,0 +1,263 @@
+cmake_minimum_required(VERSION 3.0)
+project(cpp_inference_demo CXX C)
+option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
+option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
+option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
+option(USE_TENSORRT "Compile demo with TensorRT." OFF)
+option(WITH_ROCM "Compile demo with rocm." OFF)
+option(WITH_ONNXRUNTIME "Compile demo with ONNXRuntime" OFF)
+option(WITH_ARM "Compile demo with ARM" OFF)
+option(WITH_MIPS "Compile demo with MIPS" OFF)
+option(WITH_SW "Compile demo with SW" OFF)
+option(WITH_XPU "Compile demo with XPU" OFF)
+option(WITH_NPU "Compile demo with NPU" OFF)
+
+if(NOT WITH_STATIC_LIB)
+ add_definitions("-DPADDLE_WITH_SHARED_LIB")
+else()
+ # PD_INFER_DECL is mainly used to set the dllimport/dllexport attribute in dynamic library mode.
+ # Set it to empty in static library mode to avoid compilation issues.
+ add_definitions("/DPD_INFER_DECL=")
+endif()
+
+macro(safe_set_static_flag)
+ foreach(flag_var
+ CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
+ CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
+ if(${flag_var} MATCHES "/MD")
+ string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
+ endif(${flag_var} MATCHES "/MD")
+ endforeach(flag_var)
+endmacro()
+
+if(NOT DEFINED PADDLE_LIB)
+ message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
+endif()
+if(NOT DEFINED DEMO_NAME)
+ message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
+endif()
+
+include_directories("${PADDLE_LIB}/")
+set(PADDLE_LIB_THIRD_PARTY_PATH "${PADDLE_LIB}/third_party/install/")
+include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/include")
+include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/include")
+include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/include")
+include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/include")
+include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/include")
+include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/include")
+include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/include")
+
+link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/lib")
+link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/lib")
+link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/lib")
+link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/lib")
+link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/lib")
+link_directories("${PADDLE_LIB}/paddle/lib")
+link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib")
+link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib")
+
+if (WIN32)
+ add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
+ option(MSVC_STATIC_CRT "use static C Runtime library by default" ON)
+ if (MSVC_STATIC_CRT)
+ if (WITH_MKL)
+ set(FLAG_OPENMP "/openmp")
+ endif()
+ set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
+ set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
+ set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
+ set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
+ safe_set_static_flag()
+ if (WITH_STATIC_LIB)
+ add_definitions(-DSTATIC_LIB)
+ endif()
+ endif()
+else()
+ if(WITH_MKL)
+ set(FLAG_OPENMP "-fopenmp")
+ endif()
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 ${FLAG_OPENMP}")
+endif()
+
+if(WITH_GPU)
+ if(NOT WIN32)
+ include_directories("/usr/local/cuda/include")
+ if(CUDA_LIB STREQUAL "")
+ set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
+ endif()
+ else()
+ include_directories("C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\include")
+ if(CUDA_LIB STREQUAL "")
+ set(CUDA_LIB "C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
+ endif()
+ endif(NOT WIN32)
+endif()
+
+if (USE_TENSORRT AND WITH_GPU)
+ set(TENSORRT_ROOT "" CACHE STRING "The root directory of TensorRT library")
+ if("${TENSORRT_ROOT}" STREQUAL "")
+ message(FATAL_ERROR "The TENSORRT_ROOT is empty, you must assign it a value with CMake command. Such as: -DTENSORRT_ROOT=TENSORRT_ROOT_PATH ")
+ endif()
+ set(TENSORRT_INCLUDE_DIR ${TENSORRT_ROOT}/include)
+ set(TENSORRT_LIB_DIR ${TENSORRT_ROOT}/lib)
+ file(READ ${TENSORRT_INCLUDE_DIR}/NvInfer.h TENSORRT_VERSION_FILE_CONTENTS)
+ string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
+ "${TENSORRT_VERSION_FILE_CONTENTS}")
+ if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
+ file(READ ${TENSORRT_INCLUDE_DIR}/NvInferVersion.h TENSORRT_VERSION_FILE_CONTENTS)
+ string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
+ "${TENSORRT_VERSION_FILE_CONTENTS}")
+ endif()
+ if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
+ message(SEND_ERROR "Failed to detect TensorRT version.")
+ endif()
+ string(REGEX REPLACE "define NV_TENSORRT_MAJOR +([0-9]+)" "\\1"
+ TENSORRT_MAJOR_VERSION "${TENSORRT_MAJOR_VERSION}")
+ message(STATUS "Current TensorRT header is ${TENSORRT_INCLUDE_DIR}/NvInfer.h. "
+ "Current TensorRT version is v${TENSORRT_MAJOR_VERSION}. ")
+ include_directories("${TENSORRT_INCLUDE_DIR}")
+ link_directories("${TENSORRT_LIB_DIR}")
+endif()
+
+if(WITH_MKL)
+ set(MATH_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mklml")
+ include_directories("${MATH_LIB_PATH}/include")
+ if(WIN32)
+ set(MATH_LIB ${MATH_LIB_PATH}/lib/mklml${CMAKE_STATIC_LIBRARY_SUFFIX}
+ ${MATH_LIB_PATH}/lib/libiomp5md${CMAKE_STATIC_LIBRARY_SUFFIX})
+ else()
+ set(MATH_LIB ${MATH_LIB_PATH}/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
+ ${MATH_LIB_PATH}/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
+ endif()
+ set(MKLDNN_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mkldnn")
+ if(EXISTS ${MKLDNN_PATH})
+ include_directories("${MKLDNN_PATH}/include")
+ if(WIN32)
+ set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
+ else(WIN32)
+ set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
+ endif(WIN32)
+ endif()
+elseif((NOT WITH_MIPS) AND (NOT WITH_SW))
+ set(OPENBLAS_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}openblas")
+ include_directories("${OPENBLAS_LIB_PATH}/include/openblas")
+ if(WIN32)
+ set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/openblas${CMAKE_STATIC_LIBRARY_SUFFIX})
+ else()
+ set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
+ endif()
+endif()
+
+if(WITH_STATIC_LIB)
+ set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
+else()
+ if(WIN32)
+ set(DEPS ${PADDLE_LIB}/paddle/lib/paddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
+ else()
+ set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_SHARED_LIBRARY_SUFFIX})
+ endif()
+endif()
+
+if (WITH_ONNXRUNTIME)
+ if(WIN32)
+ set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.lib paddle2onnx)
+ elseif(APPLE)
+ set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.1.10.0.dylib paddle2onnx)
+ else()
+ set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.so.1.10.0 paddle2onnx)
+ endif()
+endif()
+
+if (NOT WIN32)
+ set(EXTERNAL_LIB "-lrt -ldl -lpthread")
+ set(DEPS ${DEPS}
+ ${MATH_LIB} ${MKLDNN_LIB}
+ glog gflags protobuf xxhash cryptopp
+ ${EXTERNAL_LIB})
+else()
+ set(DEPS ${DEPS}
+ ${MATH_LIB} ${MKLDNN_LIB}
+ glog gflags_static libprotobuf xxhash cryptopp-static ${EXTERNAL_LIB})
+ set(DEPS ${DEPS} shlwapi.lib)
+endif(NOT WIN32)
+
+if(WITH_GPU)
+ if(NOT WIN32)
+ if (USE_TENSORRT)
+ set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
+ endif()
+ set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
+ else()
+ if(USE_TENSORRT)
+ set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
+ if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
+ set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_STATIC_LIBRARY_SUFFIX})
+ endif()
+ endif()
+ set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
+ set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
+ set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX} )
+ endif()
+endif()
+
+if(WITH_ROCM AND NOT WIN32)
+ set(DEPS ${DEPS} ${ROCM_LIB}/libamdhip64${CMAKE_SHARED_LIBRARY_SUFFIX})
+endif()
+
+if(WITH_XPU AND NOT WIN32)
+ set(XPU_INSTALL_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}xpu")
+ set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpuapi${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpurt${CMAKE_SHARED_LIBRARY_SUFFIX})
+endif()
+
+if(WITH_NPU AND NOT WIN32)
+ set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libgraph${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libge_runner${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libacl_op_compiler${CMAKE_SHARED_LIBRARY_SUFFIX})
+endif()
+
+add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
+target_link_libraries(${DEMO_NAME} ${DEPS})
+if(WIN32)
+ if(USE_TENSORRT)
+ add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}
+ ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
+ COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}
+ ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
+ )
+ if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
+ add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_SHARED_LIBRARY_SUFFIX}
+ ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE})
+ endif()
+ endif()
+ if(WITH_MKL)
+ add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/mklml.dll ${CMAKE_BINARY_DIR}/Release
+ COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/libiomp5md.dll ${CMAKE_BINARY_DIR}/Release
+ COMMAND ${CMAKE_COMMAND} -E copy ${MKLDNN_PATH}/lib/mkldnn.dll ${CMAKE_BINARY_DIR}/Release
+ )
+ else()
+ add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy ${OPENBLAS_LIB_PATH}/lib/openblas.dll ${CMAKE_BINARY_DIR}/Release
+ )
+ endif()
+ if(WITH_ONNXRUNTIME)
+ add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.dll
+ ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
+ COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib/paddle2onnx.dll
+ ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
+ )
+ endif()
+ if(NOT WITH_STATIC_LIB)
+ add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy "${PADDLE_LIB}/paddle/lib/paddle_inference.dll" ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
+ )
+ endif()
+endif()
diff --git a/example/auto_compression/pytorch_yolov7/cpp_infer/README.md b/example/auto_compression/pytorch_yolov7/cpp_infer/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..04c1c23bc027b228ba0f67416d9966c3b1cd1133
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/cpp_infer/README.md
@@ -0,0 +1,51 @@
+# YOLOv7 TensorRT Benchmark (Linux)
+
+## Environment setup
+
+- CUDA and cuDNN: make sure CUDA and cuDNN are installed, and note their installation paths in advance.
+
+- TensorRT: download [TensorRT 8.4.1.5](https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.4.1/tars/tensorrt-8.4.1.5.linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz) or another version from the NVIDIA website.
+
+- Paddle Inference C++ library: to build the develop branch from source, follow the [build guide](https://www.paddlepaddle.org.cn/inference/user_guides/source_compile.html). After the build finishes, the `paddle_inference_install_dir` folder under the build directory is the C++ inference library needed here.
+
+## Build the executable
+
+- (1) Edit the dependency paths in `compile.sh`, mainly the following (see the build step after this block):
+```shell
+# Path to the Paddle Inference C++ library
+LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
+# cuDNN path
+CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
+# CUDA path
+CUDA_LIB=/usr/local/cuda/lib64
+# TensorRT path: the absolute path of the extracted TensorRT package, which contains the `lib` and `include` folders
+TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
+```
+
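+- (2) With the paths set, run the build script (assuming the layout above); it configures CMake with `DEMO_NAME=trt_run` and produces the `./build/trt_run` binary used by the commands below:
+
+```shell
+bash compile.sh
+```
+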
+## Run the benchmark
+
+- FP32
+```
+./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp32
+```
+
+- FP16
+```
+./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp16
+```
+
+- INT8
+```
+./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
+```
+
+## Performance comparison
+
+| Inference library | Model | FP32 latency (ms) | FP16 latency (ms) | INT8 latency (ms) |
+| :--------: | :--------: | :--------: | :--------: | :---------------------: |
+| Paddle TensorRT | YOLOv7 | 26.84ms | 7.44ms | 4.55ms |
+| TensorRT | YOLOv7 | 28.25ms | 7.23ms | 4.67ms |
+
+Environment:
+- Tesla T4, TensorRT 8.4.1, CUDA 11.2
+- batch_size=1
diff --git a/example/auto_compression/pytorch_yolov7/cpp_infer/compile.sh b/example/auto_compression/pytorch_yolov7/cpp_infer/compile.sh
new file mode 100644
index 0000000000000000000000000000000000000000..afff924b4f63c37e2ae8906d505e046066e0a906
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/cpp_infer/compile.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+set +x
+set -e
+
+work_path=$(dirname $(readlink -f $0))
+
+mkdir -p build
+cd build
+rm -rf *
+
+DEMO_NAME=trt_run
+
+WITH_MKL=ON
+WITH_GPU=ON
+USE_TENSORRT=ON
+
+LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
+CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
+CUDA_LIB=/usr/local/cuda/lib64
+TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
+
+WITH_ROCM=OFF
+ROCM_LIB=/opt/rocm/lib
+
+cmake .. -DPADDLE_LIB=${LIB_DIR} \
+ -DWITH_MKL=${WITH_MKL} \
+ -DDEMO_NAME=${DEMO_NAME} \
+ -DWITH_GPU=${WITH_GPU} \
+ -DWITH_STATIC_LIB=OFF \
+ -DUSE_TENSORRT=${USE_TENSORRT} \
+ -DWITH_ROCM=${WITH_ROCM} \
+ -DROCM_LIB=${ROCM_LIB} \
+ -DCUDNN_LIB=${CUDNN_LIB} \
+ -DCUDA_LIB=${CUDA_LIB} \
+ -DTENSORRT_ROOT=${TENSORRT_ROOT}
+
+make -j
diff --git a/example/auto_compression/pytorch_yolov7/cpp_infer/trt_run.cc b/example/auto_compression/pytorch_yolov7/cpp_infer/trt_run.cc
new file mode 100644
index 0000000000000000000000000000000000000000..0ae055acb6e42f226d023fadfd5923bd79c19bfd
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/cpp_infer/trt_run.cc
@@ -0,0 +1,116 @@
+#include <chrono>
+#include <iostream>
+#include <memory>
+#include <numeric>
+
+#include <gflags/gflags.h>
+#include <glog/logging.h>
+#include <cuda_runtime.h>
+
+#include "paddle/include/paddle_inference_api.h"
+#include "paddle/include/experimental/phi/common/float16.h"
+
+using paddle_infer::Config;
+using paddle_infer::Predictor;
+using paddle_infer::CreatePredictor;
+using paddle_infer::PrecisionType;
+using phi::dtype::float16;
+
+DEFINE_string(model_dir, "", "Directory of the inference model.");
+DEFINE_string(model_file, "", "Path of the inference model file.");
+DEFINE_string(params_file, "", "Path of the inference params file.");
+DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
+DEFINE_int32(batch_size, 1, "Batch size.");
+DEFINE_int32(gpu_id, 0, "GPU card ID num.");
+DEFINE_int32(trt_min_subgraph_size, 3, "tensorrt min_subgraph_size");
+DEFINE_int32(warmup, 50, "warmup");
+DEFINE_int32(repeats, 1000, "repeats");
+
+using Time = decltype(std::chrono::high_resolution_clock::now());
+Time time() { return std::chrono::high_resolution_clock::now(); };
+double time_diff(Time t1, Time t2) {
+ typedef std::chrono::microseconds ms;
+ auto diff = t2 - t1;
+  ms counter = std::chrono::duration_cast<ms>(diff);
+ return counter.count() / 1000.0;
+}
+
+std::shared_ptr<Predictor> InitPredictor() {
+ Config config;
+ std::string model_path;
+ if (FLAGS_model_dir != "") {
+ config.SetModel(FLAGS_model_dir);
+ model_path = FLAGS_model_dir.substr(0, FLAGS_model_dir.find_last_of("/"));
+ } else {
+ config.SetModel(FLAGS_model_file, FLAGS_params_file);
+ model_path = FLAGS_model_file.substr(0, FLAGS_model_file.find_last_of("/"));
+ }
+ // enable tune
+ std::cout << "model_path: " << model_path << std::endl;
+ config.EnableUseGpu(256, FLAGS_gpu_id);
+ if (FLAGS_run_mode == "trt_fp32") {
+ config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
+ PrecisionType::kFloat32, false, false);
+ } else if (FLAGS_run_mode == "trt_fp16") {
+ config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
+ PrecisionType::kHalf, false, false);
+ } else if (FLAGS_run_mode == "trt_int8") {
+ config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
+ PrecisionType::kInt8, false, false);
+ }
+ config.EnableMemoryOptim();
+ config.SwitchIrOptim(true);
+ return CreatePredictor(config);
+}
+
+template <typename type>
+void run(Predictor *predictor, const std::vector<type> &input,
+         const std::vector<int> &input_shape, type* out_data, std::vector<int> out_shape) {
+
+ // prepare input
+  int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
+                                  std::multiplies<int>());
+
+ auto input_names = predictor->GetInputNames();
+ auto input_t = predictor->GetInputHandle(input_names[0]);
+ input_t->Reshape(input_shape);
+ input_t->CopyFromCpu(input.data());
+
+ for (int i = 0; i < FLAGS_warmup; ++i)
+ CHECK(predictor->Run());
+
+ auto st = time();
+ for (int i = 0; i < FLAGS_repeats; ++i) {
+ auto input_names = predictor->GetInputNames();
+ auto input_t = predictor->GetInputHandle(input_names[0]);
+ input_t->Reshape(input_shape);
+ input_t->CopyFromCpu(input.data());
+
+ CHECK(predictor->Run());
+
+ auto output_names = predictor->GetOutputNames();
+ auto output_t = predictor->GetOutputHandle(output_names[0]);
+    std::vector<int> output_shape = output_t->shape();
+ output_t -> ShareExternalData(out_data, out_shape, paddle_infer::PlaceType::kGPU);
+ }
+
+ LOG(INFO) << "[" << FLAGS_run_mode << " bs-" << FLAGS_batch_size << " ] run avg time is " << time_diff(st, time()) / FLAGS_repeats
+ << " ms";
+}
+
+int main(int argc, char *argv[]) {
+ google::ParseCommandLineFlags(&argc, &argv, true);
+ auto predictor = InitPredictor();
+  std::vector<int> input_shape = {FLAGS_batch_size, 3, 640, 640};
+ // float16
+ using dtype = float16;
+  std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
+
+ dtype *out_data;
+ int out_data_size = FLAGS_batch_size * 25200 * 85;
+ cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
+
+  std::vector<int> out_shape{ FLAGS_batch_size, 1, 25200, 85};
+ run(predictor.get(), input_data, input_shape, out_data, out_shape);
+ return 0;
+}
diff --git a/example/auto_compression/pytorch_yolov7/eval.py b/example/auto_compression/pytorch_yolov7/eval.py
new file mode 100644
index 0000000000000000000000000000000000000000..478c4e1abd50dc4a1f0697f665005bdfb1bc2b5d
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/eval.py
@@ -0,0 +1,151 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import numpy as np
+import argparse
+import paddle
+from ppdet.core.workspace import load_config, merge_config
+from ppdet.core.workspace import create
+from ppdet.metrics import COCOMetric, VOCMetric
+from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
+
+from post_process import YOLOv7PostProcess
+
+
+def argsparser():
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ '--config_path',
+ type=str,
+ default=None,
+ help="path of compression strategy config.",
+ required=True)
+ parser.add_argument(
+ '--devices',
+ type=str,
+ default='gpu',
+ help="which device used to compress.")
+
+ return parser
+
+
+def reader_wrapper(reader, input_list):
+ def gen():
+ for data in reader:
+ in_dict = {}
+ if isinstance(input_list, list):
+ for input_name in input_list:
+ in_dict[input_name] = data[input_name]
+ elif isinstance(input_list, dict):
+ for input_name in input_list.keys():
+ in_dict[input_list[input_name]] = data[input_name]
+ yield in_dict
+
+ return gen
+
+
+def convert_numpy_data(data, metric):
+ data_all = {}
+ data_all = {k: np.array(v) for k, v in data.items()}
+ if isinstance(metric, VOCMetric):
+ for k, v in data_all.items():
+ if not isinstance(v[0], np.ndarray):
+ tmp_list = []
+ for t in v:
+ tmp_list.append(np.array(t))
+ data_all[k] = np.array(tmp_list)
+ else:
+ data_all = {k: np.array(v) for k, v in data.items()}
+ return data_all
+
+
+def eval():
+
+ place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
+ exe = paddle.static.Executor(place)
+
+ val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
+ global_config["model_dir"],
+ exe,
+ model_filename=global_config["model_filename"],
+ params_filename=global_config["params_filename"])
+ print('Loaded model from: {}'.format(global_config["model_dir"]))
+
+ metric = global_config['metric']
+ for batch_id, data in enumerate(val_loader):
+ data_all = convert_numpy_data(data, metric)
+ data_input = {}
+ for k, v in data.items():
+ if isinstance(global_config['input_list'], list):
+ if k in global_config['input_list']:
+ data_input[k] = np.array(v)
+ elif isinstance(global_config['input_list'], dict):
+ if k in global_config['input_list'].keys():
+ data_input[global_config['input_list'][k]] = np.array(v)
+ outs = exe.run(val_program,
+ feed=data_input,
+ fetch_list=fetch_targets,
+ return_numpy=False)
+ res = {}
+ postprocess = YOLOv7PostProcess(
+ score_threshold=0.001, nms_threshold=0.65, multi_label=True)
+ res = postprocess(np.array(outs[0]), data_all['scale_factor'])
+ metric.update(data_all, res)
+ if batch_id % 100 == 0:
+ print('Eval iter:', batch_id)
+ metric.accumulate()
+ metric.log()
+ metric.reset()
+
+
+def main():
+ global global_config
+ all_config = load_slim_config(FLAGS.config_path)
+ global_config = all_config["Global"]
+ reader_cfg = load_config(global_config['reader_config'])
+
+ dataset = reader_cfg['EvalDataset']
+ global val_loader
+ val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
+ reader_cfg['worker_num'],
+ return_list=True)
+ metric = None
+ if reader_cfg['metric'] == 'COCO':
+ clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
+ anno_file = dataset.get_anno()
+ metric = COCOMetric(
+ anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
+ elif reader_cfg['metric'] == 'VOC':
+ metric = VOCMetric(
+ label_list=dataset.get_label_list(),
+ class_num=reader_cfg['num_classes'],
+ map_type=reader_cfg['map_type'])
+ else:
+ raise ValueError("metric currently only supports COCO and VOC.")
+ global_config['metric'] = metric
+
+ eval()
+
+
+if __name__ == '__main__':
+ paddle.enable_static()
+ parser = argsparser()
+ FLAGS = parser.parse_args()
+
+ assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
+ paddle.set_device(FLAGS.devices)
+
+ main()
diff --git a/example/auto_compression/pytorch_yolov7/images/000000570688.jpg b/example/auto_compression/pytorch_yolov7/images/000000570688.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..cb304bd56c4010c08611a30dcca58ea9140cea54
Binary files /dev/null and b/example/auto_compression/pytorch_yolov7/images/000000570688.jpg differ
diff --git a/example/auto_compression/pytorch_yolov7/paddle_trt_infer.py b/example/auto_compression/pytorch_yolov7/paddle_trt_infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..fedc9cc17fed3bd409d6d3a91630f05f9a907af1
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/paddle_trt_infer.py
@@ -0,0 +1,322 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import numpy as np
+import argparse
+import time
+
+from paddle.inference import Config
+from paddle.inference import create_predictor
+
+from post_process import YOLOv7PostProcess
+
+CLASS_LABEL = [
+ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
+ 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
+ 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
+ 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
+ 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
+ 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
+ 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
+ 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
+ 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
+ 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
+ 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
+ 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
+ 'hair drier', 'toothbrush'
+]
+
+
+def generate_scale(im, target_shape, keep_ratio=True):
+ """
+ Args:
+ im (np.ndarray): image (np.ndarray)
+ Returns:
+ im_scale_x: the resize ratio of X
+ im_scale_y: the resize ratio of Y
+ """
+ origin_shape = im.shape[:2]
+ if keep_ratio:
+ im_size_min = np.min(origin_shape)
+ im_size_max = np.max(origin_shape)
+ target_size_min = np.min(target_shape)
+ target_size_max = np.max(target_shape)
+ im_scale = float(target_size_min) / float(im_size_min)
+ if np.round(im_scale * im_size_max) > target_size_max:
+ im_scale = float(target_size_max) / float(im_size_max)
+ im_scale_x = im_scale
+ im_scale_y = im_scale
+ else:
+ resize_h, resize_w = target_shape
+ im_scale_y = resize_h / float(origin_shape[0])
+ im_scale_x = resize_w / float(origin_shape[1])
+ return im_scale_y, im_scale_x
+
+
+def image_preprocess(img_path, target_shape):
+ img = cv2.imread(img_path)
+ # Resize
+ im_scale_y, im_scale_x = generate_scale(img, target_shape)
+ img = cv2.resize(
+ img,
+ None,
+ None,
+ fx=im_scale_x,
+ fy=im_scale_y,
+ interpolation=cv2.INTER_LINEAR)
+ # Pad
+ im_h, im_w = img.shape[:2]
+ h, w = target_shape[:]
+ if h != im_h or w != im_w:
+ canvas = np.ones((h, w, 3), dtype=np.float32)
+ canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
+ canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
+ img = canvas
+ img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+ img = np.transpose(img, [2, 0, 1]) / 255
+ img = np.expand_dims(img, 0)
+ scale_factor = np.array([[im_scale_y, im_scale_x]])
+ return img.astype(np.float32), scale_factor
+
+
+def get_color_map_list(num_classes):
+ color_map = num_classes * [0, 0, 0]
+ for i in range(0, num_classes):
+ j = 0
+ lab = i
+ while lab:
+ color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+ color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+ color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+ j += 1
+ lab >>= 3
+ color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
+ return color_map
+
+
+def draw_box(image_file, results, class_label, threshold=0.5):
+ srcimg = cv2.imread(image_file, 1)
+ for i in range(len(results)):
+ color_list = get_color_map_list(len(class_label))
+ clsid2color = {}
+ classid, conf = int(results[i, 0]), results[i, 1]
+ if conf < threshold:
+ continue
+ xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int(
+ results[i, 4]), int(results[i, 5])
+
+ if classid not in clsid2color:
+ clsid2color[classid] = color_list[classid]
+ color = tuple(clsid2color[classid])
+
+ cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
+ print(class_label[classid] + ': ' + str(round(conf, 3)))
+ cv2.putText(
+ srcimg,
+ class_label[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10),
+ cv2.FONT_HERSHEY_SIMPLEX,
+ 0.8, (0, 255, 0),
+ thickness=2)
+ return srcimg
+
+
+def load_predictor(model_dir,
+ run_mode='paddle',
+ batch_size=1,
+ device='CPU',
+ min_subgraph_size=3,
+ use_dynamic_shape=False,
+ trt_min_shape=1,
+ trt_max_shape=1280,
+ trt_opt_shape=640,
+ trt_calib_mode=False,
+ cpu_threads=1,
+ enable_mkldnn=False,
+ enable_mkldnn_bfloat16=False,
+ delete_shuffle_pass=False):
+ """set AnalysisConfig, generate AnalysisPredictor
+ Args:
+ model_dir (str): root path of __model__ and __params__
+ device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
+ run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
+ use_dynamic_shape (bool): use dynamic shape or not
+ trt_min_shape (int): min shape for dynamic shape in trt
+ trt_max_shape (int): max shape for dynamic shape in trt
+ trt_opt_shape (int): opt shape for dynamic shape in trt
+ trt_calib_mode (bool): If the model is produced by TRT offline quantitative
+ calibration, trt_calib_mode need to set True
+ delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT.
+ Used by action model.
+ Returns:
+ predictor (PaddlePredictor): AnalysisPredictor
+ Raises:
+ ValueError: predict by TensorRT need device == 'GPU'.
+ """
+ if device != 'GPU' and run_mode != 'paddle':
+ raise ValueError(
+ "Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
+ .format(run_mode, device))
+ config = Config(
+ os.path.join(model_dir, 'model.pdmodel'),
+ os.path.join(model_dir, 'model.pdiparams'))
+ if device == 'GPU':
+ # initial GPU memory(M), device ID
+ config.enable_use_gpu(200, 0)
+ # optimize graph and fuse op
+ config.switch_ir_optim(True)
+ elif device == 'XPU':
+ config.enable_lite_engine()
+ config.enable_xpu(10 * 1024 * 1024)
+ else:
+ config.disable_gpu()
+ config.set_cpu_math_library_num_threads(cpu_threads)
+ if enable_mkldnn:
+ try:
+ # cache 10 different shapes for mkldnn to avoid memory leak
+ config.set_mkldnn_cache_capacity(10)
+ config.enable_mkldnn()
+ if enable_mkldnn_bfloat16:
+ config.enable_mkldnn_bfloat16()
+ except Exception as e:
+ print(
+ "The current environment does not support `mkldnn`, so disable mkldnn."
+ )
+ pass
+
+ precision_map = {
+ 'trt_int8': Config.Precision.Int8,
+ 'trt_fp32': Config.Precision.Float32,
+ 'trt_fp16': Config.Precision.Half
+ }
+ if run_mode in precision_map.keys():
+ config.enable_tensorrt_engine(
+ workspace_size=(1 << 25) * batch_size,
+ max_batch_size=batch_size,
+ min_subgraph_size=min_subgraph_size,
+ precision_mode=precision_map[run_mode],
+ use_static=False,
+ use_calib_mode=trt_calib_mode)
+
+ if use_dynamic_shape:
+ min_input_shape = {
+ 'image': [batch_size, 3, trt_min_shape, trt_min_shape]
+ }
+ max_input_shape = {
+ 'image': [batch_size, 3, trt_max_shape, trt_max_shape]
+ }
+ opt_input_shape = {
+ 'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
+ }
+ config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
+ opt_input_shape)
+ print('trt set dynamic shape done!')
+
+ # disable print log when predict
+ config.disable_glog_info()
+ # enable shared memory
+ config.enable_memory_optim()
+ # disable feed, fetch OP, needed by zero_copy_run
+ config.switch_use_feed_fetch_ops(False)
+ if delete_shuffle_pass:
+ config.delete_pass("shuffle_channel_detect_pass")
+ predictor = create_predictor(config)
+ return predictor
+
+
+def predict_image(predictor,
+ image_file,
+ image_shape=[640, 640],
+ warmup=1,
+ repeats=1,
+ threshold=0.5,
+ arch='YOLOv5'):
+ img, scale_factor = image_preprocess(image_file, image_shape)
+ inputs = {}
+ if arch == 'YOLOv5':
+ inputs['x2paddle_images'] = img
+ input_names = predictor.get_input_names()
+ for i in range(len(input_names)):
+ input_tensor = predictor.get_input_handle(input_names[i])
+ input_tensor.copy_from_cpu(inputs[input_names[i]])
+
+ for i in range(warmup):
+ predictor.run()
+
+ np_boxes = None
+ predict_time = 0.
+ time_min = float("inf")
+ time_max = float('-inf')
+ for i in range(repeats):
+ start_time = time.time()
+ predictor.run()
+ output_names = predictor.get_output_names()
+ boxes_tensor = predictor.get_output_handle(output_names[0])
+ np_boxes = boxes_tensor.copy_to_cpu()
+ end_time = time.time()
+ timed = end_time - start_time
+ time_min = min(time_min, timed)
+ time_max = max(time_max, timed)
+ predict_time += timed
+
+ time_avg = predict_time / repeats
+ print('Inference time(ms): min={}, max={}, avg={}'.format(
+ round(time_min * 1000, 2),
+ round(time_max * 1000, 1), round(time_avg * 1000, 1)))
+ postprocess = YOLOv7PostProcess(
+ score_threshold=0.001, nms_threshold=0.65, multi_label=True)
+ res = postprocess(np_boxes, scale_factor)
+ res_img = draw_box(
+ image_file, res['bbox'], CLASS_LABEL, threshold=threshold)
+ cv2.imwrite('result.jpg', res_img)
+
+
+if __name__ == '__main__':
+
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ '--image_file', type=str, default=None, help="image path")
+ parser.add_argument(
+ '--model_path', type=str, help="inference model filepath")
+ parser.add_argument(
+ '--benchmark',
+ type=bool,
+ default=False,
+ help="Whether run benchmark or not.")
+ parser.add_argument(
+ '--run_mode',
+ type=str,
+ default='paddle',
+        help="run mode, one of: paddle/trt_fp32/trt_fp16/trt_int8")
+ parser.add_argument(
+ '--device',
+ type=str,
+ default='GPU',
+        help="device to run on, one of: CPU/GPU/XPU; default is GPU"
+ )
+ parser.add_argument('--img_shape', type=int, default=640, help="input_size")
+ args = parser.parse_args()
+
+ predictor = load_predictor(
+ args.model_path, run_mode=args.run_mode, device=args.device)
+ warmup, repeats = 1, 1
+ if args.benchmark:
+ warmup, repeats = 50, 100
+ predict_image(
+ predictor,
+ args.image_file,
+ image_shape=[args.img_shape, args.img_shape],
+ warmup=warmup,
+ repeats=repeats)
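+
+# Example invocation (script, model and image names here are illustrative placeholders):
+#   python <this_script>.py --model_path=<compressed_model_dir> --image_file=<image>.jpg \
+#       --run_mode=trt_int8 --device=GPU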
diff --git a/example/auto_compression/pytorch_yolov7/post_process.py b/example/auto_compression/pytorch_yolov7/post_process.py
new file mode 100644
index 0000000000000000000000000000000000000000..853693daf5ac822367c0065d07ea5ef35014c336
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/post_process.py
@@ -0,0 +1,173 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import cv2
+
+
+def box_area(boxes):
+ """
+ Args:
+ boxes(np.ndarray): [N, 4]
+ return: [N]
+ """
+ return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
+
+
+def box_iou(box1, box2):
+ """
+ Args:
+ box1(np.ndarray): [N, 4]
+ box2(np.ndarray): [M, 4]
+ return: [N, M]
+ """
+ area1 = box_area(box1)
+ area2 = box_area(box2)
+ lt = np.maximum(box1[:, np.newaxis, :2], box2[:, :2])
+ rb = np.minimum(box1[:, np.newaxis, 2:], box2[:, 2:])
+ wh = rb - lt
+ wh = np.maximum(0, wh)
+ inter = wh[:, :, 0] * wh[:, :, 1]
+ iou = inter / (area1[:, np.newaxis] + area2 - inter)
+ return iou
+
+
+def nms(boxes, scores, iou_threshold):
+ """
+ Non Max Suppression numpy implementation.
+ args:
+ boxes(np.ndarray): [N, 4]
+ scores(np.ndarray): [N, 1]
+ iou_threshold(float): Threshold of IoU.
+ """
+ idxs = scores.argsort()
+ keep = []
+ while idxs.size > 0:
+ max_score_index = idxs[-1]
+ max_score_box = boxes[max_score_index][None, :]
+ keep.append(max_score_index)
+ if idxs.size == 1:
+ break
+ idxs = idxs[:-1]
+ other_boxes = boxes[idxs]
+ ious = box_iou(max_score_box, other_boxes)
+ idxs = idxs[ious[0] <= iou_threshold]
+
+ keep = np.array(keep)
+ return keep
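+
+# Quick illustrative check of `nms` (values chosen by hand, not taken from the repo):
+#   boxes  = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=np.float32)
+#   scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
+#   nms(boxes, scores, iou_threshold=0.5)  # -> array([0, 2]); box 1 (IoU ~0.68 with box 0) is suppressed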
+
+
+class YOLOv7PostProcess(object):
+ """
+    Post process of YOLOv7 network.
+ args:
+ score_threshold(float): Threshold to filter out bounding boxes with low
+ confidence score. If not provided, consider all boxes.
+ nms_threshold(float): The threshold to be used in NMS.
+ multi_label(bool): Whether keep multi label in boxes.
+ keep_top_k(int): Number of total bboxes to be kept per image after NMS
+ step. -1 means keeping all bboxes after NMS step.
+ """
+
+ def __init__(self,
+ score_threshold=0.25,
+ nms_threshold=0.5,
+ multi_label=False,
+ keep_top_k=300):
+ self.score_threshold = score_threshold
+ self.nms_threshold = nms_threshold
+ self.multi_label = multi_label
+ self.keep_top_k = keep_top_k
+
+ def _xywh2xyxy(self, x):
+ # Convert from [x, y, w, h] to [x1, y1, x2, y2]
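+        # e.g. a hypothetical box [10, 10, 4, 6] (cx, cy, w, h) maps to [8, 7, 12, 13] (x1, y1, x2, y2)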
+ y = np.copy(x)
+ y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x
+ y[:, 1] = x[:, 1] - x[:, 3] / 2 # top left y
+ y[:, 2] = x[:, 0] + x[:, 2] / 2 # bottom right x
+ y[:, 3] = x[:, 1] + x[:, 3] / 2 # bottom right y
+ return y
+
+ def _non_max_suppression(self, prediction):
+        max_wh = 4096  # (pixels) maximum box width/height, used to offset boxes per class for batched NMS
+ nms_top_k = 30000
+
+ cand_boxes = prediction[..., 4] > self.score_threshold # candidates
+ output = [np.zeros((0, 6))] * prediction.shape[0]
+
+ for batch_id, boxes in enumerate(prediction):
+ # Apply constraints
+ boxes = boxes[cand_boxes[batch_id]]
+ if not boxes.shape[0]:
+ continue
+ # Compute conf (conf = obj_conf * cls_conf)
+ boxes[:, 5:] *= boxes[:, 4:5]
+
+ # Box (center x, center y, width, height) to (x1, y1, x2, y2)
+ convert_box = self._xywh2xyxy(boxes[:, :4])
+
+ # Detections matrix nx6 (xyxy, conf, cls)
+ if self.multi_label:
+ i, j = (boxes[:, 5:] > self.score_threshold).nonzero()
+ boxes = np.concatenate(
+ (convert_box[i], boxes[i, j + 5, None],
+ j[:, None].astype(np.float32)),
+ axis=1)
+ else:
+ conf = np.max(boxes[:, 5:], axis=1)
+ j = np.argmax(boxes[:, 5:], axis=1)
+ re = np.array(conf.reshape(-1) > self.score_threshold)
+ conf = conf.reshape(-1, 1)
+ j = j.reshape(-1, 1)
+ boxes = np.concatenate((convert_box, conf, j), axis=1)[re]
+
+ num_box = boxes.shape[0]
+ if not num_box:
+ continue
+ elif num_box > nms_top_k:
+ boxes = boxes[boxes[:, 4].argsort()[::-1][:nms_top_k]]
+
+ # Batched NMS
+ c = boxes[:, 5:6] * max_wh
+ clean_boxes, scores = boxes[:, :4] + c, boxes[:, 4]
+ keep = nms(clean_boxes, scores, self.nms_threshold)
+ # limit detection box num
+ if keep.shape[0] > self.keep_top_k:
+ keep = keep[:self.keep_top_k]
+ output[batch_id] = boxes[keep]
+ return output
+
+ def __call__(self, outs, scale_factor):
+ preds = self._non_max_suppression(outs)
+ bboxs, box_nums = [], []
+ for i, pred in enumerate(preds):
+ if len(pred.shape) > 2:
+ pred = np.squeeze(pred)
+ if len(pred.shape) == 1:
+ pred = pred[np.newaxis, :]
+ pred_bboxes = pred[:, :4]
+ scale_factor = np.tile(scale_factor[i][::-1], (1, 2))
+ pred_bboxes /= scale_factor
+ bbox = np.concatenate(
+ [
+ pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis],
+ pred_bboxes
+ ],
+ axis=-1)
+ bboxs.append(bbox)
+ box_num = bbox.shape[0]
+ box_nums.append(box_num)
+ bboxs = np.concatenate(bboxs, axis=0)
+ box_nums = np.array(box_nums)
+ return {'bbox': bboxs, 'bbox_num': box_nums}
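+
+# Minimal usage sketch (shapes are illustrative; a YOLOv7 head emits [batch, num_anchors, 5 + num_classes]):
+#   post = YOLOv7PostProcess(score_threshold=0.001, nms_threshold=0.65, multi_label=True)
+#   outs = np.random.rand(1, 100, 85).astype(np.float32)       # dummy raw predictions
+#   scale_factor = np.array([[1.0, 1.0]], dtype=np.float32)    # [batch, 2] resize scales
+#   res = post(outs, scale_factor)                              # {'bbox': [N, 6], 'bbox_num': [batch]}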
diff --git a/example/auto_compression/pytorch_yolov7/post_quant.py b/example/auto_compression/pytorch_yolov7/post_quant.py
new file mode 100644
index 0000000000000000000000000000000000000000..8c86672761829d6541a6d3c827f5254d0eecc527
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/post_quant.py
@@ -0,0 +1,104 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import numpy as np
+import argparse
+import paddle
+from ppdet.core.workspace import load_config, merge_config
+from ppdet.core.workspace import create
+from ppdet.metrics import COCOMetric, VOCMetric
+from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
+from paddleslim.quant import quant_post_static
+
+
+def argsparser():
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ '--config_path',
+ type=str,
+ default=None,
+ help="path of compression strategy config.",
+ required=True)
+ parser.add_argument(
+ '--save_dir',
+ type=str,
+ default='ptq_out',
+ help="directory to save compressed model.")
+ parser.add_argument(
+ '--devices',
+ type=str,
+ default='gpu',
+        help="device to use for compression.")
+ parser.add_argument(
+ '--algo', type=str, default='KL', help="post quant algo.")
+
+ return parser
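+
+# Example invocation (the config path is illustrative; use a config from this example's configs/ directory):
+#   python post_quant.py --config_path=./configs/<your_config>.yaml --save_dir=ptq_out --algo=avg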
+
+
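+# Wrap a ppdet DataLoader so each batch is yielded as a feed dict keyed by the model's input names.
+# `input_list` is either a list of input names or a {data_key: input_name} mapping.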
+def reader_wrapper(reader, input_list):
+ def gen():
+ for data in reader:
+ in_dict = {}
+ if isinstance(input_list, list):
+ for input_name in input_list:
+ in_dict[input_name] = data[input_name]
+ elif isinstance(input_list, dict):
+ for input_name in input_list.keys():
+ in_dict[input_list[input_name]] = data[input_name]
+ yield in_dict
+
+ return gen
+
+
+def main():
+ global global_config
+ all_config = load_slim_config(FLAGS.config_path)
+ assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
+ global_config = all_config["Global"]
+ reader_cfg = load_config(global_config['reader_config'])
+
+ train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
+ reader_cfg['worker_num'],
+ return_list=True)
+ train_loader = reader_wrapper(train_loader, global_config['input_list'])
+
+ place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
+ exe = paddle.static.Executor(place)
+ quant_post_static(
+ executor=exe,
+ model_dir=global_config["model_dir"],
+ quantize_model_path=FLAGS.save_dir,
+ data_loader=train_loader,
+ model_filename=global_config["model_filename"],
+ params_filename=global_config["params_filename"],
+ batch_size=32,
+ batch_nums=10,
+ algo=FLAGS.algo,
+ hist_percent=0.999,
+ is_full_quantize=False,
+ bias_correction=False,
+ onnx_format=False)
+
+
+if __name__ == '__main__':
+ paddle.enable_static()
+ parser = argsparser()
+ FLAGS = parser.parse_args()
+
+ assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
+ paddle.set_device(FLAGS.devices)
+
+ main()
diff --git a/example/auto_compression/pytorch_yolov7/run.py b/example/auto_compression/pytorch_yolov7/run.py
new file mode 100644
index 0000000000000000000000000000000000000000..ed73d81aaea20a37ad922d880fecd75f815337ce
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/run.py
@@ -0,0 +1,172 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import numpy as np
+import argparse
+import paddle
+from ppdet.core.workspace import load_config, merge_config
+from ppdet.core.workspace import create
+from ppdet.metrics import COCOMetric, VOCMetric
+from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
+from paddleslim.auto_compression import AutoCompression
+
+from post_process import YOLOv7PostProcess
+
+
+def argsparser():
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ '--config_path',
+ type=str,
+ default=None,
+ help="path of compression strategy config.",
+ required=True)
+ parser.add_argument(
+ '--save_dir',
+ type=str,
+ default='output',
+ help="directory to save compressed model.")
+ parser.add_argument(
+ '--devices',
+ type=str,
+ default='gpu',
+        help="device to use for compression.")
+ parser.add_argument(
+ '--eval', type=bool, default=False, help="whether to run evaluation.")
+
+ return parser
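+
+# Example invocation (the config path is illustrative; use a config from this example's configs/ directory):
+#   CUDA_VISIBLE_DEVICES=0 python run.py --config_path=./configs/<your_config>.yaml --save_dir='./output/'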
+
+
+def reader_wrapper(reader, input_list):
+ def gen():
+ for data in reader:
+ in_dict = {}
+ if isinstance(input_list, list):
+ for input_name in input_list:
+ in_dict[input_name] = data[input_name]
+ elif isinstance(input_list, dict):
+ for input_name in input_list.keys():
+ in_dict[input_list[input_name]] = data[input_name]
+ yield in_dict
+
+ return gen
+
+
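+# Convert a batch dict of Paddle tensors to numpy arrays. For VOC metrics, per-image entries may be
+# ragged lists, so each element is converted to an ndarray individually before stacking.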
+def convert_numpy_data(data, metric):
+    data_all = {k: np.array(v) for k, v in data.items()}
+ if isinstance(metric, VOCMetric):
+ for k, v in data_all.items():
+ if not isinstance(v[0], np.ndarray):
+ tmp_list = []
+ for t in v:
+ tmp_list.append(np.array(t))
+ data_all[k] = np.array(tmp_list)
+ else:
+ data_all = {k: np.array(v) for k, v in data.items()}
+ return data_all
+
+
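+# Evaluation callback passed to AutoCompression: runs the compiled inference program over the
+# validation loader, applies YOLOv7 post-processing, updates the COCO/VOC metric and returns mAP.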
+def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
+ metric = global_config['metric']
+ for batch_id, data in enumerate(val_loader):
+ data_all = convert_numpy_data(data, metric)
+ data_input = {}
+ for k, v in data.items():
+ if isinstance(global_config['input_list'], list):
+ if k in test_feed_names:
+ data_input[k] = np.array(v)
+ elif isinstance(global_config['input_list'], dict):
+ if k in global_config['input_list'].keys():
+ data_input[global_config['input_list'][k]] = np.array(v)
+ outs = exe.run(compiled_test_program,
+ feed=data_input,
+ fetch_list=test_fetch_list,
+ return_numpy=False)
+ res = {}
+ postprocess = YOLOv7PostProcess(
+ score_threshold=0.001, nms_threshold=0.65, multi_label=True)
+ res = postprocess(np.array(outs[0]), data_all['scale_factor'])
+ metric.update(data_all, res)
+ if batch_id % 100 == 0:
+ print('Eval iter:', batch_id)
+ metric.accumulate()
+ metric.log()
+ map_res = metric.get_results()
+ metric.reset()
+ return map_res['bbox'][0]
+
+
+def main():
+ global global_config
+ all_config = load_slim_config(FLAGS.config_path)
+ assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
+ global_config = all_config["Global"]
+ reader_cfg = load_config(global_config['reader_config'])
+
+ train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
+ reader_cfg['worker_num'],
+ return_list=True)
+ train_loader = reader_wrapper(train_loader, global_config['input_list'])
+
+ if 'Evaluation' in global_config.keys() and global_config[
+ 'Evaluation'] and paddle.distributed.get_rank() == 0:
+ eval_func = eval_function
+ dataset = reader_cfg['EvalDataset']
+ global val_loader
+ _eval_batch_sampler = paddle.io.BatchSampler(
+ dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
+ val_loader = create('EvalReader')(dataset,
+ reader_cfg['worker_num'],
+ batch_sampler=_eval_batch_sampler,
+ return_list=True)
+ metric = None
+ if reader_cfg['metric'] == 'COCO':
+ clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
+ anno_file = dataset.get_anno()
+ metric = COCOMetric(
+ anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
+ elif reader_cfg['metric'] == 'VOC':
+ metric = VOCMetric(
+ label_list=dataset.get_label_list(),
+ class_num=reader_cfg['num_classes'],
+ map_type=reader_cfg['map_type'])
+ else:
+ raise ValueError("metric currently only supports COCO and VOC.")
+ global_config['metric'] = metric
+ else:
+ eval_func = None
+
+ ac = AutoCompression(
+ model_dir=global_config["model_dir"],
+ model_filename=global_config["model_filename"],
+ params_filename=global_config["params_filename"],
+ save_dir=FLAGS.save_dir,
+ config=all_config,
+ train_dataloader=train_loader,
+ eval_callback=eval_func)
+ ac.compress()
+
+
+if __name__ == '__main__':
+ paddle.enable_static()
+ parser = argsparser()
+ FLAGS = parser.parse_args()
+
+ assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
+ paddle.set_device(FLAGS.devices)
+
+ main()
diff --git a/example/auto_compression/semantic_segmentation/README.md b/example/auto_compression/semantic_segmentation/README.md
index aa3842ebb99094185b854705122dfd5f7f255728..f01bb9045d6be6f8b83a417b7ceffc84ed4d9338 100644
--- a/example/auto_compression/semantic_segmentation/README.md
+++ b/example/auto_compression/semantic_segmentation/README.md
@@ -14,12 +14,10 @@
## 1.简介
-本示例将以语义分割模型PP-HumanSeg-Lite为例,介绍如何使用PaddleSeg中Inference部署模型进行自动压缩。本示例使用的自动压缩策略为非结构化稀疏、蒸馏和量化、蒸馏。
+本示例将以语义分割模型[PP-HumanSeg-Lite](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib/PP-HumanSeg#portrait-segmentation)为例,介绍如何使用PaddleSeg中Inference部署模型进行自动压缩。本示例使用的自动压缩策略为“非结构化稀疏+蒸馏”和“量化+蒸馏”两种组合。
## 2.Benchmark
-- [PP-HumanSeg-Lite](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib/PP-HumanSeg#portrait-segmentation)
-
| 模型 | 策略 | Total IoU | ARM CPU耗时(ms)<br>thread=1 |Nvidia GPU耗时(ms)| 配置文件 | Inference模型 |
|:-----:|:-----:|:----------:|:---------:| :------:|:------:|:------:|
| PP-HumanSeg-Lite | Baseline | 92.87 | 56.363 |-| - | [model](https://paddleseg.bj.bcebos.com/dygraph/ppseg/ppseg_lite_portrait_398x224_with_softmax.tar.gz) |
@@ -34,7 +32,7 @@
| Deeplabv3-ResNet50 | Baseline | 79.90 | -|12.766| -| [model](https://paddleseg.bj.bcebos.com/tipc/easyedge/RES-paddle2-Deeplabv3-ResNet50.zip)|
| Deeplabv3-ResNet50 | 量化训练 | 78.89 | - |8.839|[config](./configs/deeplabv3/deeplabv3_qat.yaml) | - |
-- ARM CPU测试环境:`SDM710 2*A75(2.2GHz) 6*A55(1.7GHz)`;
+- ARM CPU测试环境:`高通骁龙710处理器(SDM710 2*A75(2.2GHz) 6*A55(1.7GHz))`;
- Nvidia GPU测试环境:
@@ -65,6 +63,11 @@ pip install paddlepaddle-gpu
pip install paddleslim
```
+准备paddleslim示例代码:
+```shell
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+```
+
安装paddleseg
```shell
@@ -77,15 +80,16 @@ pip install paddleseg
开发者可下载开源数据集 (如[AISegment](https://github.com/aisegmentcn/matting_human_datasets)) 或自定义语义分割数据集。请参考[PaddleSeg数据准备文档](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/data/marker/marker_cn.md)来检查、对齐数据格式即可。
-本示例使用示例开源数据集 AISegment 数据集为例介绍如何对PP-HumanSeg-Lite进行自动压缩。示例中的数据集仅用于快速跑通自动压缩流程,并不能复现出 benckmark 表中的压缩效果。
+本示例以开源数据集 AISegment 为例,介绍如何对PP-HumanSeg-Lite进行自动压缩。示例数据集仅用于快速跑通自动压缩流程,并不能复现出 benchmark 表中的压缩效果。
可以通过以下命令下载人像分割示例数据:
```shell
+cd PaddleSlim/example/auto_compression/semantic_segmentation
python ./data/download_data.py mini_humanseg
### 下载后的数据位置为 ./data/humanseg/
```
-** 提示: **
+**提示:**
- PP-HumanSeg-Lite压缩过程使用的数据集
- 数据集:AISegment + PP-HumanSeg14K + 内部自建数据集。其中 AISegment 是开源数据集,可从[链接](https://github.com/aisegmentcn/matting_human_datasets)处获取;PP-HumanSeg14K 是 PaddleSeg 自建数据集,可从[官方渠道](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/contrib/PP-HumanSeg/paper.md#pp-humanseg14k-a-large-scale-teleconferencing-video-dataset)获取;内部数据集不对外公开。