# Auto Compression Toolkit (ACT) for Automated Model Compression

Features | Benchmark | Installation | Quick Start | Advanced Usage | Community

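## **Introduction**

PaddleSlim introduces the Auto Compression Toolkit (ACT), a new tool that compresses inference models automatically in a source-free way. The compressed model can be deployed directly.

## **News** 📢

* 🔥 **[Live stream]** **Nov 7, 2022, 20:30-21:30: "PaddleSlim Auto Compression: CV Session". Scan the QR code to register and join the live-stream technical discussion group.**
* 🔥 **[Live stream]** **Nov 8, 2022, 20:30-21:30: "PaddleSlim Auto Compression: NLP Session". Scan the QR code to register and join the live-stream technical discussion group.**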

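## **Features**

- **🚀 Decoupled from training code**: developers do not need to understand or modify the model source code; compression works directly on the exported inference model.
- **🎛️ End-to-end automatic optimization**: a simple configuration is enough to start compression, and ACT automatically optimizes toward the best inference model.
- **📦 Rich set of compression algorithms**: ACT provides quantization-aware training, distillation, structured pruning, unstructured pruning, several post-training quantization methods, and hyperparameter search, which can be combined freely.

### **Core idea of ACT**

Compared with traditional manual compression, the "automatic" in automated compression shows up in four areas: decoupling from training code, hyperparameter search for post-training quantization, automatic combination of strategies, and hardware awareness (hardware latency estimation).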
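As a rough illustration of how strategies are combined, the sketch below writes two strategy combinations as plain Python dicts. It uses only the strategy keys that appear in the quick-start example of this README (`QuantPost`, `HyperParameterOptimization`, `QuantAware`, `Distillation`). The `describe` helper is hypothetical and is not part of PaddleSlim; it only makes the combination visible.

```python
# Two strategy combinations, written with the keys used in this README's
# quick-start example. An empty dict means "use the defaults".
ptq_with_search = {
    "QuantPost": {},
    "HyperParameterOptimization": {"ptq_algo": ["avg"], "max_quant_count": 3},
}
qat_with_distillation = {
    "QuantAware": {},
    "Distillation": {},
}


def describe(config):
    """Hypothetical helper: join the strategy names in insertion order."""
    return "+".join(config)


print(describe(ptq_with_search))
print(describe(qat_with_distillation))
```

Either dict would be passed as the `config` argument of `AutoCompression`, as the quick-start example does.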

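### **Example compression results**

Compared with traditional model compression methods, ACT:

- reduces the amount of code by more than 50%;
- reaches accuracy roughly on par with manual compression, and on **[PP-YOLOE](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection)** it outperforms manual compression;
- delivers inference speedups on par with manual compression: compared with the uncompressed model, inference is 1.4x to 7.1x faster.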
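The speedup figures are latency ratios: latency before compression divided by latency after compression. The minimal sketch below recomputes the ratio for three latency pairs taken from the benchmark table in this README. The `speedup` function name is an illustrative choice, not part of ACT.

```python
def speedup(latency_before_ms: float, latency_after_ms: float) -> float:
    """Return how many times faster the compressed model runs."""
    if latency_after_ms <= 0:
        raise ValueError("latency_after_ms must be positive")
    return latency_before_ms / latency_after_ms


# (latency before, latency after) in milliseconds, from the benchmark table.
rows = {
    "MobileNetV1 (SDM865)": (33.15, 13.64),
    "ViT_base_patch16_224 (Tesla T4)": (367.17, 51.70),
    "PP-LiteSeg (Tesla T4)": (1.43, 1.16),
}

for name, (before, after) in rows.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```

The printed ratios round to the 2.43, 7.10, and 1.23 listed in the speedup column for these models.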

### **Compression benchmark**

| Model type | Model name | Accuracy before compression (Top-1 Acc %) | Accuracy after compression (Top-1 Acc %) | Latency before compression (ms) | Latency after compression (ms) | Speedup | Hardware |
| ------------------------------- | ----------------------------- | ---------------------- | ---------------------- | ---------------- | ---------------- | ---------- | --------------- |
| [Image classification](./image_classification) | MobileNetV1 | 70.90 | 70.57 | 33.15 | 13.64 | **2.43** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | MobileNetV3_large_x1_0 | 75.32 | 74.04 | 16.62 | 9.85 | **1.69** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | MobileNetV3_large_x1_0_ssld | 78.96 | 77.17 | 16.62 | 9.85 | **1.69** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | ShuffleNetV2_x1_0 | 68.65 | 68.32 | 10.43 | 5.51 | **1.89** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | SqueezeNet1_0_infer | 59.60 | 59.45 | 35.98 | 16.96 | **2.12** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | PPLCNetV2_base | 76.86 | 76.39 | 36.50 | 15.79 | **2.31** | SDM865 (Snapdragon 865) |
| [Image classification](./image_classification) | ResNet50_vd | 79.12 | 78.74 | 3.19 | 0.92 | **3.47** | NVIDIA Tesla T4 |
| [Image classification](./image_classification) | PPHGNet_tiny | 79.59 | 79.20 | 2.82 | 0.98 | **2.88** | NVIDIA Tesla T4 |
| [Image classification](./image_classification) | InceptionV3 | 79.14 | 78.32 | 4.79 | 1.47 | **3.26** | NVIDIA Tesla T4 |
| [Image classification](./image_classification) | EfficientNetB0 | 77.02 | 74.27 | 1.95 | 1.44 | **1.35** | NVIDIA Tesla T4 |
| [Image classification](./image_classification) | GhostNet_x1_0 | 74.02 | 72.62 | 2.93 | 1.03 | **2.84** | NVIDIA Tesla T4 |
| [Image classification](./image_classification) | ViT_base_patch16_224 | 81.89 | 82.05 | 367.17 | 51.70 | **7.10** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | PP-HumanSeg-Lite | 92.87 | 92.35 | 56.36 | 37.71 | **1.49** | SDM710 |
| [Semantic segmentation](./semantic_segmentation) | PP-LiteSeg | 77.04 | 76.93 | 1.43 | 1.16 | **1.23** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | HRNet | 78.97 | 78.90 | 8.188 | 5.812 | **1.41** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | Deeplabv3-ResNet50 | 79.90 | 79.26 | 12.766 | 8.839 | **1.44** | NVIDIA Tesla T4 |
| [Semantic segmentation](./semantic_segmentation) | BiSeNetV2 | 73.17 | 73.20 | 35.61 | 15.94 | **2.23** | NVIDIA Tesla T4 |
| [NLP](./nlp) | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 |
| [NLP](./nlp) | ERNIE 3.0-Medium | 73.09 | 72.16 | 29.25 (fp16) | 19.61 | **1.49** | NVIDIA Tesla T4 |
| [NLP](./pytorch_huggingface) | bert-base-cased (Hugging-Face) | 81.35 | 81.51 | 11.60 | 4.83 | **2.40** | NVIDIA Tesla T4 |
| [Object detection](./detection) | SSD-MobileNetv1 | 73.8 (voc) | 73.52 | 4.0 | 1.7 | **2.35** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolo_series) | YOLOv5s (PyTorch) | 37.4 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolo_series) | YOLOv6s (PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolo_series) | YOLOv6s_v2 (PyTorch) | 43.4 | 43.0 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolo_series) | YOLOv7-Tiny (PyTorch) | 37.3 | 37.0 | 5.06 | 1.68 | **3.01** | NVIDIA Tesla T4 |
| [Object detection](./pytorch_yolo_series) | YOLOv7 (PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 |
| [Object detection](./detection) | PP-YOLOE-l | 50.9 | 50.6 | 11.2 | 6.7 | **1.67** | NVIDIA Tesla T4 |
| [Object detection](./detection) | PP-YOLOE-s | 43.1 | 42.6 | 6.51 | 2.12 | **3.07** | NVIDIA Tesla T4 |
| [Image classification](./image_classification) | MobileNetV1 (TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDM865 (Snapdragon 865) |

- Note: for object detection the accuracy metric is mAP (0.5:0.95); for image segmentation it is IoU.
- For more PaddlePaddle model examples and benchmarks, see [image classification](./image_classification), [object detection](./detection), [semantic segmentation](./semantic_segmentation), and [natural language processing](./nlp).
- For examples and benchmarks with other frameworks, see [YOLOv5 (PyTorch)](./pytorch_yolo_series), [YOLOv6 (PyTorch)](./pytorch_yolo_series), [YOLOv7 (PyTorch)](./pytorch_yolo_series), [HuggingFace (PyTorch)](./pytorch_huggingface), and [MobileNet (TensorFlow)](./tensorflow_mobilenet).

## **Environment setup**

- Install PaddlePaddle >= 2.3.2 (see the [PaddlePaddle installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)):

```shell
# CPU
pip install paddlepaddle --upgrade
# GPU, using CUDA 11.2 as an example
python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```

- Install PaddleSlim >= 2.3.3:

```shell
pip install paddleslim==2.3.3
```

## **Quick start**

- **1. Prepare the model and dataset**

```shell
# Download the MobileNet inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar
tar -xf MobileNetV1_infer.tar
# Download the small ImageNet demo dataset
wget https://sys-p0.bj.bcebos.com/slim_ci/ILSVRC2012_data_demo.tar.gz
tar -xf ILSVRC2012_data_demo.tar.gz
```

- **2. Run automated compression**

Hyperparameter search for post-training quantization currently supports only Linux, so the default example below must be run on Linux. To test on Windows, use the Windows config that is commented out in the code. That config uses quantization-aware training as the compression strategy, so it needs the full dataset; otherwise accuracy will drop somewhat.

```python
# Import dependencies
import paddle
from PIL import Image
from paddle.vision.datasets import DatasetFolder
from paddle.vision.transforms import transforms
from paddleslim.auto_compression import AutoCompression
paddle.enable_static()

# Define the dataset
class ImageNetDataset(DatasetFolder):
    def __init__(self, path, image_size=224):
        super(ImageNetDataset, self).__init__(path)
        normalize = transforms.Normalize(
            mean=[123.675, 116.28, 103.53], std=[58.395, 57.120, 57.375])
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(image_size), transforms.Transpose(),
            normalize
        ])

    def __getitem__(self, idx):
        # Return only the image; the label is not needed for compression
        img_path, _ = self.samples[idx]
        return self.transform(Image.open(img_path).convert('RGB'))

    def __len__(self):
        return len(self.samples)

# Define the DataLoader
train_dataset = ImageNetDataset("./ILSVRC2012_data_demo/ILSVRC2012/train/")
image = paddle.static.data(
    name='inputs', shape=[None] + [3, 224, 224], dtype='float32')
train_loader = paddle.io.DataLoader(train_dataset, feed_list=[image], batch_size=32, return_list=False)

# Start automated compression
ac = AutoCompression(
    model_dir="./MobileNetV1_infer",
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    save_dir="MobileNetV1_quant",
    config={"QuantPost": {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}},
    ### config={"QuantAware": {}, "Distillation": {}}, ### On Windows, use this config line instead
    train_dataloader=train_loader,
    eval_dataloader=train_loader)
ac.compress()
```

- **3. Accuracy test**

  - Test the accuracy of the model before compression:

    ```shell
    CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py
    ### Eval Top1: 0.7171724759615384
    ```

  - Test the accuracy of the quantized model:

    ```shell
    CUDA_VISIBLE_DEVICES=0 python ./image_classification/eval.py --model_dir='MobileNetV1_quant'
    ### Eval Top1: 0.7166466346153846
    ```

  - The quantized model is nearly lossless in accuracy compared with the model before quantization. Because the quantization parameters are chosen by hyperparameter search, the accuracy of the quantized model fluctuates slightly from run to run.

- **4. Inference speed test**

  - Measuring the speed of the quantized model depends on support from the inference library, so make sure the installed PaddlePaddle build includes TensorRT. The example and results below were obtained with Tesla V100, CUDA 10.2, Python 3.7, and TensorRT.

  - Use the following command to check the local CUDA version, then download the PaddlePaddle package that matches your CUDA and Python versions from the [download page](https://www.paddlepaddle.org.cn/inference/user_guides/download_lib.html#python).

    ```shell
    cat /usr/local/cuda/version.txt
    ### CUDA Version 10.2.89
    ### 10.2.89 is the CUDA version; use it to choose the TensorRT-enabled PaddlePaddle package to install.
    ```

  - Install the downloaded wheel. The package fetched with wget here is the PaddlePaddle build for Python 3.7, CUDA 10.2, and TensorRT 7; note that you need to install TensorRT yourself. If your environment differs from the example environment, download the package that matches your own machine, otherwise the example code will fail.

    ```
    wget https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl
    pip install paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl --force-reinstall
    ```

  - Test the speed of the FP32 model:

    ```
    python ./image_classification/paddle_inference_eval.py --model_path='./MobileNetV1_infer' --use_gpu=True --use_trt=True
    ### using tensorrt FP32 batch size: 1 time(ms): 0.6140608787536621
    ```

  - Test the speed of the FP16 model:

    ```
    python ./image_classification/paddle_inference_eval.py --model_path='./MobileNetV1_infer' --use_gpu=True --use_trt=True --use_fp16=True
    ### using tensorrt FP16 batch size: 1 time(ms): 0.5795984268188477
    ```

  - Test the speed of the INT8 model:

    ```
    python ./image_classification/paddle_inference_eval.py --model_path='./MobileNetV1_quant/' --use_gpu=True --use_trt=True --use_int8=True
    ### using tensorrt INT8 batch size: 1 time(ms): 0.5213963985443115
    ```

- **Tips:**

  - The dataset passed to the DataLoader is the dataset used by the model being compressed, and the DataLoader inherits from `paddle.io.DataLoader`. You can use the DataLoader from a model suite directly, or define your own based on [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader).
  - The compression algorithms defined in the ACT config (quantization, distillation, pruning, and so on) are merged and executed together. Available strategy combinations include quantization + distillation, pruning + distillation, and others. The configuration chosen in the example is post-training quantization with hyperparameter search.
  - If the parameters of the model to be compressed are stored in separate files, first use the [convert.py](./convert.py) script to save them into a single binary file.

## Advanced usage

- ACT handles common inference models automatically. For more specialized needs, see the [ACT hyperparameter configuration tutorial](./hyperparameter_tutorial.md) to configure the compression strategies individually.

## Community

- Scan the QR code with WeChat and fill in the questionnaire to join the technical discussion group.

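- If you find any problems with ACT or have suggestions, please let us know through [GitHub Issues](https://github.com/PaddlePaddle/PaddleSlim/issues). Contributions of more high-quality models are also welcome, to help build the open-source ecosystem together.

## License

This project is released under the [Apache-2.0 license](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/LICENSE).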