Add PicoDet PTQ DEMO (#1383)

aa122039 · Chang Xu · GitHub · f48ee7d6 · aa122039 · aa122039
11 changed files
--- a/example/auto_compression/detection/README.md
+++ b/example/auto_compression/detection/README.md
@@ -145,7 +145,5 @@ python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml

 ## 5.FAQ

- To test the accuracy of the post-training quantized model, run:
-```shell
-python post_quant.py --config_path=./configs/ppyoloe_s_qat_dis.yaml
-```
+
+- To apply post-training quantization to a model, see the [post-training quantization example for detection models](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/detection).
--- a/example/post_training_quantization/detection/README.md
+++ b/example/post_training_quantization/detection/README.md
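+# Post-Training Quantization Example for Object Detection Models
+
+Contents:
+- [1. Introduction](#1-introduction)
+- [2.Benchmark](#2benchmark)
+- [3. Post-training quantization workflow](#3-post-training-quantization-workflow)
+  - [3.1 Prepare the environment](#31-prepare-the-environment)
+  - [3.2 Prepare the dataset](#32-prepare-the-dataset)
+  - [3.3 Prepare the inference model](#33-prepare-the-inference-model)
+  - [3.4 Quantize and export the model](#34-quantize-and-export-the-model)
+  - [3.5 Evaluate model accuracy](#35-evaluate-model-accuracy)
+  - [3.6 Improve post-training quantization accuracy](#36-improve-post-training-quantization-accuracy)
+
+- [4.Deployment](#4deployment)
+- [5.FAQ](#5faq)
+
+## 1. Introduction
+This example uses the object detection models PP-YOLOE and PicoDet to show how to take an Inference deployment model exported from PaddleDetection, compress it with post-training quantization, and use sensitivity analysis to improve the accuracy of the quantized model.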
+
+
+## 2.Benchmark
+
+| Model  |  Strategy  | Input size | mAP<sup>val</sup><br>0.5:0.95 | Latency<sup><small>FP32</small></sup><br>(ms) | Latency<sup><small>FP16</small></sup><br>(ms) | Latency<sup><small>INT8</small></sup><br>(ms) |  Config file | Inference model  |
+| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
+| PP-YOLOE-s |  Base model | 640*640  |  43.1   |   11.2ms  |   7.7ms   |    -    |    -   | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_s_300e_coco.tar) |
+| PP-YOLOE-s |  Post-training quantization | 640*640  |  42.6    |     -     |     -     |  6.7ms  |    -   |   [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_s_ptq.tar) |
+|  |  |  |  |  |  |  |  |  |
+| PicoDet-s |  Base model | 416*416  |  32.5   |   -  |   -   |  -  |  - | [Model](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) |
+| PicoDet-s |  Post-training quantization (before quantization analysis) | 416*416  |  0.0   |   - |   -   |  -  |  -  | - |
+| PicoDet-s |  Post-training quantization (after quantization analysis) | 416*416  |  24.9   |   - |   -   |  -  |  -  | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_ptq.tar) |
+
+- All mAP values were measured on the COCO val2017 dataset with IoU=0.5:0.95.
+
+
+## 3. Post-training quantization workflow
+
+#### 3.1 Prepare the environment
+- PaddlePaddle >= 2.3 (see the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) for installation instructions)
+- PaddleSlim >= 2.3
+- PaddleDet >= 2.4
+- opencv-python
+
+Install paddlepaddle:
+```shell
+# CPU
+pip install paddlepaddle
+# GPU
+pip install paddlepaddle-gpu
+```
+
+Install paddleslim:
+```shell
+pip install paddleslim
+```
+
+Install paddledet:
+```shell
+pip install paddledet
+```
+
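+Note: PaddleDet is installed so that the Dataloader components from PaddleDetection can be used directly.
+
+#### 3.2 Prepare the dataset
+
+This example uses the COCO dataset for the post-training quantization experiments by default. For a custom COCO-format dataset, or data in another format, prepare the data by following the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
+
+If the dataset is not in COCO format, modify the Dataset fields of the reader configuration in [configs](./configs).
+
+Taking PP-YOLOE as an example, if the dataset is already prepared, simply set the `dataset_dir` field of `EvalDataset` in [./configs/ppyoloe_s_ptq.yaml](./configs/ppyoloe_s_ptq.yaml) to the path of your dataset.
+
+#### 3.3 Prepare the inference model
+
+An inference model consists of two files, `model.pdmodel` and `model.pdiparams`: the file with the `pdmodel` suffix is the model file, and the file with the `pdiparams` suffix holds the weights.
+
+Note: files named `__model__` and `__params__` correspond to `model.pdmodel` and `model.pdiparams` respectively.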
+
+
+Export the Inference model by following the [PaddleDetection documentation](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#8-%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA). The PP-YOLOE export below can serve as a reference:
+- Download the code
+```
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+```
+- Export the inference model
+
+
+- PPYOLOE-s model, without NMS. For a quick trial, you can download the [exported PP-YOLOE-s model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_s_300e_coco.tar) directly.
+```shell
+python tools/export_model.py \
+        -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
+        -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams \
+        trt=True exclude_nms=True
+```
+
+- PicoDet-s model, with NMS. For a quick trial, you can download the [exported PicoDet-s model](https://paddledet.bj.bcebos.com/deploy/Inference/picodet_s_416_coco_lcnet.tar) directly.
+
+```shell
+python tools/export_model.py -c configs/picodet/picodet_s_416_coco_lcnet.yml \
+       -o weights=https://paddledet.bj.bcebos.com/models/picodet_s_416_coco_lcnet.pdparams \
+       --output_dir=output_inference
+```
+
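+#### 3.4 Quantize and export the model
+
+The post-training quantization example is launched with the post_quant.py script, which quantizes the model through the `paddleslim.quant.quant_post_static` API. Set the model path, data path, and quantization parameters in the config file, then run post-training quantization with the following commands:
+
+- PPYOLOE-s: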
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python post_quant.py --config_path=./configs/ppyoloe_s_ptq.yaml --save_dir=./ppyoloe_s_ptq
+```
+
+- PicoDet-s:
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python post_quant.py --config_path=./configs/picodet_s_ptq.yaml --save_dir=./picodet_s_ptq
+```
+
+
+#### 3.5 Evaluate model accuracy
+
+Use the eval.py script to compute the model's mAP:
+```
+export CUDA_VISIBLE_DEVICES=0
+python eval.py --config_path=./configs/ppyoloe_s_ptq.yaml
+```
+
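+**Note**:
+- The path of the model to evaluate can be changed through the `model_dir` field in the config file.
+
+#### 3.6 Improve post-training quantization accuracy
+This section describes how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a considerable accuracy loss. PaddleSlim provides a quantization analysis tool, exposed through the `paddleslim.quant.AnalysisQuant` API, that visualizes the layers poorly suited to quantization. Skipping those layers improves the accuracy of the quantized model.
+
+Across multiple experiments, including several activation quantization algorithms (avg, KL, etc.) and weight quantization types (abs_max, channel_wise_abs_max), PicoDet-s always ended up with an mAP of 0 after post-training quantization. Taking PicoDet-s as the example, the quantization analysis tool is used as follows: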
+
+```shell
+python analysis.py --config_path=./configs/picodet_s_analysis.yaml
+```
+
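+As the figure below shows, the analysis reveals that the layers `conv2d_1.w_0`, `conv2d_3.w_0`, `conv2d_5.w_0`, `conv2d_7.w_0`, and `conv2d_9.w_0` cause a large accuracy loss. All of them are `depthwise_conv` layers in the early part of the backbone.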
+
+<p align="center">
+<img src="./images/picodet_analysis.png" width=849 hspace='10'/> <br />
+</p>
+
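+Based on this analysis, the layers that cause the largest accuracy drop can be skipped during post-training quantization. Use [picodet_s_analyzed_ptq.yaml](./configs/picodet_s_analyzed_ptq.yaml) and run post-training quantization again. With these layers skipped, the mAP of the quantized model rises by 24.9 points (from 0.0 to 24.9).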
+
+```shell
+python post_quant.py --config_path=./configs/picodet_s_analyzed_ptq.yaml --save_dir=./picodet_s_analyzed_ptq_out
+```
+
+## 4.Deployment
+For deployment, see the [auto-compression example for detection models](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection).
+
+## 5.FAQ
+
+- To apply auto compression to a model, see the [auto-compression example for detection models](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection).
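Section 3.6 of the README above names two weight quantization types, `abs_max` and `channel_wise_abs_max`. The following is a minimal pure-Python sketch of the difference, assuming symmetric INT8 quantization with one scale for the whole tensor versus one scale per output channel. The helper names and the weight values are made up for illustration. It is not PaddleSlim's implementation, and it does not reproduce or explain the PicoDet result.

```python
# Illustrative only: symmetric INT8 fake-quantization in pure Python.
# The weights below are invented values, not taken from PicoDet.

def fake_quant(values, scale, bits=8):
    """Quantize to signed integers using `scale`, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    return [round(v / scale * qmax) / qmax * scale for v in values]


def max_abs_error(original, restored):
    """Largest element-wise absolute difference between two lists."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Two hypothetical output channels with very different value ranges.
channels = [
    [0.001, -0.002, 0.0015],  # small-magnitude channel
    [4.0, -3.5, 2.0],         # large-magnitude channel
]

# abs_max: a single scale shared by the whole tensor.
tensor_scale = max(abs(v) for ch in channels for v in ch)
per_tensor = [fake_quant(ch, tensor_scale) for ch in channels]

# channel_wise_abs_max: one scale per output channel.
per_channel = [fake_quant(ch, max(abs(v) for v in ch)) for ch in channels]

err_tensor = max_abs_error(channels[0], per_tensor[0])
err_channel = max_abs_error(channels[0], per_channel[0])
print(f"small channel, shared scale error:      {err_tensor:.6f}")
print(f"small channel, per-channel scale error: {err_channel:.6f}")
```

With a shared scale, the small-magnitude channel is rounded to zero, while a per-channel scale keeps its values distinguishable. Which option works better for a real model still has to be measured, as the README does.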
--- a/example/post_training_quantization/detection/configs/picodet_s_analyzed_ptq.yaml
+++ b/example/post_training_quantization/detection/configs/picodet_s_analyzed_ptq.yaml
+input_list: ['image', 'scale_factor']
+model_dir: ./picodet_s_416_coco_lcnet/
+model_filename: model.pdmodel
+params_filename: model.pdiparams
+skip_tensor_list: ['conv2d_9.w_0', 'conv2d_7.w_0', 'conv2d_3.w_0', 'conv2d_5.w_0', 'conv2d_1.w_0']
+
+metric: COCO
+num_classes: 80
+
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: /paddle/dataset/coco/
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: /paddle/dataset/coco/
+
+eval_height: &eval_height 416
+eval_width: &eval_width 416
+eval_size: &eval_size [*eval_height, *eval_width]
+
+worker_num: 0
+
+EvalReader:
+  inputs_def:
+    image_shape: [1, 3, *eval_height, *eval_width]
+  sample_transforms:
+  - Decode: {}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_size: 32
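The `skip_tensor_list` in this config holds the weight names that the quantization analysis flagged. As a rough sketch of how such a list could be derived, assume the analysis yields one metric value per layer, measured with only that layer quantized. This is an assumption about the workflow, not something the diff shows. The function name, the threshold, and the per-layer numbers below are hypothetical; only the 32.5 baseline comes from the benchmark table.

```python
def pick_skip_tensors(layer_metric, baseline, max_drop):
    """Return weight names whose metric drop exceeds max_drop, worst first."""
    drops = {name: baseline - value for name, value in layer_metric.items()}
    flagged = [name for name, drop in drops.items() if drop > max_drop]
    return sorted(flagged, key=lambda name: drops[name], reverse=True)


# Hypothetical per-layer mAP values (NOT real measurements).
layer_metric = {
    "conv2d_1.w_0": 5.0,
    "conv2d_2.w_0": 32.0,
    "conv2d_3.w_0": 12.0,
    "conv2d_4.w_0": 31.8,
}

# 32.5 is the FP32 PicoDet-s mAP from the benchmark table; max_drop is arbitrary.
skip = pick_skip_tensors(layer_metric, baseline=32.5, max_drop=5.0)
print("skip_tensor_list:", skip)
```

The resulting list can then be pasted into the `skip_tensor_list` field of the PTQ config.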
+
--- a/example/post_training_quantization/detection/configs/ppyoloe_s_analysis.yaml
+++ b/example/post_training_quantization/detection/configs/ppyoloe_s_analysis.yaml
+input_list: ['image']
+arch: PPYOLOE    # Set arch: PPYOLOE when the model was exported with exclude_nms=True
+model_dir: ./ppyoloe_crn_s_300e_coco
+model_filename: model.pdmodel
+params_filename: model.pdiparams
+save_dir: ./analysis_results_ppyoloe
+metric: COCO
+num_classes: 80
+
+PTQ:
+  quantizable_op_type: ["conv2d", "depthwise_conv2d"]
+  weight_quantize_type: 'abs_max'
+  activation_quantize_type: 'moving_average_abs_max'
+  is_full_quantize: False
+  batch_size: 32
+  batch_nums: 10
+  
+
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: /dataset/coco/
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: /dataset/coco/
+
+worker_num: 0
+
+# preprocess reader in test
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 32
\ No newline at end of file
--- a/example/post_training_quantization/detection/configs/ppyoloe_s_ptq.yaml
+++ b/example/post_training_quantization/detection/configs/ppyoloe_s_ptq.yaml
+input_list: ['image']
+arch: PPYOLOE    # Set arch: PPYOLOE when the model was exported with exclude_nms=True
+model_dir: ./ppyoloe_crn_s_300e_coco
+model_filename: model.pdmodel
+params_filename: model.pdiparams
+metric: COCO
+num_classes: 80
+  
+
+# Dataset configuration
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017
+    anno_path: annotations/instances_train2017.json
+    dataset_dir: /dataset/coco/
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017
+    anno_path: annotations/instances_val2017.json
+    dataset_dir: /dataset/coco/
+
+worker_num: 0
+
+# preprocess reader in test
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 32
\ No newline at end of file
--- a/example/post_training_quantization/detection/eval.py
+++ b/example/post_training_quantization/detection/eval.py
@@ -20,7 +20,7 @@ import paddle
 from ppdet.core.workspace import load_config, merge_config
 from ppdet.core.workspace import create
 from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
-from paddleslim.common import load_config as load_slim_config
+from paddleslim.common import load_inference_model
 from keypoint_utils import keypoint_post_process
 from post_process import PPYOLOEPostProcess

@@ -77,34 +77,35 @@ def eval():
    place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
    exe = paddle.static.Executor(place)

-    val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
-        global_config["model_dir"].rstrip('/'),
+    val_program, feed_target_names, fetch_targets = load_inference_model(
+        config["model_dir"].rstrip('/'),
        exe,
-        model_filename=global_config["model_filename"],
-        params_filename=global_config["params_filename"])
-    print('Loaded model from: {}'.format(global_config["model_dir"]))
+        model_filename=config["model_filename"],
+        params_filename=config["params_filename"])

-    metric = global_config['metric']
+    print('Loaded model from: {}'.format(config["model_dir"]))
+
+    metric = config['metric']
    for batch_id, data in enumerate(val_loader):
        data_all = convert_numpy_data(data, metric)
        data_input = {}
        for k, v in data.items():
-            if isinstance(global_config['input_list'], list):
-                if k in global_config['input_list']:
+            if isinstance(config['input_list'], list):
+                if k in config['input_list']:
                    data_input[k] = np.array(v)
-            elif isinstance(global_config['input_list'], dict):
-                if k in global_config['input_list'].keys():
-                    data_input[global_config['input_list'][k]] = np.array(v)
+            elif isinstance(config['input_list'], dict):
+                if k in config['input_list'].keys():
+                    data_input[config['input_list'][k]] = np.array(v)

        outs = exe.run(val_program,
                       feed=data_input,
                       fetch_list=fetch_targets,
                       return_numpy=False)
        res = {}
-        if 'arch' in global_config and global_config['arch'] == 'keypoint':
+        if 'arch' in config and config['arch'] == 'keypoint':
            res = keypoint_post_process(data, data_input, exe, val_program,
                                        fetch_targets, outs)
-        if 'arch' in global_config and global_config['arch'] == 'PPYOLOE':
+        if 'arch' in config and config['arch'] == 'PPYOLOE':
            postprocess = PPYOLOEPostProcess(
                score_threshold=0.01, nms_threshold=0.6)
            res = postprocess(np.array(outs[0]), data_all['scale_factor'])
@@ -124,34 +125,32 @@ def eval():


 def main():
-    global global_config
-    all_config = load_slim_config(FLAGS.config_path)
-    global_config = all_config["Global"]
-    reader_cfg = load_config(global_config['reader_config'])
+    global config
+    config = load_config(FLAGS.config_path)

-    dataset = reader_cfg['EvalDataset']
+    dataset = config['EvalDataset']
    global val_loader
-    val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
-                                      reader_cfg['worker_num'],
+    val_loader = create('EvalReader')(config['EvalDataset'],
+                                      config['worker_num'],
                                      return_list=True)
    metric = None
-    if reader_cfg['metric'] == 'COCO':
+    if config['metric'] == 'COCO':
        clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
        anno_file = dataset.get_anno()
        metric = COCOMetric(
            anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
-    elif reader_cfg['metric'] == 'VOC':
+    elif config['metric'] == 'VOC':
        metric = VOCMetric(
            label_list=dataset.get_label_list(),
-            class_num=reader_cfg['num_classes'],
-            map_type=reader_cfg['map_type'])
-    elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval':
+            class_num=config['num_classes'],
+            map_type=config['map_type'])
+    elif config['metric'] == 'KeyPointTopDownCOCOEval':
        anno_file = dataset.get_anno()
        metric = KeyPointTopDownCOCOEval(anno_file,
                                         len(dataset), 17, 'output_eval')
    else:
        raise ValueError("metric currently only supports COCO and VOC.")
-    global_config['metric'] = metric
+    config['metric'] = metric

    eval()
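The feed-building loop in `eval()` accepts `input_list` either as a list (keep the reader's field names) or as a dict (rename reader fields to the model's input names). A standalone sketch of that selection logic follows, with the `np.array` conversion left out so it runs without NumPy. The function name `select_inputs` and the input name `model_input_0` are hypothetical.

```python
def select_inputs(batch, input_list):
    """Pick the reader fields the model consumes, renaming them if a dict is given."""
    feed = {}
    for key, value in batch.items():
        if isinstance(input_list, list):
            # List form: feed the field under its reader name.
            if key in input_list:
                feed[key] = value
        elif isinstance(input_list, dict):
            # Dict form: map the reader name to the model's input name.
            if key in input_list:
                feed[input_list[key]] = value
    return feed


# Placeholder strings stand in for the tensors a real reader would return.
batch = {"image": "IMG", "scale_factor": "SF", "im_id": 7}

as_list = select_inputs(batch, ["image", "scale_factor"])
as_dict = select_inputs(batch, {"image": "model_input_0"})
print(as_list)
print(as_dict)
```

The list form matches `input_list: ['image', 'scale_factor']` in the PicoDet config. The dict form is useful when an exported model's input names differ from the reader's field names.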


--- a/example/post_training_quantization/detection/images/picodet_analysis.png
+++ b/example/post_training_quantization/detection/images/picodet_analysis.png
--- a/example/post_training_quantization/detection/post_quant.py
+++ b/example/post_training_quantization/detection/post_quant.py
@@ -41,7 +41,7 @@ def argsparser():
        default='gpu',
        help="which device used to compress.")
    parser.add_argument(
-        '--algo', type=str, default='KL', help="post quant algo.")
+        '--algo', type=str, default='avg', help="post quant algo.")

    return parser

@@ -79,8 +79,8 @@ def main():
        data_loader=train_loader,
        model_filename=config["model_filename"],
        params_filename=config["params_filename"],
-        batch_size=4,
-        batch_nums=64,
+        batch_size=32,
+        batch_nums=10,
        algo=FLAGS.algo,
        hist_percent=0.999,
        is_full_quantize=False,
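This hunk changes the calibration settings from `batch_size=4, batch_nums=64` to `batch_size=32, batch_nums=10`, and the default `--algo` from `KL` to `avg`. The sketch below works through the sample-count arithmetic, assuming each calibration batch holds `batch_size` images. It also contrasts a maximum-based activation scale with an averaged one, assuming `avg` means averaging the per-batch maximum absolute activation. Both assumptions are my reading, not something the diff states, and the activation values are invented.

```python
def abs_max_scale(batches):
    """Scale taken from the single largest absolute value seen in calibration."""
    return max(max(abs(v) for v in batch) for batch in batches)


def avg_scale(batches):
    """Scale taken as the mean of each batch's largest absolute value."""
    per_batch_max = [max(abs(v) for v in batch) for batch in batches]
    return sum(per_batch_max) / len(per_batch_max)


# Calibration sample counts before and after the change (batch_size * batch_nums).
old_samples = 4 * 64
new_samples = 32 * 10
print("calibration samples:", old_samples, "->", new_samples)

# Invented activations: the third batch contains one outlier.
activations = [[0.5, -1.0], [0.8, 0.9], [12.0, 0.1]]
print("abs_max scale:", abs_max_scale(activations))
print("avg scale:    ", round(avg_scale(activations), 4))
```

An averaged scale is less dominated by a single outlier batch, at the cost of clipping that outlier. Whether this helps a particular model has to be checked with eval.py.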

--- a/example/post_training_quantization/pytorch_yolo_series/README.md
+++ b/example/post_training_quantization/pytorch_yolo_series/README.md
@@ -144,6 +144,7 @@ python post_quant.py --config_path=./configs/yolov6s_analyzed_ptq.yaml --save_di
 ```

 ## 4.Deployment
+For deployment, see the [auto-compression example for YOLO series models](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/pytorch_yolo_series).


 ## 5.FAQ

--- a/example/post_training_quantization/pytorch_yolo_series/eval.py
+++ b/example/post_training_quantization/pytorch_yolo_series/eval.py
@@ -49,7 +49,10 @@ def eval():
    exe = paddle.static.Executor(place)

    val_program, feed_target_names, fetch_targets = load_inference_model(
-        config["model_dir"], exe, "model.pdmodel", "model.pdiparams")
+        config["model_dir"].rstrip('/'),
+        exe,
+        model_filename=config["model_filename"],
+        params_filename=config["params_filename"])

    bboxes_list, bbox_nums_list, image_id_list = [], [], []
    with tqdm(

--- a/example/post_training_quantization/pytorch_yolo_series/post_quant.py
+++ b/example/post_training_quantization/pytorch_yolo_series/post_quant.py
@@ -41,7 +41,7 @@ def argsparser():
        default='gpu',
        help="which device used to compress.")
    parser.add_argument(
-        '--algo', type=str, default='KL', help="post quant algo.")
+        '--algo', type=str, default='avg', help="post quant algo.")

    return parser