Unverified · Commit 96d309a7 · Authored by: Guanghua Yu · Committed by: GitHub

Cherry pick some PR (#1354)

Parent 0c34d239
@@ -4,40 +4,51 @@
</p>
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/Paddle?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSlim?color=9ea"></a>
<a href="https://pypi.org/project/PaddleSlim/"><img src="https://img.shields.io/pypi/dm/PaddleSlim?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSlim?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSlim?color=ccf"></a>
</p>

PaddleSlim is a toolkit dedicated to deep learning model compression. It provides strategies such as **low-bit quantization, knowledge distillation, sparsification, and neural architecture search** to help developers quickly produce smaller models.

## Product News

- 🔥 **2022.08.16: Auto Compression (ACT) upgrades**
  - Support loading ONNX models directly and exporting Paddle models to ONNX.
  - Released a trial version of the quantization analysis tool and the [YOLO-series post-training quantization tool](example/post_training_quantization/pytorch_yolo_series/).
  - Updated the [YOLO-series auto-compression model zoo](example/auto_compression/pytorch_yolo_series).
| Model | Base mAP<sup>val<br>0.5:0.95 | ACT INT8 mAP<sup>val<br>0.5:0.95 | Model Size Compression | Latency<sup><small>FP32</small></sup> | Latency<sup><small>INT8</small></sup> | Speedup |
| :-------- |:-------- |:--------: | :--------: | :---------------------: | :----------------: | :----------------: |
| PPYOLOE-s | 43.1 | 42.6 | 3.9x | 6.51ms | 2.12ms | 3.1x |
| YOLOv5s | 37.4 | 36.9 | 3.8x | 5.95ms | 1.87ms | 3.2x |
| YOLOv6s | 42.4 | 41.3 | 3.9x | 9.06ms | 1.83ms | 5.0x |
| YOLOv7 | 51.1 | 50.9 | 3.9x | 26.84ms | 4.55ms | 5.9x |
| YOLOv7-Tiny | 37.3 | 37.0 | 3.9x | 5.06ms | 1.68ms | 3.0x |
- 🔥 **2022.07.01: Released [v2.3.0](https://github.com/PaddlePaddle/PaddleSlim/releases/tag/v2.3.0)**
  - Released the [Auto Compression Toolkit (ACT)](example/auto_compression)
    - Code-free compression: developers only need to provide the inference model files and data to run post-training quantization (PTQ), quantization-aware training (QAT), sparse training, and other compression tasks.
    - Automatic strategy selection: based on the task and the characteristics of the deployment environment, ACT automatically searches for a suitable post-training quantization method and for the best combination of compression strategies.
    - Released auto-compression examples for [NLP](example/auto_compression/nlp), [semantic segmentation](example/auto_compression/semantic_segmentation), and [object detection](example/auto_compression/detection).
    - Released auto-compression recipes for `X2Paddle` models: [YOLOv5](example/auto_compression/pytorch_yolo_series), [YOLOv6](example/auto_compression/pytorch_yolo_series), [YOLOv7](example/auto_compression/pytorch_yolo_series), [HuggingFace](example/auto_compression/pytorch_huggingface), [MobileNet](example/auto_compression/tensorflow_mobilenet).
  - Upgraded quantization
    - Unified the quantized model format; post-training quantization now supports the while op; fixed slow quantization-aware training of large BERT models.
    - Added 7 [post-training quantization methods](docs/zh_cn/tutorials/quant/post_training_quantization.md), including HIST, AVG, EMD, Bias Correction, and AdaRound.
  - Added support for semi-structured sparse training
  - Added a latency prediction tool
    - Supports latency prediction for sparse and low-bit quantized models; predicts the inference latency of a given model in a specific deployment environment (ARM CPU + Paddle Lite); provides prediction interfaces for SD625, SD710, and RK3288 chips with Paddle Lite.
    - Provides an environment auto-extension tool that can automatically add latency predictors for more ARM CPU devices.

<details>
<summary>History</summary>
- **2021.11.15: Released v2.2.0**
@@ -52,6 +63,7 @@
For more information, see the [release notes](https://github.com/PaddlePaddle/PaddleSlim/releases).
</details>
## Overview of Basic Compression Features
......
@@ -65,6 +65,8 @@ add_arg('use_pact', bool, True,
"Whether to use PACT or not.")
add_arg('analysis', bool, False,
"Whether analysis variables distribution.")
add_arg('onnx_format', bool, False,
"Whether to use ONNX format or not.")
add_arg('ce_test', bool, False, "Whether to CE test.")
# yapf: enable
@@ -257,6 +259,8 @@ def compress(args):
'window_size': 10000,
# The decay coefficient of moving average, default is 0.9
'moving_rate': 0.9,
# Whether to use ONNX format or not
'onnx_format': args.onnx_format,
}
# 2. quantization transform programs (training aware)
@@ -298,9 +302,9 @@ def compress(args):
places,
quant_config,
scope=None,
act_preprocess_func=None,
optimizer_func=None,
executor=None,
for_test=True)
compiled_train_prog = quant_aware(
train_prog,
@@ -425,29 +429,23 @@ def compress(args):
# 3. Freeze the graph after training by adjusting the quantize
# operators' order for the inference.
# The dtype of float_program's weights is float32, but in int8 range.
model_path = os.path.join(quantization_model_save_dir, args.model)
if not os.path.isdir(model_path):
    os.makedirs(model_path)
float_program = convert(val_program, places, quant_config)
_logger.info("eval best_model after convert")
final_acc1 = test(best_epoch, float_program)
_logger.info("final acc:{}".format(final_acc1))
# 4. Save inference model
paddle.fluid.io.save_inference_model(
    dirname=model_path,
    feeded_var_names=[image.name],
    target_vars=[out],
    executor=exe,
    main_program=float_program,
    model_filename=model_path + '/model.pdmodel',
    params_filename=model_path + '/model.pdiparams')
def main():
......
@@ -126,6 +126,8 @@ def compress(args):
'window_size': 10000,
# The decay coefficient of moving average, default is 0.9
'moving_rate': 0.9,
# Whether to use ONNX format or not
'onnx_format': args.onnx_format,
}
pretrain = True
@@ -294,10 +296,7 @@ def compress(args):
# operators' order for the inference.
# The dtype of float_program's weights is float32, but in int8 range.
############################################################################################################
float_program = convert(val_program, places, quant_config)
print("eval best_model after convert")
final_acc1 = test(best_epoch, float_program)
############################################################################################################
......
@@ -21,8 +21,7 @@ import functools
import paddle
sys.path[0] = os.path.join(
    os.path.dirname("__file__"), os.path.pardir, os.path.pardir)
sys.path[1] = os.path.join(os.path.dirname("__file__"), os.path.pardir)
import imagenet_reader as reader
from utility import add_arguments, print_arguments
@@ -31,8 +30,8 @@ parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('model_path', str, "./pruning/checkpoints/resnet50/2/eval_model/", "Whether to use pretrained model.")
add_arg('model_name', str, 'model.pdmodel', "model filename for inference model")
add_arg('params_name', str, 'model.pdiparams', "params filename for inference model")
add_arg('batch_size', int, 64, "Minibatch size.")
# yapf: enable
......
@@ -3,19 +3,19 @@ AutoCompression
AutoCompression
---------------

.. py:class:: paddleslim.auto_compression.AutoCompression(model_dir, train_dataloader, model_filename, params_filename, save_dir, strategy_config, train_config, eval_callback, devices='gpu')

`Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/auto_compression.py#L49>`_

Compresses an inference model saved with the ``paddle.jit.save`` or ``paddle.static.save_inference_model`` API according to the given configuration.

**Parameters:**

- **model_dir(str)** - Directory of the inference model to be compressed.
- **train_dataloader(paddle.io.DataLoader)** - Training data loader. Note: when the post-training-quantization hyperparameter search strategy is used, ``train_dataloader`` and ``eval_callback`` can be set to the same data reader.
- **model_filename(str)** - File name of the inference model to be compressed.
- **params_filename(str)** - File name of the parameters of the inference model to be compressed.
- **save_dir(str)** - Directory in which the compressed model is saved.
- **train_config(dict)** - Training configuration. For the configurable options see `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L103>`_ . Note: when the post-training-quantization hyperparameter search strategy is used, set ``train_config`` to ``None``.
- **strategy_config(dict, list(dict), optional)** - Compression strategies to use; several single strategies can be combined by listing them. The dictionary keys must be among:
  ``Quantization`` (quantization configuration; for the configurable options see `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24>`_ ),
@@ -82,13 +82,13 @@ AutoCompression
eval_dataloader = Cifar10(mode='eval')
ac = AutoCompression(model_path, train_dataloader, model_filename, params_filename, save_dir, \
    strategy_config={"Quantization": Quantization(**default_ptq_config),
    "Distillation": HyperParameterOptimization(**default_distill_config)}, \
    train_config=None, eval_callback=eval_dataloader, devices='gpu')
```
......
@@ -118,7 +118,7 @@ quant_post_dynamic
quant_post_static
---------------

.. py:function:: paddleslim.quant.quant_post_static(executor, model_dir, quantize_model_path, batch_generator=None, sample_generator=None, model_filename=None, params_filename=None, save_model_filename='model.pdmodel', save_params_filename='model.pdiparams', batch_size=16, batch_nums=None, scope=None, algo='KL', round_type='round', quantizable_op_type=["conv2d","depthwise_conv2d","mul"], is_full_quantize=False, weight_bits=8, activation_bits=8, activation_quantize_type='range_abs_max', weight_quantize_type='channel_wise_abs_max', onnx_format=False, skip_tensor_list=None, optimize_model=False)

`Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py>`_

@@ -217,15 +217,15 @@ quant_post_static
    target_vars=[out],
    main_program=val_prog,
    executor=exe,
    model_filename='model.pdmodel',
    params_filename='model.pdiparams')

quant_post_static(
    executor=exe,
    model_dir='./model_path',
    quantize_model_path='./save_path',
    sample_generator=val_reader,
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    batch_size=16,
    batch_nums=10)
......
@@ -82,15 +82,15 @@
| [Semantic Segmentation](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 |
| NLP | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 |
| NLP | ERNIE 3.0-Medium | 73.09 | 72.40 | 29.25(fp16) | 19.61 | **1.49** | NVIDIA Tesla T4 |
| [Object Detection](./pytorch_yolo_series) | YOLOv5s<br/>(PyTorch) | 37.40 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 |
| [Object Detection](./pytorch_yolo_series) | YOLOv6s<br/>(PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
| [Object Detection](./pytorch_yolo_series) | YOLOv7<br/>(PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 |
| [Object Detection](./detection) | PP-YOLOE-s | 43.1 | 42.6 | 6.51 | 2.12 | **3.07** | NVIDIA Tesla T4 |
| [Image Classification](./image_classification) | MobileNetV1<br/>(TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDM865 (Snapdragon 865) |

- Note: the object detection metric is mAP (0.5:0.95); the image segmentation metric is IoU.
- For more Paddle model examples and benchmarks, see: [Image Classification](./image_classification), [Object Detection](./detection), [Semantic Segmentation](./semantic_segmentation), [NLP](./nlp)
- For more examples and benchmarks of models from other frameworks, see: [YOLOv5 (PyTorch)](./pytorch_yolo_series), [YOLOv6 (PyTorch)](./pytorch_yolo_series), [YOLOv7 (PyTorch)](./pytorch_yolo_series), [HuggingFace (PyTorch)](./pytorch_huggingface), [MobileNet (TensorFlow)](./tensorflow_mobilenet)

## **Environment Setup**
......
@@ -12,6 +12,7 @@ Distillation:
  loss: soft_label

Quantization:
  onnx_format: true
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
......
Global:
  reader_config: configs/yolo_reader.yml
  input_list: ['image']
  arch: PPYOLOE    # when exporting with exclude_nms=True, set arch: PPYOLOE
  Evaluation: True
  model_dir: ./ppyoloe_crn_s_300e_coco
  model_filename: model.pdmodel
  params_filename: model.pdiparams

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  onnx_format: true
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 6000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 4.0e-05
@@ -20,7 +20,7 @@ import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.common import load_config as load_slim_config
from keypoint_utils import keypoint_post_process
......
@@ -20,7 +20,7 @@ import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.common import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
from keypoint_utils import keypoint_post_process
@@ -121,7 +121,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format(
    all_config)
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
......
# 1. Detailed Guide to ACT Hyperparameters

## 1.1 Hyperparameters of Each Compression Method

### 1.1.1 Quantization

The quantization settings mainly control the quantization bit width and the op types to quantize. Quantizable ops include convolution layers (conv2d, depthwise_conv2d) and fully connected layers (mul, matmul_v2). The following example quantizes only the convolution layers:

```yaml
@@ -20,69 +20,148 @@ Quantization:
  moving_rate: 0.9           # decay coefficient of the 'moving_average_abs_max' method; default 0.9
  for_tensorrt: false        # whether the quantized model will be deployed with TensorRT; if true, the quantized op types are TENSORRT_OP_TYPES; default false
  is_full_quantize: false    # whether to quantize all supported op types
  onnx_format: false         # whether to use the ONNX quantization format
```

The options above are described as follows:
- use_pact: whether to enable PACT. With PACT enabled, the quantized model is usually more accurate. See [PACT: Parameterized Clipping Activation for Quantized Neural Networks](https://arxiv.org/abs/1805.06085) for the algorithm.
- activation_bits: bit width for activation quantization, 1 to 8. Default: 8.
- weight_bits: bit width for weight quantization, 1 to 8. Default: 8.
- activation_quantize_type: activation quantization method, one of 'abs_max', 'range_abs_max', 'moving_average_abs_max'. If the quantized model will be loaded by TensorRT for inference, use 'range_abs_max' or 'moving_average_abs_max'. Default: 'moving_average_abs_max'.
- weight_quantize_type: weight quantization method, one of 'abs_max', 'channel_wise_abs_max', 'range_abs_max', 'moving_average_abs_max'. If the quantized model will be loaded by TensorRT, use 'channel_wise_abs_max'. Default: 'channel_wise_abs_max'.
- not_quant_pattern: ops whose `name_scope` contains any string in 'not_quant_pattern' are left unquantized. See [paddle.static.name_scope](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/name_scope_cn.html#name-scope) for how to set `name_scope`.
- quantize_op_types: op types to quantize. The following code prints all op types that support quantization:
```
from paddleslim.quant.quanter import TRANSFORM_PASS_OP_TYPES,QUANT_DEQUANT_PASS_OP_TYPES
print(TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES)
```
- dtype: parameter dtype after quantization. Default int8; currently only int8 is supported.
- window_size: window size of the 'range_abs_max' method. Default: 10000.
- moving_rate: decay coefficient of the 'moving_average_abs_max' method. Default: 0.9.
- for_tensorrt: whether the quantized model will be deployed with TensorRT. Default: false. The following code prints the op types that are quantized when for_tensorrt=True:
```
from paddleslim.quant.quanter import TENSORRT_OP_TYPES
print(TENSORRT_OP_TYPES)
```
- is_full_quantize: whether to quantize all supported op types. Default: false.
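Putting the two snippets above together, one quick way to decide `quantize_op_types` for a specific model is to intersect the op types found in the inference model with the op types PaddleSlim can quantize. A minimal sketch, with `./inference_model` as a placeholder path:

```
import paddle
from paddleslim.quant.quanter import TRANSFORM_PASS_OP_TYPES, QUANT_DEQUANT_PASS_OP_TYPES

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
program, _, _ = paddle.static.load_inference_model("./inference_model", exe)

# Op types that are both present in the model and supported by the quantization passes.
model_op_types = {op.type for block in program.blocks for op in block.ops}
supported = set(TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES)
print("quantize_op_types candidates:", sorted(model_op_types & supported))
```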
### 1.1.2 Knowledge Distillation

The distillation settings mainly specify the distillation nodes (`node`) and the path of the teacher inference model, as shown below:

```yaml
Distillation:
  alpha: 1.0
  loss: l2
  node:
  - relu_30.tmp_0
  teacher_model_dir: ./inference_model
  # teacher_model_filename: inference model file of the teacher, *.pdmodel or __model__
  teacher_model_filename: model.pdmodel
  # teacher_params_filename: parameter file of the teacher, *.pdiparams or __params__
  teacher_params_filename: model.pdiparams
```

The options above are described as follows:

- alpha: weight of the distillation loss; multiple values can be given so that different nodes use different alpha values.
- loss: distillation loss; multiple losses can be given so that different nodes use different losses. Options are "soft_label", "l2" and "fsp"; custom losses are also supported. See the [knowledge distillation API docs](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/dist/single_distiller_api.html) for definitions and usage.
- node: distillation nodes, i.e. the names of the output variables of selected layers. There are two cases:
  - Self-distillation: the teacher is the inference model before compression and the student is the compressed model. 'node' can be a list of variable names, and ACT automatically adds a distillation loss on each variable in the list. Example:
```
node:
- relu_30.tmp_0
- relu_31.tmp_0
```
The example above adds two distillation losses: the first takes 'relu_30.tmp_0' from both the teacher and the student, and the second takes 'relu_31.tmp_0' from both.
  - Regular distillation: the teacher can be any model and the student is the compressed inference model. 'node' is a list of variable names whose length must be even; each consecutive pair consists of a teacher node and the corresponding student node. Example:
```
node:
- teacher_relu_0.tmp_0
- student_relu_0.tmp_0
- teacher_relu_1.tmp_0
- student_relu_1.tmp_0
```
The example above adds two distillation losses: the first takes the teacher variable 'teacher_relu_0.tmp_0' and the student variable 'student_relu_0.tmp_0'; the second takes 'teacher_relu_1.tmp_0' and 'student_relu_1.tmp_0'.
If `node` is not set, the outputs of the last parameterized layer of the teacher and of the student are used to build the distillation loss.
- teacher_model_dir: path of the teacher model used to supervise the training of the compressed model. If not set, the model before compression is used as the teacher.
- teacher_model_filename: model file name of the teacher, *.pdmodel or __model__. Only effective when `teacher_model_dir` is set.
- teacher_params_filename: parameter file name of the teacher, *.pdiparams or __params__. Only effective when `teacher_model_dir` is set.
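Besides inspecting the model in Netron (see the FAQ below), candidate `node` names can also be listed programmatically. The sketch below, with `./inference_model` as a placeholder path and an assumed set of op types of interest, prints the output variable names of layers you may want to distill:

```
import paddle

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
program, _, _ = paddle.static.load_inference_model("./inference_model", exe)

for block in program.blocks:
    for op in block.ops:
        # Backbone/neck activations are common distillation nodes.
        if op.type in ("relu", "leaky_relu", "conv2d", "matmul_v2"):
            print(op.type, op.output_arg_names)
```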
### 1.1.3 Structured Sparsity (channel pruning)

The structured sparsity settings are as follows:

```yaml
ChannelPrune:
  pruned_ratio: 0.25
  prune_params_name:
  - conv1_weights
  criterion: l1_norm
```

- pruned_ratio: ratio of channels pruned from each convolution layer.
- prune_params_name: weight names of the convolution layers to prune. The following script lists the weight names of all convolution layers in an inference model:
```
import paddle
paddle.enable_static()
model_dir="./inference_model"
exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = (
    paddle.static.load_inference_model(model_dir, exe))
for var_ in inference_program.list_vars():
    if var_.persistable and "conv2d" in var_.name:
        print(f"{var_.name}")
```
Alternatively, visualize the `*.pdmodel` file with [Netron](https://netron.app/) and pick suitable convolution layers to prune.
- criterion: metric used to evaluate the importance of convolution channels. Options are "l1_norm", "bn_scale" and "geometry_median". See the [structured sparsity API docs](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/prune/prune_api.html) for definitions and usage.
### 1.1.4 ASP Semi-structured Sparsity

The semi-structured sparsity settings are as follows:

```yaml
ASPPrune:
  prune_params_name:
  - conv1_weights
```

- prune_params_name: weight names of the convolution layers to prune. The following script lists the weight names of all convolution layers in an inference model:
```
import paddle
paddle.enable_static()
model_dir="./inference_model"
exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = (
    paddle.static.load_inference_model(model_dir, exe))
for var_ in inference_program.list_vars():
    if var_.persistable and "conv2d" in var_.name:
        print(f"{var_.name}")
```
Alternatively, visualize the `*.pdmodel` file with [Netron](https://netron.app/) and pick suitable convolution layers to prune.
### 1.1.5 Structured Pruning for Transformers

The structured pruning settings for Transformer models are as follows:

```yaml
TransformerPrune:
  pruned_ratio: 0.25
```

- pruned_ratio: ratio pruned from each fully connected layer.

### 1.1.6 Unstructured Sparsity

The unstructured sparsity settings are as follows:
```yaml
@@ -122,7 +201,7 @@ UnstructurePrune:
- local_sparsity: the scope to which the pruning ratio (ratio) applies; it only takes effect in 'ratio' mode. When local_sparsity is enabled, every pruned parameter matrix reaches a sparsity of 'ratio'; when it is disabled, only the overall model sparsity is guaranteed to reach 'ratio', and individual matrices may differ. Keeping the per-matrix sparsity equal gives a more pronounced sparse speedup.
- For the meaning of the other unstructured sparsity options, see the [unstructured sparsity API docs](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/dygraph/pruners/unstructured_pruner.rst).
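To verify the sparsity a pruned model actually reaches, one option is to load the saved inference model and count zero-valued weights. A minimal sketch, with `./pruned_model` as a placeholder path:

```
import numpy as np
import paddle

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
program, _, _ = paddle.static.load_inference_model("./pruned_model", exe)

# Parameters are loaded into the global scope; count their zero entries.
scope = paddle.static.global_scope()
zeros, total = 0, 0
for var_ in program.list_vars():
    if var_.persistable and scope.find_var(var_.name) is not None:
        w = np.array(scope.find_var(var_.name).get_tensor())
        zeros += int(np.sum(w == 0))
        total += w.size
print("overall weight sparsity: {:.2%}".format(zeros / total))
```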
### 1.1.7 Training Hyperparameters

The training settings mainly configure the learning rate, the number of training epochs, and the optimizer.

```yaml
@@ -143,12 +222,69 @@ TrainConfig:
    boundaries: [4500]        # scheduler parameter
    values: [0.005, 0.0005]   # scheduler parameter
```
## 1.2 FAQ

### 1. Automatic distillation does not work well; how do I pick distillation nodes myself?

First visualize the `model.pdmodel` file with [Netron](https://netron.app/), pick the output tensor names of some layers, and use them to configure the distillation nodes. (Layers such as the backbone outputs or the network outputs are usually good choices.)

<div align="center">
<img src="../../docs/images/dis_node.png" width="600">
</div>
### 2. How to get the op types used in an inference model

Run the following code to list the op types in an inference model, where `model_dir` is the directory where the inference model is stored.
```
import paddle
paddle.enable_static()
model_dir="./inference_model"
exe = paddle.static.Executor(paddle.CPUPlace())
inference_program, _, _ = (
    paddle.static.load_inference_model(model_dir, exe))
op_types = {}
for block in inference_program.blocks:
    for op in block.ops:
        op_types[op.type] = 1
print(f"Operators in inference model:\n{op_types.keys()}")
```
PaddlePaddle framework APIs used:
- [load_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/load_inference_model_cn.html#load-inference-model)
- [Program](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/Program_cn.html#program)
- [Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/Executor_cn.html#executor)
### 3. Which ops can be quantized

Run the following code to list the op types supported by the quantization passes of the installed PaddlePaddle version:
```
from paddle.fluid.contrib.slim.quantization.utils import _weight_supported_quantizable_op_type, _act_supported_quantizable_op_type
print(f"_supported_quantizable_op_type:\n{_weight_supported_quantizable_op_type}")
print(f"_supported_quantizable_op_type:\n{_act_supported_quantizable_op_type}")
```
### 4. How to set the 'name_scope' attribute of an op in an inference model

The following code sets the `name_scope` of the op whose output variable is `conv2d_52.tmp_0` to 'skip_quant':
```
import paddle
paddle.enable_static()
model_dir="./original_model"
exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = (
    paddle.static.load_inference_model(model_dir, exe))
skips = ['conv2d_52.tmp_0']
for block in inference_program.blocks:
    for op in block.ops:
        if op.output_arg_names[0] in skips:
            op._set_attr("name_scope", "skip_quant")
feed_vars = []
for var_ in inference_program.list_vars():
    if var_.name in feed_target_names:
        feed_vars.append(var_)
paddle.static.save_inference_model("./infer_model", feed_vars, fetch_targets, exe, program=inference_program)
```
@@ -21,28 +21,30 @@

### PaddleClas Models

| Model | Strategy | Top-1 Acc | GPU Latency (ms) | ARM CPU Latency (ms) | Config | Inference Model |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| MobileNetV1 | Baseline | 70.90 | - | 33.15 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar) |
| MobileNetV1 | Quantization + Distillation | 70.57 | - | 13.64 | [Config](./configs/MobileNetV1/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV1_QAT.tar) |
| ResNet50_vd | Baseline | 79.12 | 3.19 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar) |
| ResNet50_vd | Quantization + Distillation | 78.74 | 0.92 | - | [Config](./configs/ResNet50_vd/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/ResNet50_vd_QAT.tar) |
| ShuffleNetV2_x1_0 | Baseline | 68.65 | - | 10.43 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x1_0_infer.tar) |
| ShuffleNetV2_x1_0 | Quantization + Distillation | 68.32 | - | 5.51 | [Config](./configs/ShuffleNetV2_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/ShuffleNetV2_x1_0_QAT.tar) |
| SqueezeNet1_0 | Baseline | 59.60 | - | 35.98 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SqueezeNet1_0_infer.tar) |
| SqueezeNet1_0 | Quantization + Distillation | 59.45 | - | 16.96 | [Config](./configs/SqueezeNet1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/SqueezeNet1_0_QAT.tar) |
| PPLCNetV2_base | Baseline | 76.86 | - | 36.50 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar) |
| PPLCNetV2_base | Quantization + Distillation | 76.43 | - | 15.79 | [Config](./configs/PPLCNetV2_base/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/PPLCNetV2_base_QAT.tar) |
| PPHGNet_tiny | Baseline | 79.59 | 2.82 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar) |
| PPHGNet_tiny | Quantization + Distillation | 79.20 | 0.98 | - | [Config](./configs/PPHGNet_tiny/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/PPHGNet_tiny_QAT.tar) |
| InceptionV3 | Baseline | 79.14 | 4.79 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/InceptionV3_infer.tar) |
| InceptionV3 | Quantization + Distillation | 78.32 | 1.47 | - | [Config](./configs/InceptionV3/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/InceptionV3_QAT.tar) |
| EfficientNetB0 | Baseline | 77.02 | 1.95 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB0_infer.tar) |
| EfficientNetB0 | Quantization + Distillation | 75.39 | 1.44 | - | [Config](./configs/EfficientNetB0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/EfficientNetB0_QAT.tar) |
| GhostNet_x1_0 | Baseline | 74.02 | 2.93 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GhostNet_x1_0_infer.tar) |
| GhostNet_x1_0 | Quantization + Distillation | 72.62 | 1.03 | - | [Config](./configs/GhostNet_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/GhostNet_x1_0_QAT.tar) |
| MobileNetV3_large_x1_0 | Baseline | 75.32 | - | 16.62 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar) |
| MobileNetV3_large_x1_0 | Quantization + Distillation | 74.41 | - | 9.85 | [Config](./configs/MobileNetV3_large_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV3_large_x1_0_QAT.tar) |
| MobileNetV3_large_x1_0_ssld | Baseline | 78.96 | - | 16.62 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_ssld_infer.tar) |
| MobileNetV3_large_x1_0_ssld | Quantization + Distillation | 77.17 | - | 9.85 | [Config](./configs/MobileNetV3_large_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV3_large_x1_0_ssld_QAT.tar) |

- ARM CPU test environment: `SDM865 (4xA77 + 4xA55)`
- NVIDIA GPU test environment:
......
@@ -23,7 +23,7 @@ import paddle
import paddle.nn as nn
from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset
from paddleslim.common import load_config as load_slim_config
def argsparser():
......
@@ -22,7 +22,7 @@ import yaml
from utils import preprocess, postprocess
import paddle
from paddle.inference import create_predictor
from paddleslim.common import load_config
def argsparser():
......
@@ -24,7 +24,7 @@ import paddle
import paddle.nn as nn
from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset
from paddleslim.common import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
@@ -46,6 +46,11 @@ def argsparser():
    type=int,
    default=1281167,
    help="the number of total training images.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
    return parser
@@ -122,7 +127,12 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
def main():
    rank_id = paddle.distributed.get_rank()
    if args.devices == 'gpu':
        place = paddle.CUDAPlace(rank_id)
        paddle.set_device('gpu')
    else:
        place = paddle.CPUPlace()
        paddle.set_device('cpu')
    global global_config
    all_config = load_slim_config(args.config_path)
......
@@ -15,7 +15,7 @@ from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.data.sampler import SamplerHelper
from paddlenlp.metrics import Mcc, PearsonAndSpearman
from paddleslim.common import load_config
from paddleslim.auto_compression.compressor import AutoCompression
......
@@ -27,7 +27,7 @@ from paddlenlp.transformers import AutoModelForTokenClassification, AutoTokenize
from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.metrics import AccuracyAndF1, Mcc, PearsonAndSpearman
from paddleslim.common import load_config as load_slim_config
from paddleslim.auto_compression.compressor import AutoCompression
......
# Auto Compression Example for YOLO-Series Models

Contents:

- [1. Introduction](#1简介)
- [2. Benchmark](#2Benchmark)
- [3. Auto Compression Workflow](#自动压缩流程)
  - [3.1 Environment Setup](#31-准备环境)
  - [3.2 Dataset Preparation](#32-准备数据集)
  - [3.3 Inference Model Preparation](#33-准备预测模型)
  - [3.4 Run Auto Compression](#34-自动压缩并产出模型)
  - [3.5 Evaluate Model Accuracy](#35-测试模型精度)
- [4. Deployment](#4预测部署)
- [5. FAQ](#5FAQ)

## 1. Introduction

This example takes the [ultralytics/yolov5](https://github.com/ultralytics/yolov5), [meituan/YOLOv6](https://github.com/meituan/YOLOv6), and [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object detection models as examples. With the help of [X2Paddle](https://github.com/PaddlePaddle/X2Paddle), the PyTorch models are converted into Paddle models and then compressed with ACT. The compressed model can be deployed with Paddle Inference, or exported to ONNX and deployed with TensorRT.
## 2.Benchmark
| Model | Strategy | Input Size | mAP<sup>val<br>0.5:0.95 | Model Size | Latency<sup><small>FP32</small></sup> | Latency<sup><small>FP16</small></sup> | Latency<sup><small>INT8</small></sup> | Config | Inference Model |
| :-------- |:-------- |:--------: | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv5s | Base model | 640*640 | 37.4 | 28.1MB | 5.95ms | 2.44ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx) |
| YOLOv5s | Post-training quantization | 640*640 | 36.0 | 7.4MB | - | - | 1.87ms | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series) | - |
| YOLOv5s | ACT quantization-aware training | 640*640 | **36.9** | 7.4MB | - | - | **1.87ms** | [config](./configs/yolov5s_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov5s_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov5s_quant.onnx) |
| | | | | | | | | |
| YOLOv6s | Base model | 640*640 | 42.4 | 65.9MB | 9.06ms | 2.90ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx) |
| YOLOv6s | KL post-training quantization | 640*640 | 30.3 | 16.8MB | - | - | 1.83ms | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series) | - |
| YOLOv6s | Quantization-aware distillation training | 640*640 | **41.3** | 16.8MB | - | - | **1.83ms** | [config](./configs/yolov6s_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.onnx) |
| | | | | | | | | |
| YOLOv7 | Base model | 640*640 | 51.1 | 141MB | 26.84ms | 7.44ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
| YOLOv7 | Post-training quantization | 640*640 | 50.2 | 36MB | - | - | 4.55ms | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series) | - |
| YOLOv7 | ACT quantization-aware training | 640*640 | **50.9** | 36MB | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.onnx) |
| | | | | | | | | |
| YOLOv7-Tiny | Base model | 640*640 | 37.3 | 24MB | 5.06ms | 2.32ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx) |
| YOLOv7-Tiny | Post-training quantization | 640*640 | 35.8 | 6.1MB | - | - | 1.68ms | - | - |
| YOLOv7-Tiny | ACT quantization-aware training | 640*640 | **37.0** | 6.1MB | - | - | **1.68ms** | [config](./configs/yolov7_tiny_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.onnx) |
Notes:
- All mAP numbers are evaluated on the COCO val2017 dataset.
- YOLOv7 latency is measured on a Tesla T4 GPU with TensorRT 8.4.1 and batch_size=1; the test script is [cpp_infer](./cpp_infer).

## 3. Auto Compression Workflow

#### 3.1 Environment Setup

- PaddlePaddle >= 2.3.2 (install it from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) following the instructions for your environment)
- PaddleSlim develop branch

(1) Install PaddlePaddle:
```
# CPU
pip install paddlepaddle==2.3.2
# GPU
pip install paddlepaddle-gpu==2.3.2
```
(2) Install PaddleSlim:
```shell
git clone https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
python setup.py install
```
#### 3.2 Dataset Preparation

This example uses the COCO dataset by default. Download [Train](http://images.cocodataset.org/zips/train2017.zip), [Val](http://images.cocodataset.org/zips/val2017.zip), and [annotation](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) from the [MS COCO website](https://cocodataset.org).

The directory layout is as follows:
```
dataset/coco/
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
```
For a custom dataset, prepare the data in the COCO format shown above. A minimal example of reading it is sketched below.
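For reference, the evaluation data is read with the `COCOValDataset` class shown at the end of this example. A minimal sketch of building a validation DataLoader from the directory layout above (the module name `dataset` is assumed to be the dataset.py file of this example):

```
import paddle
from dataset import COCOValDataset  # dataset.py from this example (file name assumed)

val_dataset = COCOValDataset(
    dataset_dir='dataset/coco/',
    image_dir='val2017',
    anno_path='annotations/instances_val2017.json')
val_loader = paddle.io.DataLoader(val_dataset, batch_size=1, shuffle=False)
```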
#### 3.3 Inference Model Preparation

(1) Prepare the ONNX model:

- YOLOv5:

Export the model from the master branch of [ultralytics/yolov5](https://github.com/ultralytics/yolov5); an ONNX model exported from v6.1 or later is required. You can follow the official [export tutorial](https://github.com/ultralytics/yolov5/issues/251), or download the prepared [yolov5s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx).
```shell
python export.py --weights yolov5s.pt --include onnx
```
- YOLOv6:

Prepare the ONNX model following the official [export tutorial](https://github.com/meituan/YOLOv6/blob/main/deploy/ONNX/README.md) of [meituan/YOLOv6](https://github.com/meituan/YOLOv6), or download the prepared [yolov6s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx).

- YOLOv7: prepare the ONNX model with the export script of [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7):
```shell
git clone https://github.com/WongKinYiu/yolov7.git
python export.py --weights yolov7-tiny.pt --grid
```
**Note**: ACT currently supports models **without NMS**; export with the command above. You can also download the prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx).

#### 3.4 Run Auto Compression and Export the Model

The distillation + quantization auto-compression example is launched through the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model. Configure the model path and the distillation, quantization, and training settings in the config file; once configured, the model is quantized and distilled.

This example uses YOLOv7-Tiny; to compress a different model, change `--config_path` accordingly. The commands are:

- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 Evaluate Model Accuracy

Set the `model_dir` field in [yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml) to the path of the model to evaluate, then run eval.py to get its mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
```
## 4. Deployment

#### Export to ONNX and deploy with TensorRT

After auto compression finishes, an ONNX model file `quant_model.onnx` is generated in `save_dir` by default; it can be verified directly with the TensorRT test script.

- Run the test:
```shell
python yolov7_onnx_trt.py --model_path=output/quant_model.onnx --image_file=images/000000570688.jpg --precision=int8
```
#### Paddle-TensorRT deployment

- C++ deployment

Go to the [cpp_infer](./cpp_infer) directory, set up the environment and build following the [C++ TensorRT benchmark tutorial](./cpp_infer/README.md), then run the test:
```shell
# build
bash complie.sh
# run
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
- Python deployment:

First install a [Paddle package with TensorRT support](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python).

Then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py):
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
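paddle_trt_infer.py essentially builds a Paddle Inference predictor with the TensorRT backend enabled. A minimal sketch of the relevant configuration (paths are placeholders; `run_mode=trt_int8` corresponds to `PrecisionType.Int8`, and `trt_fp32`/`trt_fp16` to `Float32`/`Half`):

```
from paddle.inference import Config, PrecisionType, create_predictor

config = Config('output/model.pdmodel', 'output/model.pdiparams')
config.enable_use_gpu(256, 0)                      # GPU memory pool (MB), GPU id
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Int8,
    use_static=False,
    use_calib_mode=False)                          # no calibration needed for an already quantized model
predictor = create_predictor(config)
```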
## 5. FAQ

- To run post-training (offline) quantization on these models instead, see the [YOLO-series post-training quantization example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series).
Global:
  model_dir: ./yolov5s.onnx
  dataset_dir: dataset/coco/
  train_image_dir: train2017
  val_image_dir: val2017
  train_anno_path: annotations/instances_train2017.json
  val_anno_path: annotations/instances_val2017.json
  Evaluation: True
  arch: YOLOv5

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  onnx_format: true
  use_pact: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
......
Global:
  model_dir: ./yolov6s.onnx
  dataset_dir: dataset/coco/
  train_image_dir: train2017
  val_image_dir: val2017
  train_anno_path: annotations/instances_train2017.json
  val_anno_path: annotations/instances_val2017.json
  Evaluation: True
  arch: YOLOv6

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
......
Global:
  model_dir: ./yolov7.onnx
  dataset_dir: dataset/coco/
  train_image_dir: train2017
  val_image_dir: val2017
  train_anno_path: annotations/instances_train2017.json
  val_anno_path: annotations/instances_val2017.json
  Evaluation: True
  arch: YOLOv7

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 8000
......
Global:
  model_dir: ./yolov7-tiny.onnx
  dataset_dir: dataset/coco/
  train_image_dir: train2017
  val_image_dir: val2017
  train_anno_path: annotations/instances_train2017.json
  val_anno_path: annotations/instances_val2017.json
  Evaluation: True
  arch: YOLOv7

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 8000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 0.00004
# YOLOv7 TensorRT Benchmark (Linux)

## Environment Setup

@@ -22,21 +22,37 @@ CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
```

## Paddle TensorRT Test

- YOLOv5
```
# FP32
./build/trt_run --model_file yolov5s_infer/model.pdmodel --params_file yolov5s_infer/model.pdiparams --run_mode=trt_fp32
# FP16
./build/trt_run --model_file yolov5s_infer/model.pdmodel --params_file yolov5s_infer/model.pdiparams --run_mode=trt_fp16
# INT8
./build/trt_run --model_file yolov5s_quant/model.pdmodel --params_file yolov5s_quant/model.pdiparams --run_mode=trt_int8
```

- YOLOv6
```
# FP32
./build/trt_run --arch=YOLOv6 --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp32
# FP16
./build/trt_run --arch=YOLOv6 --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp16
# INT8
./build/trt_run --arch=YOLOv6 --model_file yolov6s_quant/model.pdmodel --params_file yolov6s_quant/model.pdiparams --run_mode=trt_int8
```

- YOLOv7
```
# FP32
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp32
# FP16
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp16
# INT8
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```

## Native TensorRT Test

@@ -49,6 +65,7 @@
# INT8
trtexec --onnx=yolov5s.onnx --workspace=1024 --avgRuns=1000 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --int8
```
- Note: --onnx=yolov5s.onnx can be replaced with the yolov6s.onnx or yolov7.onnx model.

## Performance Comparison

@@ -56,6 +73,12 @@
| :--------: | :--------: |:-------- |:--------: | :---------------------: |
| Paddle TensorRT | yolov5s | 5.95ms | 2.44ms | 1.87ms |
| TensorRT | yolov5s | 6.16ms | 2.58ms | 2.07ms |
| | | | | |
| Paddle TensorRT | YOLOv6s | 9.06ms | 2.90ms | 1.83ms |
| TensorRT | YOLOv6s | 8.59ms | 2.83ms | 1.87ms |
| | | | | |
| Paddle TensorRT | YOLOv7 | 26.84ms | 7.44ms | 4.55ms |
| TensorRT | YOLOv7 | 28.25ms | 7.23ms | 4.67ms |
Environment:
- Tesla T4, TensorRT 8.4.1, CUDA 11.2
......
@@ -19,6 +19,7 @@ using phi::dtype::float16;
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(arch, "YOLOv5", "Architectures name, can be: YOLOv5, YOLOv6, YOLOv7.");
DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(gpu_id, 0, "GPU card ID num.");
@@ -106,11 +107,15 @@ int main(int argc, char *argv[]) {
using dtype = float16;
std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
int out_box_shape = 25200;
if (FLAGS_arch == "YOLOv6"){
out_box_shape = 8400;
}
dtype *out_data; dtype *out_data;
int out_data_size = FLAGS_batch_size * 25200 * 85; int out_data_size = FLAGS_batch_size * out_box_shape * 85;
cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped); cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
std::vector<int> out_shape{ FLAGS_batch_size, 1, 25200, 85}; std::vector<int> out_shape{ FLAGS_batch_size, 1, out_box_shape, 85};
run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape); run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape);
return 0; return 0;
} }
from pycocotools.coco import COCO
import cv2
import os
import numpy as np
import paddle
class COCOValDataset(paddle.io.Dataset):
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
img_size=[640, 640],
input_name='x2paddle_images'):
self.dataset_dir = dataset_dir
self.image_dir = image_dir
self.img_size = img_size
self.input_name = input_name
self.ann_file = os.path.join(dataset_dir, anno_path)
self.coco = COCO(self.ann_file)
ori_ids = list(sorted(self.coco.imgs.keys()))
# check gt bbox
clean_ids = []
for idx in ori_ids:
ins_anno_ids = self.coco.getAnnIds(imgIds=[idx], iscrowd=False)
instances = self.coco.loadAnns(ins_anno_ids)
num_bbox = 0
for inst in instances:
if inst.get('ignore', False):
continue
if 'bbox' not in inst.keys():
continue
elif not any(np.array(inst['bbox'])):
continue
else:
num_bbox += 1
if num_bbox > 0:
clean_ids.append(idx)
self.ids = clean_ids
def __getitem__(self, idx):
img_id = self.ids[idx]
img = self._get_img_data_from_img_id(img_id)
img, scale_factor = self.image_preprocess(img, self.img_size)
return {
'image': img,
'im_id': np.array([img_id]),
'scale_factor': scale_factor
}
def __len__(self):
return len(self.ids)
def _get_img_data_from_img_id(self, img_id):
img_info = self.coco.loadImgs(img_id)[0]
img_path = os.path.join(self.dataset_dir, self.image_dir,
img_info['file_name'])
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
return img
def _generate_scale(self, im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(self, img, target_shape):
# Resize image
im_scale_y, im_scale_x = self._generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = np.transpose(img / 255, [2, 0, 1])
scale_factor = np.array([im_scale_y, im_scale_x])
return img.astype(np.float32), scale_factor
class COCOTrainDataset(COCOValDataset):
def __getitem__(self, idx):
img_id = self.ids[idx]
img = self._get_img_data_from_img_id(img_id)
img, scale_factor = self.image_preprocess(img, self.img_size)
return {self.input_name: img}
...@@ -16,14 +16,12 @@ import os ...@@ -16,14 +16,12 @@ import os
import sys import sys
import numpy as np import numpy as np
import argparse import argparse
from tqdm import tqdm
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from paddleslim.common import load_config
from ppdet.core.workspace import create from paddleslim.common import load_inference_model
from ppdet.metrics import COCOMetric, VOCMetric from post_process import YOLOPostProcess, coco_metric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from dataset import COCOValDataset
from paddleslim.quant import quant_post_static
from post_process import YOLOv6PostProcess
def argsparser(): def argsparser():
...@@ -35,64 +33,62 @@ def argsparser(): ...@@ -35,64 +33,62 @@ def argsparser():
help="path of compression strategy config.", help="path of compression strategy config.",
required=True) required=True)
parser.add_argument( parser.add_argument(
'--save_dir', '--batch_size', type=int, default=1, help="Batch size of model input.")
type=str,
default='ptq_out',
help="directory to save compressed model.")
parser.add_argument( parser.add_argument(
'--devices', '--devices',
type=str, type=str,
default='gpu', default='gpu',
help="which device used to compress.") help="which device used to compress.")
parser.add_argument(
'--algo', type=str, default='KL', help="post quant algo.")
return parser return parser
def reader_wrapper(reader, input_list): def eval():
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = load_inference_model(
global_config["model_dir"], exe)
bboxes_list, bbox_nums_list, image_id_list = [], [], []
with tqdm(
total=len(val_loader),
bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
ncols=80) as t:
for data in val_loader:
data_all = {k: np.array(v) for k, v in data.items()}
outs = exe.run(val_program,
feed={feed_target_names[0]: data_all['image']},
fetch_list=fetch_targets,
return_numpy=False)
postprocess = YOLOPostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
bboxes_list.append(res['bbox'])
bbox_nums_list.append(res['bbox_num'])
image_id_list.append(np.array(data_all['im_id']))
t.update()
coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
def main(): def main():
global global_config global global_config
all_config = load_slim_config(FLAGS.config_path) all_config = load_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"] global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace() global val_loader
exe = paddle.static.Executor(place) dataset = COCOValDataset(
quant_post_static( dataset_dir=global_config['dataset_dir'],
executor=exe, image_dir=global_config['val_image_dir'],
model_dir=global_config["model_dir"], anno_path=global_config['val_anno_path'])
quantize_model_path=FLAGS.save_dir, global anno_file
data_loader=train_loader, anno_file = dataset.ann_file
model_filename=global_config["model_filename"], val_loader = paddle.io.DataLoader(
params_filename=global_config["params_filename"], dataset, batch_size=FLAGS.batch_size, drop_last=True)
batch_size=32,
batch_nums=10, eval()
algo=FLAGS.algo,
hist_percent=0.999,
is_full_quantize=False,
bias_correction=False,
onnx_format=False)
if __name__ == '__main__': if __name__ == '__main__':
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import os
import time
import random
import argparse
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
EXPLICIT_PRECISION = 1 << (
int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)
# load coco labels
CLASS_LABEL = [
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
"truck", "boat", "traffic light", "fire hydrant", "stop sign",
"parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
"elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
"tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
"baseball bat", "baseball glove", "skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
"bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
"hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
"bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
"keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
"refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
"hair drier", "toothbrush"
]
def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
if len(image.shape) == 3:
padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
else:
padded_img = np.ones(input_size) * 114.0
img = np.array(image)
r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
resized_img = cv2.resize(
img,
(int(img.shape[1] * r), int(img.shape[0] * r)),
interpolation=cv2.INTER_LINEAR, ).astype(np.float32)
padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img
padded_img = padded_img[:, :, ::-1]
padded_img /= 255.0
if mean is not None:
padded_img -= mean
if std is not None:
padded_img /= std
padded_img = padded_img.transpose(swap)
padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
return padded_img, r
def postprocess(predictions, ratio):
boxes = predictions[:, :4]
scores = predictions[:, 4:5] * predictions[:, 5:]
boxes_xyxy = np.ones_like(boxes)
boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
boxes_xyxy /= ratio
dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1)
return dets
def nms(boxes, scores, nms_thr):
"""Single class NMS implemented in Numpy."""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= nms_thr)[0]
order = order[inds + 1]
return keep
def multiclass_nms(boxes, scores, nms_thr, score_thr):
"""Multiclass NMS implemented in Numpy"""
final_dets = []
num_classes = scores.shape[1]
for cls_ind in range(num_classes):
cls_scores = scores[:, cls_ind]
valid_score_mask = cls_scores > score_thr
if valid_score_mask.sum() == 0:
continue
else:
valid_scores = cls_scores[valid_score_mask]
valid_boxes = boxes[valid_score_mask]
keep = nms(valid_boxes, valid_scores, nms_thr)
if len(keep) > 0:
cls_inds = np.ones((len(keep), 1)) * cls_ind
dets = np.concatenate(
[valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
final_dets.append(dets)
if len(final_dets) == 0:
return None
return np.concatenate(final_dets, 0)
def get_color_map_list(num_classes):
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
color_list = get_color_map_list(len(class_names))
for i in range(len(boxes)):
box = boxes[i]
cls_id = int(cls_ids[i])
color = tuple(color_list[cls_id])
score = scores[i]
if score < conf:
continue
x0 = int(box[0])
y0 = int(box[1])
x1 = int(box[2])
y1 = int(box[3])
text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
font = cv2.FONT_HERSHEY_SIMPLEX
txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
cv2.rectangle(img, (x0, y0 + 1),
(x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])),
color, -1)
cv2.putText(
img,
text, (x0, y0 + txt_size[1]),
font,
0.8, (0, 255, 0),
thickness=2)
return img
def get_engine(precision, model_file_path):
# TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
TRT_LOGGER = trt.Logger()
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()
if precision == 'int8':
network = builder.create_network(EXPLICIT_BATCH | EXPLICIT_PRECISION)
else:
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)
runtime = trt.Runtime(TRT_LOGGER)
if model_file_path.endswith('.trt'):
# If a serialized engine exists, use it instead of building an engine.
print("Reading engine from file {}".format(model_file_path))
with open(model_file_path,
"rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
engine = runtime.deserialize_cuda_engine(f.read())
for i in range(network.num_layers):
layer = network.get_layer(i)
print(i, layer.name)
return engine
else:
config.max_workspace_size = 1 << 30
if precision == "fp16":
if not builder.platform_has_fast_fp16:
print("FP16 is not supported natively on this platform/device")
else:
config.set_flag(trt.BuilderFlag.FP16)
elif precision == "int8":
if not builder.platform_has_fast_int8:
print("INT8 is not supported natively on this platform/device")
else:
if builder.platform_has_fast_fp16:
# Also enable fp16, as some layers may be even more efficient in fp16 than int8
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)
builder.max_batch_size = 1
print('Loading ONNX file from path {}...'.format(model_file_path))
with open(model_file_path, 'rb') as model:
print('Beginning ONNX file parsing')
if not parser.parse(model.read()):
print('ERROR: Failed to parse the ONNX file.')
for error in range(parser.num_errors):
print(parser.get_error(error))
return None
print('Completed parsing of ONNX file')
print('Building an engine from file {}; this may take a while...'.
format(model_file_path))
plan = builder.build_serialized_network(network, config)
engine = runtime.deserialize_cuda_engine(plan)
print("Completed creating Engine")
with open(model_file_path, "wb") as f:
f.write(engine.serialize())
for i in range(network.num_layers):
layer = network.get_layer(i)
print(i, layer.name)
return engine
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
def __init__(self, host_mem, device_mem):
self.host = host_mem
self.device = device_mem
def __str__(self):
return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
def __repr__(self):
return self.__str__()
def allocate_buffers(engine):
inputs = []
outputs = []
bindings = []
stream = cuda.Stream()
for binding in engine:
size = trt.volume(engine.get_binding_shape(
binding)) * engine.max_batch_size
dtype = trt.nptype(engine.get_binding_dtype(binding))
# Allocate host and device buffers
host_mem = cuda.pagelocked_empty(size, dtype)
device_mem = cuda.mem_alloc(host_mem.nbytes)
# Append the device buffer to device bindings.
bindings.append(int(device_mem))
# Append to the appropriate list.
if engine.binding_is_input(binding):
inputs.append(HostDeviceMem(host_mem, device_mem))
else:
outputs.append(HostDeviceMem(host_mem, device_mem))
return inputs, outputs, bindings, stream
def run_inference(context, bindings, inputs, outputs, stream):
# Transfer input data to the GPU.
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
# Run inference.
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
# Transfer predictions back from the GPU.
[cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
# Synchronize the stream
stream.synchronize()
# Return only the host outputs.
return [out.host for out in outputs]
def main(args):
onnx_model = args.model_path
img_path = args.image_file
num_class = len(CLASS_LABEL)
repeat = 1000
engine = get_engine(args.precision, onnx_model)
model_all_names = []
for idx in range(engine.num_bindings):
is_input = engine.binding_is_input(idx)
name = engine.get_binding_name(idx)
op_type = engine.get_binding_dtype(idx)
model_all_names.append(name)
shape = engine.get_binding_shape(idx)
print('input id:', idx, ' is input: ', is_input, ' binding name:',
name, ' shape:', shape, 'type: ', op_type)
context = engine.create_execution_context()
print('Allocate buffers ...')
inputs, outputs, bindings, stream = allocate_buffers(engine)
print("TRT set input ...")
origin_img = cv2.imread(img_path)
input_shape = [args.img_shape, args.img_shape]
input_image, ratio = preprocess(origin_img, input_shape)
inputs[0].host = np.expand_dims(input_image, axis=0)
for _ in range(0, 50):
trt_outputs = run_inference(
context,
bindings=bindings,
inputs=inputs,
outputs=outputs,
stream=stream)
time1 = time.time()
for _ in range(0, repeat):
trt_outputs = run_inference(
context,
bindings=bindings,
inputs=inputs,
outputs=outputs,
stream=stream)
time2 = time.time()
# total time cost(ms)
total_inference_cost = (time2 - time1) * 1000
print("model path: ", onnx_model, " precision: ", args.precision)
print("In TensorRT, ",
"average latency is : {} ms".format(total_inference_cost / repeat))
# Do postprocess
output = trt_outputs[0]
predictions = np.reshape(output, (1, -1, int(5 + num_class)))[0]
dets = postprocess(predictions, ratio)
# Draw rectangles and labels on the original image
if dets is not None:
        final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
origin_img = draw_box(
origin_img,
final_boxes,
final_scores,
final_cls_inds,
conf=0.5,
class_names=CLASS_LABEL)
cv2.imwrite('output.jpg', origin_img)
print('The prediction results are saved in output.jpg.')
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_path',
type=str,
default="quant_model.onnx",
help="inference model filepath")
parser.add_argument(
'--image_file', type=str, default="bus.jpg", help="image path")
parser.add_argument(
'--precision', type=str, default='fp32', help="support fp32/fp16/int8.")
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args()
main(args)
...@@ -244,8 +244,9 @@ def predict_image(predictor, ...@@ -244,8 +244,9 @@ def predict_image(predictor,
threshold=0.5, threshold=0.5,
arch='YOLOv5'): arch='YOLOv5'):
img, scale_factor = image_preprocess(image_file, image_shape) img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {} if arch == 'YOLOv6':
if arch == 'YOLOv5': inputs['x2paddle_image_arrays'] = img
else:
inputs['x2paddle_images'] = img inputs['x2paddle_images'] = img
input_names = predictor.get_input_names() input_names = predictor.get_input_names()
for i in range(len(input_names)): for i in range(len(input_names)):
...@@ -306,6 +307,8 @@ if __name__ == '__main__': ...@@ -306,6 +307,8 @@ if __name__ == '__main__':
default='GPU', default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU" help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
) )
parser.add_argument(
        '--arch', type=str, default='YOLOv5', help="Architecture name: YOLOv5, YOLOv6, or YOLOv7.")
parser.add_argument('--img_shape', type=int, default=640, help="input_size") parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args() args = parser.parse_args()
...@@ -319,4 +322,5 @@ if __name__ == '__main__': ...@@ -319,4 +322,5 @@ if __name__ == '__main__':
args.image_file, args.image_file,
image_shape=[args.img_shape, args.img_shape], image_shape=[args.img_shape, args.img_shape],
warmup=warmup, warmup=warmup,
repeats=repeats) repeats=repeats,
arch=args.arch)
...@@ -14,6 +14,8 @@ ...@@ -14,6 +14,8 @@
import numpy as np import numpy as np
import cv2 import cv2
import json
import sys
def box_area(boxes): def box_area(boxes):
...@@ -68,9 +70,9 @@ def nms(boxes, scores, iou_threshold): ...@@ -68,9 +70,9 @@ def nms(boxes, scores, iou_threshold):
return keep return keep
class YOLOv7PostProcess(object): class YOLOPostProcess(object):
""" """
Post process of YOLOv6 network. Post process of YOLO-series network.
args: args:
score_threshold(float): Threshold to filter out bounding boxes with low score_threshold(float): Threshold to filter out bounding boxes with low
confidence score. If not provided, consider all boxes. confidence score. If not provided, consider all boxes.
...@@ -157,8 +159,8 @@ class YOLOv7PostProcess(object): ...@@ -157,8 +159,8 @@ class YOLOv7PostProcess(object):
if len(pred.shape) == 1: if len(pred.shape) == 1:
pred = pred[np.newaxis, :] pred = pred[np.newaxis, :]
pred_bboxes = pred[:, :4] pred_bboxes = pred[:, :4]
scale_factor = np.tile(scale_factor[i][::-1], (1, 2)) scale = np.tile(scale_factor[i][::-1], (2))
pred_bboxes /= scale_factor pred_bboxes /= scale
bbox = np.concatenate( bbox = np.concatenate(
[ [
pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis], pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis],
...@@ -171,3 +173,59 @@ class YOLOv7PostProcess(object): ...@@ -171,3 +173,59 @@ class YOLOv7PostProcess(object):
bboxs = np.concatenate(bboxs, axis=0) bboxs = np.concatenate(bboxs, axis=0)
box_nums = np.array(box_nums) box_nums = np.array(box_nums)
return {'bbox': bboxs, 'bbox_num': box_nums} return {'bbox': bboxs, 'bbox_num': box_nums}
def coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list):
try:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
except:
print(
"[ERROR] Not found pycocotools, please install by `pip install pycocotools`"
)
sys.exit(1)
coco_gt = COCO(anno_file)
cats = coco_gt.loadCats(coco_gt.getCatIds())
clsid2catid = {i: cat['id'] for i, cat in enumerate(cats)}
results = []
for bboxes, bbox_nums, image_id in zip(bboxes_list, bbox_nums_list,
image_id_list):
results += _get_det_res(bboxes, bbox_nums, image_id, clsid2catid)
output = "bbox.json"
with open(output, 'w') as f:
json.dump(results, f)
coco_dt = coco_gt.loadRes(output)
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
return coco_eval.stats
def _get_det_res(bboxes, bbox_nums, image_id, label_to_cat_id_map):
det_res = []
k = 0
for i in range(len(bbox_nums)):
cur_image_id = int(image_id[i][0])
det_nums = bbox_nums[i]
for j in range(det_nums):
dt = bboxes[k]
k = k + 1
num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
if int(num_id) < 0:
continue
category_id = label_to_cat_id_map[int(num_id)]
w = xmax - xmin
h = ymax - ymin
bbox = [xmin, ymin, w, h]
dt_res = {
'image_id': cur_image_id,
'category_id': category_id,
'bbox': bbox,
'score': score
}
det_res.append(dt_res)
return det_res
...@@ -16,14 +16,12 @@ import os ...@@ -16,14 +16,12 @@ import os
import sys import sys
import numpy as np import numpy as np
import argparse import argparse
from tqdm import tqdm
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from paddleslim.common import load_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression from paddleslim.auto_compression import AutoCompression
from dataset import COCOValDataset, COCOTrainDataset
from post_process import YOLOv6PostProcess from post_process import YOLOPostProcess, coco_metric
def argsparser(): def argsparser():
...@@ -50,124 +48,72 @@ def argsparser(): ...@@ -50,124 +48,72 @@ def argsparser():
return parser return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list): def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
metric = global_config['metric'] bboxes_list, bbox_nums_list, image_id_list = [], [], []
for batch_id, data in enumerate(val_loader): with tqdm(
data_all = convert_numpy_data(data, metric) total=len(val_loader),
data_input = {} bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
for k, v in data.items(): ncols=80) as t:
if isinstance(global_config['input_list'], list): for data in val_loader:
if k in test_feed_names: data_all = {k: np.array(v) for k, v in data.items()}
data_input[k] = np.array(v) outs = exe.run(compiled_test_program,
elif isinstance(global_config['input_list'], dict): feed={test_feed_names[0]: data_all['image']},
if k in global_config['input_list'].keys(): fetch_list=test_fetch_list,
data_input[global_config['input_list'][k]] = np.array(v) return_numpy=False)
outs = exe.run(compiled_test_program, res = {}
feed=data_input, postprocess = YOLOPostProcess(
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
if 'arch' in global_config and global_config['arch'] == 'YOLOv6':
postprocess = YOLOv6PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True) score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor']) res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else: bboxes_list.append(res['bbox'])
for out in outs: bbox_nums_list.append(res['bbox_num'])
v = np.array(out) image_id_list.append(np.array(data_all['im_id']))
if len(v.shape) > 1: t.update()
res['bbox'] = v map_res = coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
else: return map_res[0]
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
return map_res['bbox'][0]
def main(): def main():
global global_config global global_config
all_config = load_slim_config(FLAGS.config_path) all_config = load_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}" assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"] global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config']) input_name = 'x2paddle_image_arrays' if global_config[
'arch'] == 'YOLOv6' else 'x2paddle_images'
train_loader = create('EvalReader')(reader_cfg['TrainDataset'], dataset = COCOTrainDataset(
reader_cfg['worker_num'], dataset_dir=global_config['dataset_dir'],
return_list=True) image_dir=global_config['train_image_dir'],
train_loader = reader_wrapper(train_loader, global_config['input_list']) anno_path=global_config['train_anno_path'],
input_name=input_name)
train_loader = paddle.io.DataLoader(
dataset, batch_size=1, shuffle=True, drop_last=True, num_workers=0)
if 'Evaluation' in global_config.keys() and global_config[ if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0: 'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader global val_loader
_eval_batch_sampler = paddle.io.BatchSampler( dataset = COCOValDataset(
dataset, batch_size=reader_cfg['EvalReader']['batch_size']) dataset_dir=global_config['dataset_dir'],
val_loader = create('EvalReader')(dataset, image_dir=global_config['val_image_dir'],
reader_cfg['worker_num'], anno_path=global_config['val_anno_path'])
batch_sampler=_eval_batch_sampler, global anno_file
return_list=True) anno_file = dataset.ann_file
metric = None val_loader = paddle.io.DataLoader(
if reader_cfg['metric'] == 'COCO': dataset,
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()} batch_size=1,
anno_file = dataset.get_anno() shuffle=False,
metric = COCOMetric( drop_last=False,
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox') num_workers=0)
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
else: else:
eval_func = None eval_func = None
ac = AutoCompression( ac = AutoCompression(
model_dir=global_config["model_dir"], model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"], train_dataloader=train_loader,
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir, save_dir=FLAGS.save_dir,
config=all_config, config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func) eval_callback=eval_func)
ac.compress() ac.compress()
ac.export_onnx()
if __name__ == '__main__': if __name__ == '__main__':
......
# Automatic Compression Example for the YOLOv5 Object Detection Model
Contents:
- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Auto Compression Workflow](#3-auto-compression-workflow)
- [3.1 Prepare the Environment](#31-prepare-the-environment)
- [3.2 Prepare the Dataset](#32-prepare-the-dataset)
- [3.3 Prepare the Inference Model](#33-prepare-the-inference-model)
- [3.4 Run Auto Compression and Export the Model](#34-run-auto-compression-and-export-the-model)
- [3.5 Evaluate Model Accuracy](#35-evaluate-model-accuracy)
- [4. Deployment](#4-deployment)
- [5. FAQ](#5-faq)
## 1. Introduction
The PaddlePaddle model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, inference models from these frameworks can easily use PaddleSlim's automatic compression capability.
This example takes the [ultralytics/yolov5](https://github.com/ultralytics/yolov5) object detection model, converts the PyTorch model into a Paddle model, and then compresses it with the ACT auto compression feature. The compression strategy used in this example is quantization-aware training.
## 2. Benchmark
| Model | Strategy | Input Size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv5s | Baseline | 640*640 | 37.4 | 5.95ms | 2.44ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/detection/yolov5s_infer.tar) |
| YOLOv5s | KL post-training quantization | 640*640 | 36.0 | - | - | 1.87ms | - | - |
| YOLOv5s | Quantization-aware training with distillation | 640*640 | **36.9** | - | - | **1.87ms** | [config](./configs/yolov5s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov5s_quant.tar) |
Notes:
- All mAP values are evaluated on the COCO val2017 dataset.
- YOLOv5s latency is measured on a Tesla T4 GPU with TensorRT 8.4.1 and batch_size=1; the benchmark script is [cpp_infer](./cpp_infer).
## 3. Auto Compression Workflow
#### 3.1 Prepare the Environment
- PaddlePaddle >= 2.3 (install from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim >= 2.3
- PaddleDet >= 2.4
- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
- opencv-python
(1) Install PaddlePaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install PaddleSlim:
```shell
pip install paddleslim
```
(3) Install PaddleDet:
```shell
pip install paddledet
```
Note: PaddleDet is installed so that the Dataloader components in PaddleDetection can be used directly.
(4) Install X2Paddle 1.3.6 or later:
```shell
pip install x2paddle sympy onnx
```
#### 3.2 Prepare the Dataset
This example runs auto compression on the COCO dataset by default and relies on the data loading module in PaddleDetection. For custom COCO-style data or data in other formats, please prepare it following the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
If the dataset is already prepared, simply set the `dataset_dir` field of `EvalDataset` in `./configs/yolov6_reader.yml` to your own dataset path.
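If you are unsure whether the paths are set correctly, a quick check with pycocotools (the same library the evaluation code relies on) confirms that the annotation file can be read. The `dataset/coco` layout below mirrors the default reader config and is only an assumption; adjust it to your own paths:
```python
# Sanity-check the COCO annotation file assumed by the reader config.
import os
from pycocotools.coco import COCO

dataset_dir = 'dataset/coco'  # default layout in the reader config; change to your path
anno_path = os.path.join(dataset_dir, 'annotations/instances_val2017.json')
coco = COCO(anno_path)
print(len(coco.imgs), 'val images,', len(coco.getCatIds()), 'categories')
```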
#### 3.3 Prepare the Inference Model
(1) Prepare the ONNX model:
Export the ONNX model following the official [export tutorial](https://github.com/ultralytics/yolov5/issues/251) of [ultralytics/yolov5](https://github.com/ultralytics/yolov5), or download the ready-made [yolov5s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx).
```shell
python export.py --weights yolov5s.pt --include onnx
```
(2) Convert the model:
```shell
x2paddle --framework=onnx --model=yolov5s.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov5s_infer
```
This produces the YOLOv5s inference model (`model.pdmodel` and `model.pdiparams`). For a quick start, you can also directly download the YOLOv5s [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/detection/yolov5s_infer.tar) listed in the table above.
The inference model consists of two files: `model.pdmodel`, the model structure, and `model.pdiparams`, the weights.
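As a quick sanity check, the exported model can be loaded with Paddle Inference and run on a dummy input. This is a trimmed-down sketch of what `load_predictor` in [paddle_trt_infer.py](./paddle_trt_infer.py) does; the paths assume the `yolov5s_infer` directory produced above:
```python
# Load the exported Paddle inference model and run one dummy forward pass (GPU, no TensorRT).
import numpy as np
from paddle.inference import Config, create_predictor

config = Config('yolov5s_infer/model.pdmodel', 'yolov5s_infer/model.pdiparams')
config.enable_use_gpu(200, 0)  # initial GPU memory (MB), device id
predictor = create_predictor(config)

input_name = predictor.get_input_names()[0]  # 'x2paddle_images' for YOLOv5
predictor.get_input_handle(input_name).copy_from_cpu(
    np.ones((1, 3, 640, 640), dtype=np.float32))
predictor.run()
out = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(out.shape)  # candidate boxes, roughly (1, 25200, 85) for a 640x640 input
```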
#### 3.4 Run Auto Compression and Export the Model
The quantization-with-distillation auto compression example is launched via the run.py script, which compresses the model through the ```paddleslim.auto_compression.AutoCompression``` interface. Configure the model path and the distillation, quantization, and training parameters in the config file; once that is done, the model can be quantized and distilled. Run it as follows (a minimal Python sketch of the underlying API call is given after the commands):
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov5s_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov5s_qat_dis.yaml --save_dir='./output/'
```
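For reference, here is a minimal sketch of the `AutoCompression` call that run.py wraps. It assumes the `COCOTrainDataset` helper from `dataset.py` and `Global` config keys such as `dataset_dir`, `train_image_dir` and `train_anno_path`; the exact key names may differ in the config you use, so treat this as an outline rather than a drop-in replacement for run.py:
```python
# Sketch of the AutoCompression flow: load the config, build a training/calibration
# dataloader, then compress the model and export it to ONNX.
import paddle
from paddleslim.common import load_config
from paddleslim.auto_compression import AutoCompression
from dataset import COCOTrainDataset  # provided alongside run.py

paddle.enable_static()
all_config = load_config('./configs/yolov5s_qat_dis.yaml')
global_config = all_config["Global"]

# YOLOv5/YOLOv7 exports use the 'x2paddle_images' input name; YOLOv6 uses 'x2paddle_image_arrays'.
dataset = COCOTrainDataset(
    dataset_dir=global_config['dataset_dir'],
    image_dir=global_config['train_image_dir'],
    anno_path=global_config['train_anno_path'],
    input_name='x2paddle_images')
train_loader = paddle.io.DataLoader(
    dataset, batch_size=1, shuffle=True, drop_last=True)

ac = AutoCompression(
    model_dir=global_config["model_dir"],
    train_dataloader=train_loader,
    save_dir='./output/',
    config=all_config,
    eval_callback=None)  # or an mAP evaluation function, as defined in run.py
ac.compress()
ac.export_onnx()
```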
#### 3.5 Evaluate Model Accuracy
Use the eval.py script to obtain the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov5s_qat_dis.yaml
```
**Note**: To evaluate a quantized model, point the `model_dir` field in the config file to the quantized model's path.
## 4. Deployment
#### Paddle-TensorRT C++ Deployment
Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build it following the [C++ TensorRT benchmark guide](./cpp_infer/README.md), then run the test:
```shell
# build
bash complie.sh
# run
./build/trt_run --model_file yolov5s_quant/model.pdmodel --params_file yolov5s_quant/model.pdiparams --run_mode=trt_int8
```
#### Paddle-TensorRT Python Deployment
First install a [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT.
Then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py); the post-processing it applies is sketched after the command:
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
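Internally, [paddle_trt_infer.py](./paddle_trt_infer.py) decodes the raw network output with `YOLOv5PostProcess` from [post_process.py](./post_process.py). The sketch below feeds it a single hand-made prediction so it runs without a model; the fake box values are purely illustrative:
```python
# Decode raw YOLOv5 predictions (cx, cy, w, h, objectness, 80 class scores) into
# final detections; each output row is (class_id, score, x1, y1, x2, y2).
import numpy as np
from post_process import YOLOv5PostProcess

np_boxes = np.zeros((1, 100, 85), dtype=np.float32)      # stand-in for the predictor output
np_boxes[0, 0, :5] = [320., 320., 100., 80., 0.9]        # one box: cx, cy, w, h, objectness
np_boxes[0, 0, 5] = 0.8                                  # confidence of class 0 ('person')
scale_factor = np.array([[1.0, 1.0]], dtype=np.float32)  # (scale_y, scale_x) from preprocessing

postprocess = YOLOv5PostProcess(
    score_threshold=0.001, nms_threshold=0.6, multi_label=True)
res = postprocess(np_boxes, scale_factor)
print(res['bbox'])      # approximately [[0., 0.72, 270., 280., 370., 360.]]
print(res['bbox_num'])  # [1]
```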
## 5. FAQ
- To test the accuracy of the offline (post-training) quantized model, run:
```shell
python post_quant.py --config_path=./configs/yolov5s_qat_dis.yaml
```
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
- Permute: {}
batch_size: 1
#include <chrono>
#include <iostream>
#include <memory>
#include <numeric>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <cuda_runtime.h>
#include "paddle/include/paddle_inference_api.h"
#include "paddle/include/experimental/phi/common/float16.h"
using paddle_infer::Config;
using paddle_infer::Predictor;
using paddle_infer::CreatePredictor;
using paddle_infer::PrecisionType;
using phi::dtype::float16;
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(gpu_id, 0, "GPU card ID num.");
DEFINE_int32(trt_min_subgraph_size, 3, "tensorrt min_subgraph_size");
DEFINE_int32(warmup, 50, "warmup");
DEFINE_int32(repeats, 1000, "repeats");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::shared_ptr<Predictor> InitPredictor() {
Config config;
std::string model_path;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
model_path = FLAGS_model_dir.substr(0, FLAGS_model_dir.find_last_of("/"));
} else {
config.SetModel(FLAGS_model_file, FLAGS_params_file);
model_path = FLAGS_model_file.substr(0, FLAGS_model_file.find_last_of("/"));
}
// enable tune
std::cout << "model_path: " << model_path << std::endl;
config.EnableUseGpu(256, FLAGS_gpu_id);
if (FLAGS_run_mode == "trt_fp32") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kFloat32, false, false);
} else if (FLAGS_run_mode == "trt_fp16") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kHalf, false, false);
} else if (FLAGS_run_mode == "trt_int8") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kInt8, false, false);
}
config.EnableMemoryOptim();
config.SwitchIrOptim(true);
return CreatePredictor(config);
}
template <typename type>
void run(Predictor *predictor, const std::vector<type> &input,
const std::vector<int> &input_shape, type* out_data, std::vector<int> out_shape) {
// prepare input
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
for (int i = 0; i < FLAGS_warmup; ++i)
CHECK(predictor->Run());
auto st = time();
for (int i = 0; i < FLAGS_repeats; ++i) {
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
CHECK(predictor->Run());
auto output_names = predictor->GetOutputNames();
auto output_t = predictor->GetOutputHandle(output_names[0]);
std::vector<int> output_shape = output_t->shape();
output_t -> ShareExternalData<type>(out_data, out_shape, paddle_infer::PlaceType::kGPU);
}
LOG(INFO) << "[" << FLAGS_run_mode << " bs-" << FLAGS_batch_size << " ] run avg time is " << time_diff(st, time()) / FLAGS_repeats
<< " ms";
}
int main(int argc, char *argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = InitPredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 640, 640};
// float16
using dtype = float16;
std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
dtype *out_data;
int out_data_size = FLAGS_batch_size * 25200 * 85;
cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
std::vector<int> out_shape{ FLAGS_batch_size, 1, 25200, 85};
run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape);
return 0;
}
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from post_process import YOLOv5PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval():
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
global_config["model_dir"],
exe,
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"])
print('Loaded model from: {}'.format(global_config["model_dir"]))
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in global_config['input_list']:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(val_program,
feed=data_input,
fetch_list=fetch_targets,
return_numpy=False)
res = {}
if 'arch' in global_config and global_config['arch'] == 'YOLOv5':
postprocess = YOLOv5PostProcess(
score_threshold=0.001, nms_threshold=0.6, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else:
for out in outs:
v = np.array(out)
if len(v.shape) > 1:
res['bbox'] = v
else:
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
metric.reset()
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
dataset = reader_cfg['EvalDataset']
global val_loader
val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
reader_cfg['worker_num'],
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
eval()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import argparse
import time
from paddle.inference import Config
from paddle.inference import create_predictor
from post_process import YOLOv5PostProcess
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def generate_scale(im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(img_path, target_shape):
img = cv2.imread(img_path)
# Resize
im_scale_y, im_scale_x = generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, [2, 0, 1]) / 255
img = np.expand_dims(img, 0)
scale_factor = np.array([[im_scale_y, im_scale_x]])
return img.astype(np.float32), scale_factor
def get_color_map_list(num_classes):
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(image_file, results, class_label, threshold=0.5):
srcimg = cv2.imread(image_file, 1)
for i in range(len(results)):
color_list = get_color_map_list(len(class_label))
clsid2color = {}
classid, conf = int(results[i, 0]), results[i, 1]
if conf < threshold:
continue
xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int(
results[i, 4]), int(results[i, 5])
if classid not in clsid2color:
clsid2color[classid] = color_list[classid]
color = tuple(clsid2color[classid])
cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
print(class_label[classid] + ': ' + str(round(conf, 3)))
cv2.putText(
srcimg,
class_label[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8, (0, 255, 0),
thickness=2)
return srcimg
def load_predictor(model_dir,
run_mode='paddle',
batch_size=1,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
enable_mkldnn_bfloat16=False,
delete_shuffle_pass=False):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT.
Used by action model.
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
if enable_mkldnn_bfloat16:
config.enable_mkldnn_bfloat16()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
if delete_shuffle_pass:
config.delete_pass("shuffle_channel_detect_pass")
predictor = create_predictor(config)
return predictor
def predict_image(predictor,
image_file,
image_shape=[640, 640],
warmup=1,
repeats=1,
threshold=0.5,
arch='YOLOv5'):
img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {}
if arch == 'YOLOv5':
inputs['x2paddle_images'] = img
input_names = predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
predictor.run()
np_boxes = None
predict_time = 0.
time_min = float("inf")
time_max = float('-inf')
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
time_avg = predict_time / repeats
print('Inference time(ms): min={}, max={}, avg={}'.format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
postprocess = YOLOv5PostProcess(
score_threshold=0.001, nms_threshold=0.6, multi_label=True)
res = postprocess(np_boxes, scale_factor)
res_img = draw_box(
image_file, res['bbox'], CLASS_LABEL, threshold=threshold)
cv2.imwrite('result.jpg', res_img)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--image_file', type=str, default=None, help="image path")
parser.add_argument(
'--model_path', type=str, help="inference model filepath")
parser.add_argument(
'--benchmark',
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
'--run_mode',
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
'--device',
type=str,
default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
)
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args()
predictor = load_predictor(
args.model_path, run_mode=args.run_mode, device=args.device)
warmup, repeats = 1, 1
if args.benchmark:
warmup, repeats = 50, 100
predict_image(
predictor,
args.image_file,
image_shape=[args.img_shape, args.img_shape],
warmup=warmup,
repeats=repeats)
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
def box_area(boxes):
"""
Args:
boxes(np.ndarray): [N, 4]
return: [N]
"""
return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
def box_iou(box1, box2):
"""
Args:
box1(np.ndarray): [N, 4]
box2(np.ndarray): [M, 4]
return: [N, M]
"""
area1 = box_area(box1)
area2 = box_area(box2)
lt = np.maximum(box1[:, np.newaxis, :2], box2[:, :2])
rb = np.minimum(box1[:, np.newaxis, 2:], box2[:, 2:])
wh = rb - lt
wh = np.maximum(0, wh)
inter = wh[:, :, 0] * wh[:, :, 1]
iou = inter / (area1[:, np.newaxis] + area2 - inter)
return iou
def nms(boxes, scores, iou_threshold):
"""
Non Max Suppression numpy implementation.
args:
boxes(np.ndarray): [N, 4]
scores(np.ndarray): [N, 1]
iou_threshold(float): Threshold of IoU.
"""
idxs = scores.argsort()
keep = []
while idxs.size > 0:
max_score_index = idxs[-1]
max_score_box = boxes[max_score_index][None, :]
keep.append(max_score_index)
if idxs.size == 1:
break
idxs = idxs[:-1]
other_boxes = boxes[idxs]
ious = box_iou(max_score_box, other_boxes)
idxs = idxs[ious[0] <= iou_threshold]
keep = np.array(keep)
return keep
class YOLOv5PostProcess(object):
"""
Post process of YOLOv5 network.
args:
score_threshold(float): Threshold to filter out bounding boxes with low
confidence score. If not provided, consider all boxes.
nms_threshold(float): The threshold to be used in NMS.
multi_label(bool): Whether keep multi label in boxes.
keep_top_k(int): Number of total bboxes to be kept per image after NMS
step. -1 means keeping all bboxes after NMS step.
"""
def __init__(self,
score_threshold=0.25,
nms_threshold=0.5,
multi_label=False,
keep_top_k=300):
self.score_threshold = score_threshold
self.nms_threshold = nms_threshold
self.multi_label = multi_label
self.keep_top_k = keep_top_k
def _xywh2xyxy(self, x):
# Convert from [x, y, w, h] to [x1, y1, x2, y2]
y = np.copy(x)
y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x
y[:, 1] = x[:, 1] - x[:, 3] / 2 # top left y
y[:, 2] = x[:, 0] + x[:, 2] / 2 # bottom right x
y[:, 3] = x[:, 1] + x[:, 3] / 2 # bottom right y
return y
def _non_max_suppression(self, prediction):
max_wh = 4096 # (pixels) minimum and maximum box width and height
nms_top_k = 30000
cand_boxes = prediction[..., 4] > self.score_threshold # candidates
output = [np.zeros((0, 6))] * prediction.shape[0]
for batch_id, boxes in enumerate(prediction):
# Apply constraints
boxes = boxes[cand_boxes[batch_id]]
if not boxes.shape[0]:
continue
# Compute conf (conf = obj_conf * cls_conf)
boxes[:, 5:] *= boxes[:, 4:5]
# Box (center x, center y, width, height) to (x1, y1, x2, y2)
convert_box = self._xywh2xyxy(boxes[:, :4])
# Detections matrix nx6 (xyxy, conf, cls)
if self.multi_label:
i, j = (boxes[:, 5:] > self.score_threshold).nonzero()
boxes = np.concatenate(
(convert_box[i], boxes[i, j + 5, None],
j[:, None].astype(np.float32)),
axis=1)
else:
conf = np.max(boxes[:, 5:], axis=1)
j = np.argmax(boxes[:, 5:], axis=1)
re = np.array(conf.reshape(-1) > self.score_threshold)
conf = conf.reshape(-1, 1)
j = j.reshape(-1, 1)
boxes = np.concatenate((convert_box, conf, j), axis=1)[re]
num_box = boxes.shape[0]
if not num_box:
continue
elif num_box > nms_top_k:
boxes = boxes[boxes[:, 4].argsort()[::-1][:nms_top_k]]
# Batched NMS
c = boxes[:, 5:6] * max_wh
clean_boxes, scores = boxes[:, :4] + c, boxes[:, 4]
keep = nms(clean_boxes, scores, self.nms_threshold)
# limit detection box num
if keep.shape[0] > self.keep_top_k:
keep = keep[:self.keep_top_k]
output[batch_id] = boxes[keep]
return output
def __call__(self, outs, scale_factor):
preds = self._non_max_suppression(outs)
bboxs, box_nums = [], []
for i, pred in enumerate(preds):
if len(pred.shape) > 2:
pred = np.squeeze(pred)
if len(pred.shape) == 1:
pred = pred[np.newaxis, :]
pred_bboxes = pred[:, :4]
scale_factor = np.tile(scale_factor[i][::-1], (1, 2))
pred_bboxes /= scale_factor
bbox = np.concatenate(
[
pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis],
pred_bboxes
],
axis=-1)
bboxs.append(bbox)
box_num = bbox.shape[0]
box_nums.append(box_num)
bboxs = np.concatenate(bboxs, axis=0)
box_nums = np.array(box_nums)
return {'bbox': bboxs, 'bbox_num': box_nums}
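# Minimal self-check of the post-processing above (illustrative; assumes a
# YOLOv5-style head with 80 classes, i.e. 5 + 80 = 85 channels per box).
if __name__ == "__main__":
    pred = np.zeros((1, 2, 85), dtype=np.float32)
    pred[0, 0, :5] = [320., 320., 100., 80., 0.9]  # cx, cy, w, h, objectness
    pred[0, 0, 5] = 0.8                            # confidence of class 0
    pred[0, 1, :5] = [100., 100., 50., 50., 0.01]  # below score_threshold, dropped
    scale_factor = np.array([[1.0, 1.0]])          # [im_scale_y, im_scale_x] per image
    post = YOLOv5PostProcess(score_threshold=0.25, nms_threshold=0.5)
    out = post(pred, scale_factor)
    # Each row of out['bbox'] is [class_id, score, x1, y1, x2, y2].
    print(out['bbox'], out['bbox_num'])  # one box: class 0, score 0.72, (270, 280, 370, 360)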
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
from post_process import YOLOv5PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='output',
help="directory to save compressed model.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
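# Note on `input_list` (illustrative): it maps fields yielded by the reader to the
# feed names expected by the static graph. Two forms are handled above, e.g.:
#   input_list = ['image']                     -> feed {'image': data['image']}
#   input_list = {'image': 'x2paddle_images'}  -> feed {'x2paddle_images': data['image']}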
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in test_feed_names:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(compiled_test_program,
feed=data_input,
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
if 'arch' in global_config and global_config['arch'] == 'YOLOv5':
postprocess = YOLOv5PostProcess(
score_threshold=0.001, nms_threshold=0.6, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else:
for out in outs:
v = np.array(out)
if len(v.shape) > 1:
res['bbox'] = v
else:
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
return map_res['bbox'][0]
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader
_eval_batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
val_loader = create('EvalReader')(dataset,
reader_cfg['worker_num'],
batch_sampler=_eval_batch_sampler,
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
else:
eval_func = None
ac = AutoCompression(
model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir,
config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func)
ac.compress()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
# YOLOv6 Auto Compression Example
Contents:
- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Auto Compression Workflow](#3-auto-compression-workflow)
- [3.1 Environment Setup](#31-environment-setup)
- [3.2 Dataset Preparation](#32-dataset-preparation)
- [3.3 Inference Model Preparation](#33-inference-model-preparation)
- [3.4 Run Auto Compression and Export the Model](#34-run-auto-compression-and-export-the-model)
- [3.5 Evaluate Model Accuracy](#35-evaluate-model-accuracy)
- [4. Deployment](#4-deployment)
- [5. FAQ](#5-faq)
## 1. Introduction
[X2Paddle](https://github.com/PaddlePaddle/X2Paddle), PaddlePaddle's model conversion tool, converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, inference models from other frameworks can conveniently use PaddleSlim's automatic compression features.
This example takes the [meituan/YOLOv6](https://github.com/meituan/YOLOv6) object detection model, converts the PyTorch model into a Paddle model, and then compresses it automatically with ACT. The compression strategy used in this example is quantization-aware training.
## 2. Benchmark
| Model | Strategy | Input Size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv6s | Base model | 640*640 | 42.4 | 9.06ms | 2.90ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar) |
| YOLOv6s | KL post-training quantization | 640*640 | 30.3 | - | - | 1.83ms | - | - |
| YOLOv6s | Quant-aware training + distillation | 640*640 | **41.3** | - | - | **1.83ms** | [config](./configs/yolov6s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.tar) |
Notes:
- All mAP values are evaluated on the COCO val2017 dataset.
- YOLOv6s latency is measured on a Tesla T4 GPU with TensorRT 8.4.1 enabled and batch_size=1; the benchmark script is [cpp_infer](./cpp_infer).
## 3. Auto Compression Workflow
#### 3.1 Environment Setup
- PaddlePaddle >= 2.3 (install from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim > 2.3
- PaddleDet >= 2.4
- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
- opencv-python
(1) Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install paddleslim:
```shell
pip install paddleslim
```
(3) Install paddledet:
```shell
pip install paddledet
```
Note: PaddleDet is installed only to reuse the Dataloader components from PaddleDetection.
(4) Install X2Paddle 1.3.6 or later:
```shell
pip install x2paddle sympy onnx
```
#### 3.2 Dataset Preparation
This example runs auto compression on the COCO dataset by default and relies on the data loading modules in PaddleDetection. For a custom COCO-format dataset, or data in another format, prepare it by following the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
If your dataset is already prepared, simply set the `dataset_dir` field under `EvalDataset` in [./configs/yolov6_reader.yml](./configs/yolov6_reader.yml) to your dataset path.
#### 3.3 Inference Model Preparation
(1) Prepare the ONNX model:
Follow the official [export tutorial](https://github.com/meituan/YOLOv6/blob/main/deploy/ONNX/README.md) of [meituan/YOLOv6](https://github.com/meituan/YOLOv6) to produce the ONNX model, or download the prepared [yolov6s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx).
(2) Convert the model:
```
x2paddle --framework=onnx --model=yolov6s.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov6s_infer
```
This produces the YOLOv6s inference model (`model.pdmodel` and `model.pdiparams`). For a quick start, you can also download the prebuilt YOLOv6s [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar) from the table above.
The inference model consists of two files: `model.pdmodel` is the model graph and `model.pdiparams` holds the weights.
#### 3.4 Run Auto Compression and Export the Model
The quantization-with-distillation example is launched through the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model automatically. Configure the model path and the distillation, quantization, and training parameters in the config file; once configured, the model is quantized and distilled. A minimal sketch of the underlying API call is shown after the commands below. Run:
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov6s_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov6s_qat_dis.yaml --save_dir='./output/'
```
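For reference, the core of run.py reduces to the following call (a minimal sketch based on the script in this example; the `Global` section of the YAML supplies `model_dir`, `reader_config`, and related settings):
```python
from paddleslim.auto_compression import AutoCompression

# train_loader: PaddleDetection dataloader wrapped to yield feed dicts;
# eval_function: optional callback returning the mAP of the compressed model.
ac = AutoCompression(
    model_dir="yolov6s_infer",        # inference model produced in step 3.3
    model_filename="model.pdmodel",
    params_filename="model.pdiparams",
    save_dir="./output/",
    config=all_config,                # parsed yolov6s_qat_dis.yaml
    train_dataloader=train_loader,
    eval_callback=eval_function)
ac.compress()
```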
#### 3.5 Evaluate Model Accuracy
Set the `model_dir` field in [yolov6s_qat_dis.yaml](./configs/yolov6s_qat_dis.yaml) to the path of the model to evaluate, then run the eval.py script to obtain its mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov6s_qat_dis.yaml
```
## 4. Deployment
#### Paddle-TensorRT C++ Deployment
Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build the demo following the [C++ TensorRT benchmark guide](./cpp_infer/README.md), then run the test:
```shell
# build
bash compile.sh
# run
./build/trt_run --model_file yolov6s_quant/model.pdmodel --params_file yolov6s_quant/model.pdiparams --run_mode=trt_int8
```
#### Paddle-TensorRT Python Deployment
First install a [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT support.
Then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py):
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
## 5. FAQ
- To evaluate the accuracy of the post-training (offline) quantized model, run:
```shell
python post_quant.py --config_path=./configs/yolov6s_qat_dis.yaml
```
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
option(WITH_ROCM "Compile demo with rocm." OFF)
option(WITH_ONNXRUNTIME "Compile demo with ONNXRuntime" OFF)
option(WITH_ARM "Compile demo with ARM" OFF)
option(WITH_MIPS "Compile demo with MIPS" OFF)
option(WITH_SW "Compile demo with SW" OFF)
option(WITH_XPU "Compile demo with xpu" OFF)
option(WITH_NPU "Compile demo with npu" OFF)
if(NOT WITH_STATIC_LIB)
add_definitions("-DPADDLE_WITH_SHARED_LIB")
else()
# PD_INFER_DECL is mainly used to set the dllimport/dllexport attribute in dynamic library mode.
# Set it to empty in static library mode to avoid compilation issues.
add_definitions("/DPD_INFER_DECL=")
endif()
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}/")
set(PADDLE_LIB_THIRD_PARTY_PATH "${PADDLE_LIB}/third_party/install/")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/include")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib")
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
option(MSVC_STATIC_CRT "use static C Runtime library by default" ON)
if (MSVC_STATIC_CRT)
if (WITH_MKL)
set(FLAG_OPENMP "/openmp")
endif()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
safe_set_static_flag()
if (WITH_STATIC_LIB)
add_definitions(-DSTATIC_LIB)
endif()
endif()
else()
if(WITH_MKL)
set(FLAG_OPENMP "-fopenmp")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 ${FLAG_OPENMP}")
endif()
if(WITH_GPU)
if(NOT WIN32)
include_directories("/usr/local/cuda/include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
endif()
else()
include_directories("C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
endif()
endif(NOT WIN32)
endif()
if (USE_TENSORRT AND WITH_GPU)
set(TENSORRT_ROOT "" CACHE STRING "The root directory of TensorRT library")
if("${TENSORRT_ROOT}" STREQUAL "")
message(FATAL_ERROR "The TENSORRT_ROOT is empty, you must assign it a value with CMake command. Such as: -DTENSORRT_ROOT=TENSORRT_ROOT_PATH ")
endif()
set(TENSORRT_INCLUDE_DIR ${TENSORRT_ROOT}/include)
set(TENSORRT_LIB_DIR ${TENSORRT_ROOT}/lib)
file(READ ${TENSORRT_INCLUDE_DIR}/NvInfer.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
file(READ ${TENSORRT_INCLUDE_DIR}/NvInferVersion.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
endif()
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
message(SEND_ERROR "Failed to detect TensorRT version.")
endif()
string(REGEX REPLACE "define NV_TENSORRT_MAJOR +([0-9]+)" "\\1"
TENSORRT_MAJOR_VERSION "${TENSORRT_MAJOR_VERSION}")
message(STATUS "Current TensorRT header is ${TENSORRT_INCLUDE_DIR}/NvInfer.h. "
"Current TensorRT version is v${TENSORRT_MAJOR_VERSION}. ")
include_directories("${TENSORRT_INCLUDE_DIR}")
link_directories("${TENSORRT_LIB_DIR}")
endif()
if(WITH_MKL)
set(MATH_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mklml")
include_directories("${MATH_LIB_PATH}/include")
if(WIN32)
set(MATH_LIB ${MATH_LIB_PATH}/lib/mklml${CMAKE_STATIC_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5md${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${MATH_LIB_PATH}/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(MKLDNN_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif(WIN32)
endif()
elseif((NOT WITH_MIPS) AND (NOT WITH_SW))
set(OPENBLAS_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}openblas")
include_directories("${OPENBLAS_LIB_PATH}/include/openblas")
if(WIN32)
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/openblas${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_STATIC_LIB)
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
if(WIN32)
set(DEPS ${PADDLE_LIB}/paddle/lib/paddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (WITH_ONNXRUNTIME)
if(WIN32)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.lib paddle2onnx)
elseif(APPLE)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.1.10.0.dylib paddle2onnx)
else()
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.so.1.10.0 paddle2onnx)
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf xxhash cryptopp
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash cryptopp-static ${EXTERNAL_LIB})
set(DEPS ${DEPS} shlwapi.lib)
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
if(USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX} )
endif()
endif()
if(WITH_ROCM AND NOT WIN32)
set(DEPS ${DEPS} ${ROCM_LIB}/libamdhip64${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_XPU AND NOT WIN32)
set(XPU_INSTALL_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}xpu")
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpuapi${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpurt${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_NPU AND NOT WIN32)
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libgraph${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libge_runner${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
target_link_libraries(${DEMO_NAME} ${DEPS})
if(WIN32)
if(USE_TENSORRT)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE})
endif()
endif()
if(WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/mklml.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/libiomp5md.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MKLDNN_PATH}/lib/mkldnn.dll ${CMAKE_BINARY_DIR}/Release
)
else()
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${OPENBLAS_LIB_PATH}/lib/openblas.dll ${CMAKE_BINARY_DIR}/Release
)
endif()
if(WITH_ONNXRUNTIME)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib/paddle2onnx.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
if(NOT WITH_STATIC_LIB)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy "${PADDLE_LIB}/paddle/lib/paddle_inference.dll" ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
endif()
# YOLOv6 TensorRT Benchmark (Linux)
## Environment Setup
- CUDA and cuDNN: make sure CUDA and cuDNN are installed and note their installation paths in advance.
- TensorRT: download [TensorRT 8.4.1.5](https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.4.1/tars/tensorrt-8.4.1.5.linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz) or another release from the NVIDIA website.
- Paddle Inference C++ library: to build the develop branch, follow the [build guide](https://www.paddlepaddle.org.cn/inference/user_guides/source_compile.html). After the build finishes, a `paddle_inference_install_dir` folder is generated under the build directory; this is the C++ inference library needed here.
## Build the Executable
- (1) Edit the dependency library paths in `compile.sh`, mainly the following:
```shell
# path to the Paddle Inference C++ library
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
# cuDNN path
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
# CUDA path
CUDA_LIB=/usr/local/cuda/lib64
# TensorRT install path: the absolute path of the extracted TRT package, containing the `lib` and `include` folders
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
```
## Run the Benchmark
- FP32
```
./build/trt_run --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp32
```
- FP16
```
./build/trt_run --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp16
```
- INT8
```
./build/trt_run --model_file yolov6s_quant/model.pdmodel --params_file yolov6s_quant/model.pdiparams --run_mode=trt_int8
```
## Performance Comparison
| Model | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) |
| :-------- |:-------- |:--------: | :---------------------: |
| YOLOv6s | 9.06ms | 2.90ms | 1.83ms |
Environment:
- Tesla T4, TensorRT 8.4.1, CUDA 11.2
- batch_size=1
#!/bin/bash
set +x
set -e
work_path=$(dirname $(readlink -f $0))
mkdir -p build
cd build
rm -rf *
DEMO_NAME=trt_run
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
WITH_ROCM=OFF
ROCM_LIB=/opt/rocm/lib
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DWITH_ROCM=${WITH_ROCM} \
-DROCM_LIB=${ROCM_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
#include <chrono>
#include <iostream>
#include <memory>
#include <numeric>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <cuda_runtime.h>
#include "paddle/include/paddle_inference_api.h"
#include "paddle/include/experimental/phi/common/float16.h"
using paddle_infer::Config;
using paddle_infer::Predictor;
using paddle_infer::CreatePredictor;
using paddle_infer::PrecisionType;
using phi::dtype::float16;
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(gpu_id, 0, "GPU card ID num.");
DEFINE_int32(trt_min_subgraph_size, 3, "tensorrt min_subgraph_size");
DEFINE_int32(warmup, 50, "warmup");
DEFINE_int32(repeats, 1000, "repeats");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::shared_ptr<Predictor> InitPredictor() {
Config config;
std::string model_path;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
model_path = FLAGS_model_dir.substr(0, FLAGS_model_dir.find_last_of("/"));
} else {
config.SetModel(FLAGS_model_file, FLAGS_params_file);
model_path = FLAGS_model_file.substr(0, FLAGS_model_file.find_last_of("/"));
}
// enable tune
std::cout << "model_path: " << model_path << std::endl;
config.EnableUseGpu(256, FLAGS_gpu_id);
if (FLAGS_run_mode == "trt_fp32") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kFloat32, false, false);
} else if (FLAGS_run_mode == "trt_fp16") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kHalf, false, false);
} else if (FLAGS_run_mode == "trt_int8") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kInt8, false, false);
}
config.EnableMemoryOptim();
config.SwitchIrOptim(true);
return CreatePredictor(config);
}
template <typename type>
void run(Predictor *predictor, const std::vector<type> &input,
const std::vector<int> &input_shape, type* out_data, std::vector<int> out_shape) {
// prepare input
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
for (int i = 0; i < FLAGS_warmup; ++i)
CHECK(predictor->Run());
auto st = time();
for (int i = 0; i < FLAGS_repeats; ++i) {
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
CHECK(predictor->Run());
auto output_names = predictor->GetOutputNames();
auto output_t = predictor->GetOutputHandle(output_names[0]);
std::vector<int> output_shape = output_t->shape();
output_t -> ShareExternalData<type>(out_data, out_shape, paddle_infer::PlaceType::kGPU);
}
LOG(INFO) << "[" << FLAGS_run_mode << " bs-" << FLAGS_batch_size << " ] run avg time is " << time_diff(st, time()) / FLAGS_repeats
<< " ms";
}
int main(int argc, char *argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = InitPredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 640, 640};
// float16
using dtype = float16;
std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
dtype *out_data;
int out_data_size = FLAGS_batch_size * 8400 * 85;
cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
std::vector<int> out_shape{ FLAGS_batch_size, 1, 8400, 85};
run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape);
return 0;
}
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import argparse
import time
from paddle.inference import Config
from paddle.inference import create_predictor
from post_process import YOLOv6PostProcess
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def generate_scale(im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(img_path, target_shape):
img = cv2.imread(img_path)
# Resize
im_scale_y, im_scale_x = generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, [2, 0, 1]) / 255
img = np.expand_dims(img, 0)
scale_factor = np.array([[im_scale_y, im_scale_x]])
return img.astype(np.float32), scale_factor
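# Example (illustrative): preprocessing one image to the 640x640 network input,
# using the sample image referenced elsewhere in this demo:
#   img, scale_factor = image_preprocess('images/000000570688.jpg', [640, 640])
#   img.shape          -> (1, 3, 640, 640), float32, RGB, values scaled to [0, 1]
#   scale_factor.shape -> (1, 2), holding [im_scale_y, im_scale_x] for rescaling boxes back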
def get_color_map_list(num_classes):
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(image_file, results, class_label, threshold=0.5):
srcimg = cv2.imread(image_file, 1)
for i in range(len(results)):
color_list = get_color_map_list(len(class_label))
clsid2color = {}
classid, conf = int(results[i, 0]), results[i, 1]
if conf < threshold:
continue
xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int(
results[i, 4]), int(results[i, 5])
if classid not in clsid2color:
clsid2color[classid] = color_list[classid]
color = tuple(clsid2color[classid])
cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
print(class_label[classid] + ': ' + str(round(conf, 3)))
cv2.putText(
srcimg,
class_label[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8, (0, 255, 0),
thickness=2)
return srcimg
def load_predictor(model_dir,
run_mode='paddle',
batch_size=1,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
enable_mkldnn_bfloat16=False,
delete_shuffle_pass=False):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT.
Used by action model.
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
if enable_mkldnn_bfloat16:
config.enable_mkldnn_bfloat16()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
if delete_shuffle_pass:
config.delete_pass("shuffle_channel_detect_pass")
predictor = create_predictor(config)
return predictor
def predict_image(predictor,
image_file,
image_shape=[640, 640],
warmup=1,
repeats=1,
threshold=0.5,
arch='YOLOv5'):
img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {}
if arch == 'YOLOv5':
inputs['x2paddle_images'] = img
input_names = predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
predictor.run()
np_boxes = None
predict_time = 0.
time_min = float("inf")
time_max = float('-inf')
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
time_avg = predict_time / repeats
print('Inference time(ms): min={}, max={}, avg={}'.format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
postprocess = YOLOv6PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np_boxes, scale_factor)
res_img = draw_box(
image_file, res['bbox'], CLASS_LABEL, threshold=threshold)
cv2.imwrite('result.jpg', res_img)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--image_file', type=str, default=None, help="image path")
parser.add_argument(
'--model_path', type=str, help="inference model filepath")
parser.add_argument(
'--benchmark',
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
'--run_mode',
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
'--device',
type=str,
default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
)
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args()
predictor = load_predictor(
args.model_path, run_mode=args.run_mode, device=args.device)
warmup, repeats = 1, 1
if args.benchmark:
warmup, repeats = 50, 100
predict_image(
predictor,
args.image_file,
image_shape=[args.img_shape, args.img_shape],
warmup=warmup,
repeats=repeats)
# YOLOv7 Auto Compression Example
Contents:
- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Auto Compression Workflow](#3-auto-compression-workflow)
- [3.1 Environment Setup](#31-environment-setup)
- [3.2 Dataset Preparation](#32-dataset-preparation)
- [3.3 Inference Model Preparation](#33-inference-model-preparation)
- [3.4 Run Auto Compression and Export the Model](#34-run-auto-compression-and-export-the-model)
- [3.5 Evaluate Model Accuracy](#35-evaluate-model-accuracy)
- [4. Deployment](#4-deployment)
- [5. FAQ](#5-faq)
## 1. Introduction
[X2Paddle](https://github.com/PaddlePaddle/X2Paddle), PaddlePaddle's model conversion tool, converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, inference models from other frameworks can conveniently use PaddleSlim's automatic compression features.
This example takes the [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object detection model, converts the PyTorch model into a Paddle model, and then compresses it automatically with ACT. The compression strategy used in this example is quantization-aware training.
## 2. Benchmark
| Model | Strategy | Input Size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv7 | Base model | 640*640 | 51.1 | 26.84ms | 7.44ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) |
| YOLOv7 | KL post-training quantization | 640*640 | 50.2 | - | - | 4.55ms | - | - |
| YOLOv7 | Quant-aware training + distillation | 640*640 | **50.8** | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) |
Notes:
- All mAP values are evaluated on the COCO val2017 dataset.
- YOLOv7 latency is measured on a Tesla T4 GPU with TensorRT 8.4.1 enabled and batch_size=1; the benchmark script is [cpp_infer](./cpp_infer).
## 3. Auto Compression Workflow
#### 3.1 Environment Setup
- PaddlePaddle >= 2.3 (install from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim > 2.3
- PaddleDet >= 2.4
- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
- opencv-python
(1) Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install paddleslim:
```shell
pip install paddleslim
```
(3) Install paddledet:
```shell
pip install paddledet
```
Note: PaddleDet is installed only to reuse the Dataloader components from PaddleDetection.
(4) Install X2Paddle 1.3.6 or later:
```shell
pip install x2paddle sympy onnx
```
#### 3.2 Dataset Preparation
This example runs auto compression on the COCO dataset by default and relies on the data loading modules in PaddleDetection. For a custom COCO-format dataset, or data in another format, prepare it by following the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
If your dataset is already prepared, simply set the `dataset_dir` field under `EvalDataset` in [./configs/yolov7_reader.yml](./configs/yolov7_reader.yml) to your dataset path; the sketch below shows how this reader config is consumed.
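For reference, run.py consumes this reader config through PaddleDetection's workspace API; a minimal sketch (paths and keys as assumed in this example) looks like:
```python
from ppdet.core.workspace import load_config, create

reader_cfg = load_config('./configs/yolov7_reader.yml')
# Build the evaluation dataloader the same way run.py does.
val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
                                  reader_cfg['worker_num'],
                                  return_list=True)
for data in val_loader:
    print(list(data.keys()))  # e.g. ['im_id', 'image', 'scale_factor', ...]
    break
```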
#### 3.3 Inference Model Preparation
(1) Prepare the ONNX model:
Use the export script of [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) to produce the ONNX model, as follows:
```shell
git clone https://github.com/WongKinYiu/yolov7.git
# switch to the u5 branch so the exported ONNX model keeps the same post-processing as YOLOv5
git checkout u5
# after downloading the yolov7.pt weights, run:
python export.py --weights yolov7.pt --include onnx
```
Alternatively, download the prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) directly.
(2) Convert the model:
```
x2paddle --framework=onnx --model=yolov7.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov7_infer
```
This produces the YOLOv7 inference model (`model.pdmodel` and `model.pdiparams`). For a quick start, you can also download the prebuilt YOLOv7 [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) from the table above.
The inference model consists of two files: `model.pdmodel` is the model graph and `model.pdiparams` holds the weights.
#### 3.4 Run Auto Compression and Export the Model
The quantization-with-distillation example is launched through the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model automatically. Configure the model path and the distillation, quantization, and training parameters in the config file; once configured, the model is quantized and distilled. Run:
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 Evaluate Model Accuracy
Set the `model_dir` field in [yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml) to the path of the model to evaluate, then run the eval.py script to obtain its mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov7_qat_dis.yaml
```
## 4. Deployment
#### Paddle-TensorRT C++ Deployment
Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build the demo following the [C++ TensorRT benchmark guide](./cpp_infer/README.md), then run the test:
```shell
# build
bash compile.sh
# run
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
#### Paddle-TensorRT Python Deployment
First install a [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT support.
Then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py):
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
## 5. FAQ
- To evaluate the accuracy of the post-training (offline) quantized model, run:
```shell
python post_quant.py --config_path=./configs/yolov7_qat_dis.yaml
```
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
- Permute: {}
batch_size: 1
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
option(WITH_ROCM "Compile demo with rocm." OFF)
option(WITH_ONNXRUNTIME "Compile demo with ONNXRuntime" OFF)
option(WITH_ARM "Compile demo with ARM" OFF)
option(WITH_MIPS "Compile demo with MIPS" OFF)
option(WITH_SW "Compile demo with SW" OFF)
option(WITH_XPU "Compile demo with xpu" OFF)
option(WITH_NPU "Compile demo with npu" OFF)
if(NOT WITH_STATIC_LIB)
add_definitions("-DPADDLE_WITH_SHARED_LIB")
else()
# PD_INFER_DECL is mainly used to set the dllimport/dllexport attribute in dynamic library mode.
# Set it to empty in static library mode to avoid compilation issues.
add_definitions("/DPD_INFER_DECL=")
endif()
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}/")
set(PADDLE_LIB_THIRD_PARTY_PATH "${PADDLE_LIB}/third_party/install/")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/include")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib")
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
option(MSVC_STATIC_CRT "use static C Runtime library by default" ON)
if (MSVC_STATIC_CRT)
if (WITH_MKL)
set(FLAG_OPENMP "/openmp")
endif()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
safe_set_static_flag()
if (WITH_STATIC_LIB)
add_definitions(-DSTATIC_LIB)
endif()
endif()
else()
if(WITH_MKL)
set(FLAG_OPENMP "-fopenmp")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 ${FLAG_OPENMP}")
endif()
if(WITH_GPU)
if(NOT WIN32)
include_directories("/usr/local/cuda/include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
endif()
else()
include_directories("C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
endif()
endif(NOT WIN32)
endif()
if (USE_TENSORRT AND WITH_GPU)
set(TENSORRT_ROOT "" CACHE STRING "The root directory of TensorRT library")
if("${TENSORRT_ROOT}" STREQUAL "")
message(FATAL_ERROR "The TENSORRT_ROOT is empty, you must assign it a value with CMake command. Such as: -DTENSORRT_ROOT=TENSORRT_ROOT_PATH ")
endif()
set(TENSORRT_INCLUDE_DIR ${TENSORRT_ROOT}/include)
set(TENSORRT_LIB_DIR ${TENSORRT_ROOT}/lib)
file(READ ${TENSORRT_INCLUDE_DIR}/NvInfer.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
file(READ ${TENSORRT_INCLUDE_DIR}/NvInferVersion.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
endif()
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
message(SEND_ERROR "Failed to detect TensorRT version.")
endif()
string(REGEX REPLACE "define NV_TENSORRT_MAJOR +([0-9]+)" "\\1"
TENSORRT_MAJOR_VERSION "${TENSORRT_MAJOR_VERSION}")
message(STATUS "Current TensorRT header is ${TENSORRT_INCLUDE_DIR}/NvInfer.h. "
"Current TensorRT version is v${TENSORRT_MAJOR_VERSION}. ")
include_directories("${TENSORRT_INCLUDE_DIR}")
link_directories("${TENSORRT_LIB_DIR}")
endif()
if(WITH_MKL)
set(MATH_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mklml")
include_directories("${MATH_LIB_PATH}/include")
if(WIN32)
set(MATH_LIB ${MATH_LIB_PATH}/lib/mklml${CMAKE_STATIC_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5md${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${MATH_LIB_PATH}/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(MKLDNN_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif(WIN32)
endif()
elseif((NOT WITH_MIPS) AND (NOT WITH_SW))
set(OPENBLAS_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}openblas")
include_directories("${OPENBLAS_LIB_PATH}/include/openblas")
if(WIN32)
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/openblas${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_STATIC_LIB)
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
if(WIN32)
set(DEPS ${PADDLE_LIB}/paddle/lib/paddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (WITH_ONNXRUNTIME)
if(WIN32)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.lib paddle2onnx)
elseif(APPLE)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.1.10.0.dylib paddle2onnx)
else()
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.so.1.10.0 paddle2onnx)
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf xxhash cryptopp
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash cryptopp-static ${EXTERNAL_LIB})
set(DEPS ${DEPS} shlwapi.lib)
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
if(USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX} )
endif()
endif()
if(WITH_ROCM AND NOT WIN32)
set(DEPS ${DEPS} ${ROCM_LIB}/libamdhip64${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_XPU AND NOT WIN32)
set(XPU_INSTALL_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}xpu")
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpuapi${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpurt${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_NPU AND NOT WIN32)
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libgraph${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libge_runner${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libacl_op_compiler${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
target_link_libraries(${DEMO_NAME} ${DEPS})
if(WIN32)
if(USE_TENSORRT)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE})
endif()
endif()
if(WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/mklml.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/libiomp5md.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MKLDNN_PATH}/lib/mkldnn.dll ${CMAKE_BINARY_DIR}/Release
)
else()
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${OPENBLAS_LIB_PATH}/lib/openblas.dll ${CMAKE_BINARY_DIR}/Release
)
endif()
if(WITH_ONNXRUNTIME)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib/paddle2onnx.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
if(NOT WITH_STATIC_LIB)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy "${PADDLE_LIB}/paddle/lib/paddle_inference.dll" ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
endif()
# YOLOv7 TensorRT Benchmark (Linux)
## Environment Setup
- CUDA and cuDNN: make sure CUDA and cuDNN are installed and note their installation paths in advance.
- TensorRT: download [TensorRT 8.4.1.5](https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.4.1/tars/tensorrt-8.4.1.5.linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz) or another release from the NVIDIA website.
- Paddle Inference C++ library: to build the develop branch, follow the [build guide](https://www.paddlepaddle.org.cn/inference/user_guides/source_compile.html). After the build finishes, a `paddle_inference_install_dir` folder is generated under the build directory; this is the C++ inference library needed here.
## Build the Executable
- (1) Edit the dependency library paths in `compile.sh`, mainly the following:
```shell
# path to the Paddle Inference C++ library
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
# cuDNN path
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
# CUDA path
CUDA_LIB=/usr/local/cuda/lib64
# TensorRT install path: the absolute path of the extracted TRT package, containing the `lib` and `include` folders
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
```
## Run the Benchmark
- FP32
```
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp32
```
- FP16
```
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp16
```
- INT8
```
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
## Performance Comparison
| Inference Library | Model | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) |
| :--------: | :--------: |:-------- |:--------: | :---------------------: |
| Paddle TensorRT | YOLOv7 | 26.84ms | 7.44ms | 4.55ms |
| TensorRT | YOLOv7 | 28.25ms | 7.23ms | 4.67ms |
Environment:
- Tesla T4, TensorRT 8.4.1, CUDA 11.2
- batch_size=1
#!/bin/bash
set +x
set -e
work_path=$(dirname $(readlink -f $0))
mkdir -p build
cd build
rm -rf *
DEMO_NAME=trt_run
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
WITH_ROCM=OFF
ROCM_LIB=/opt/rocm/lib
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DWITH_ROCM=${WITH_ROCM} \
-DROCM_LIB=${ROCM_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
from post_process import YOLOv7PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='output',
help="directory to save compressed model.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
parser.add_argument(
'--eval', type=bool, default=False, help="whether to run evaluation.")
return parser
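# Wrap a ppdet DataLoader so each batch is yielded as a feed dict keyed by the model's input names.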
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
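# Convert every field of a batch to numpy; VOC batches may hold ragged lists that need per-element conversion.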
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
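# Evaluation callback for AutoCompression: run the compiled program over the val set, apply YOLOv7 post-processing and return the bbox mAP.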
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in test_feed_names:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(compiled_test_program,
feed=data_input,
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
postprocess = YOLOv7PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
return map_res['bbox'][0]
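# Build the train/eval dataloaders from the ppdet reader config and launch auto compression.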
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader
_eval_batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
val_loader = create('EvalReader')(dataset,
reader_cfg['worker_num'],
batch_sampler=_eval_batch_sampler,
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
else:
eval_func = None
ac = AutoCompression(
model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir,
config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func)
ac.compress()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
@@ -71,10 +71,10 @@ git clone https://github.com/PaddlePaddle/PaddleSlim.git
 安装paddleseg
 ```shell
-pip install paddleseg
+pip install paddleseg==2.5.0
 ```
-注:安装[PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)的目的只是为了直接使用PaddleSeg中的Dataloader组件,不涉及模型组网等。
+注:安装[PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)的目的只是为了直接使用PaddleSeg中的Dataloader组件,不涉及模型组网等。推荐安装PaddleSeg 2.5.0, 不同版本的PaddleSeg的Dataloader返回数据的格式略有不同.
 #### 3.2 准备数据集
@@ -21,7 +21,7 @@ from paddleseg.cvlibs import Config as PaddleSegDataConfig
 from paddleseg.utils import worker_init_fn
 from paddleslim.auto_compression import AutoCompression
-from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
+from paddleslim.common import load_config as load_slim_config
 from paddleseg.core.infer import reverse_transform
 from paddleseg.utils import metrics
@@ -38,6 +38,11 @@ def argsparser():
         type=str,
         default=None,
         help="directory to save compressed model.")
+    parser.add_argument(
+        '--devices',
+        type=str,
+        default='gpu',
+        help="which device used to compress.")
     return parser
@@ -123,7 +128,12 @@ def main(args):
     config = all_config["Global"]
     rank_id = paddle.distributed.get_rank()
-    place = paddle.CUDAPlace(rank_id)
+    if args.devices == 'gpu':
+        place = paddle.CUDAPlace(rank_id)
+        paddle.set_device('gpu')
+    else:
+        place = paddle.CPUPlace()
+        paddle.set_device('cpu')
     # step1: load dataset config and create dataloader
     data_cfg = PaddleSegDataConfig(config['reader_config'])
     train_dataset = data_cfg.train_dataset
@@ -23,7 +23,7 @@ import paddle
 import paddle.nn as nn
 from paddle.io import DataLoader
 from imagenet_reader import ImageNetDataset
-from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
+from paddleslim.common import load_config as load_slim_config
 def argsparser():
@@ -93,7 +93,8 @@ def eval():
 def main():
     global global_config
     all_config = load_slim_config(args.config_path)
-    assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
+    assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format(
+        all_config)
     global_config = all_config["Global"]
     global data_dir
     data_dir = global_config['data_dir']
@@ -23,7 +23,7 @@ import paddle
 import paddle.nn as nn
 from paddle.io import DataLoader
 from imagenet_reader import ImageNetDataset
-from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
+from paddleslim.common import load_config as load_slim_config
 from paddleslim.auto_compression import AutoCompression
@@ -107,7 +107,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
 def main():
     global global_config
     all_config = load_slim_config(args.config_path)
-    assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
+    assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format(
+        all_config)
     global_config = all_config["Global"]
     global data_dir
     data_dir = global_config['data_dir']
# Quantization Analysis Tool: Detailed Tutorial
## 1. Features
1. Walk through every layer of the model, quantize one layer at a time, and measure the accuracy after quantization. The accuracies of all these single-layer-quantized models are ranked, and layers that quantize poorly are visualized, so they can be selectively skipped during quantization.
2. Visualize the weight and activation distributions of the layers that quantize best and worst, to help analyze why quantization affects the model the way it does.
3. [Coming soon] Given a target accuracy, directly produce a quantized model that meets it.
## 2. Parameters accepted by paddleslim.quant.AnalysisQuant
```yaml
model_dir
model_filename: None
params_filename: None
eval_function: None
data_loader: None
save_dir: 'analysis_results'
checkpoint_name: 'analysis_checkpoint.pkl'
num_histogram_plots: 10
ptq_config
```
- model_dir: required path to the model; it may be a directory name. For an ONNX model, pass the '.onnx' file name directly.
- model_filename: defaults to None. If model_dir is a directory, the model file name ending in '.pdmodel' must be provided; if model_dir is an '.onnx' file name, it is not needed.
- params_filename: defaults to None. If model_dir is a directory, the parameter file name ending in '.pdiparams' must be provided; if model_dir is an '.onnx' file name, it is not needed.
- eval_function: currently may not be None; a user-defined evaluation function must be provided.
- data_loader: data used to calibrate the model; the DataLoader inherits from `paddle.io.DataLoader`. You can reuse the DataLoader from a model suite directly, or build your own based on [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader).
- save_dir: folder in which the accuracy results, PDFs and other files are saved after analysis; defaults to `analysis_results`.
- checkpoint_name: since a model may have many layers to analyze, intermediate results are saved during the analysis; if the program is interrupted, the already-analyzed results are loaded automatically on restart. Defaults to `analysis_checkpoint.pkl`.
- num_histogram_plots: number of histograms to visualize, i.e. that many weight and activation distributions for the best- and worst-quantizing layers. Defaults to 10; set it to 0 if no histograms are needed.
- ptq_config: parameters passed through to post-training quantization; see the [post-training quantization docs](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post) for details. A usage sketch follows this list.
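Below is a minimal usage sketch that wires these parameters together. The `analysis()` entry point and the `eval_function`/`val_loader` placeholders are assumptions for illustration rather than a verbatim PaddleSlim example; the `ptq_config` values mirror the PTQ section of the example config further down this page.
```python
# Sketch only: the eval_function body and val_loader must be supplied by the user;
# analysis() is assumed to be the method that kicks off the per-layer analysis.
import paddle
from paddleslim.quant import AnalysisQuant

paddle.enable_static()

def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
    # Run the validation set with exe.run(...) and return a single accuracy/mAP value,
    # as in the eval_function of the run scripts earlier in this commit.
    ...

val_loader = ...  # a paddle.io.DataLoader yielding calibration batches

ptq_config = {
    'quantizable_op_type': ["conv2d", "depthwise_conv2d"],
    'weight_quantize_type': 'abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
    'is_full_quantize': False,
    'batch_size': 10,
    'batch_nums': 10,
}

analyzer = AnalysisQuant(
    model_dir='./picodet_s_416_coco_lcnet/',
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    eval_function=eval_function,
    data_loader=val_loader,
    save_dir='analysis_results',
    num_histogram_plots=10,
    ptq_config=ptq_config)
analyzer.analysis()  # ranks per-layer quantization accuracy, writes analysis.txt and the histogram PDFs
```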
## 3. Outputs
By default, the analysis tool produces the following directory:
```
analysis_results/
├── analysis.txt
├── best_weight_hist_result.pdf
├── best_act_hist_result.pdf
├── worst_weight_hist_result.pdf
├── worst_act_hist_result.pdf
```
- The accuracy ranking of all single-layer-quantized models is saved to `./analysis_results/analysis.txt` by default.
- By setting `num_histogram_plots`, that many weight and activation histograms of the best- and worst-quantizing layers are drawn and saved as PDFs under `./analysis_results`, in `best_weight_hist_result.pdf`, `best_act_hist_result.pdf`, `worst_weight_hist_result.pdf` and `worst_act_hist_result.pdf`, for comparative analysis.
## 4. Running post-training quantization based on the analysis
After running the analysis tool, you can use the accuracy ranking in `analysis.txt` to drop the layers that quantize poorly: when calling `paddleslim.quant.quant_post_static`, pass the layers to be skipped via the `skip_tensor_list` argument.
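As a hedged illustration of that call (the skipped tensor names below are hypothetical entries that would be copied from `analysis.txt`, and `val_loader` is the same calibration DataLoader used for the analysis):
```python
# Sketch: post-training quantization that skips the layers the analysis flagged as harmful.
import paddle
from paddleslim.quant import quant_post_static

paddle.enable_static()
exe = paddle.static.Executor(paddle.CUDAPlace(0))  # or paddle.CPUPlace()

val_loader = ...  # calibration DataLoader, as in the analysis step

quant_post_static(
    executor=exe,
    model_dir='./picodet_s_416_coco_lcnet/',
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    quantize_model_path='./picodet_s_ptq_skip',    # output directory; the name is illustrative
    data_loader=val_loader,
    batch_size=10,
    batch_nums=10,
    weight_quantize_type='abs_max',
    activation_quantize_type='moving_average_abs_max',
    skip_tensor_list=['conv2d_1.w_0'],             # hypothetical layer names taken from analysis.txt
)
```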
input_list: ['image', 'scale_factor']
model_dir: ./picodet_s_416_coco_lcnet/
model_filename: model.pdmodel
params_filename: model.pdiparams
save_dir: ./analysis_results
metric: COCO
num_classes: 80
PTQ:
quantizable_op_type: ["conv2d", "depthwise_conv2d"]
weight_quantize_type: 'abs_max'
activation_quantize_type: 'moving_average_abs_max'
is_full_quantize: False
batch_size: 10
batch_nums: 10
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: /dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: /dataset/coco/
eval_height: &eval_height 416
eval_width: &eval_width 416
eval_size: &eval_size [*eval_height, *eval_width]
worker_num: 0
EvalReader:
inputs_def:
image_shape: [1, 3, *eval_height, *eval_width]
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_size: 32
+input_list: ['image', 'scale_factor']
+model_dir: ./picodet_s_416_coco_lcnet/
+model_filename: model.pdmodel
+params_filename: model.pdiparams
+skip_tensor_list: None
 metric: COCO
 num_classes: 80
@@ -6,22 +12,27 @@ TrainDataset:
   !COCODataSet
   image_dir: train2017
   anno_path: annotations/instances_train2017.json
-  dataset_dir: dataset/coco/
+  dataset_dir: /dataset/coco/
 EvalDataset:
   !COCODataSet
   image_dir: val2017
   anno_path: annotations/instances_val2017.json
-  dataset_dir: dataset/coco/
+  dataset_dir: /dataset/coco/
+eval_height: &eval_height 416
+eval_width: &eval_width 416
+eval_size: &eval_size [*eval_height, *eval_width]
 worker_num: 0
-# preprocess reader in test
 EvalReader:
+  inputs_def:
+    image_shape: [1, 3, *eval_height, *eval_width]
   sample_transforms:
   - Decode: {}
-  - Resize: {target_size: [640, 640], keep_ratio: True}
-  - Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
-  - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
+  - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
   - Permute: {}
-  batch_size: 1
+  batch_size: 32
arch: YOLOv5
model_dir: ./yolov5s.onnx
dataset_dir: dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
skip_tensors: None # you can set it after analysis
arch: YOLOv6
model_dir: ./yolov6s.onnx
save_dir: ./analysis_results
dataset_dir: /dataset/coco/
val_image_dir: val2017
val_anno_path: annotations/instances_val2017.json
PTQ:
quantizable_op_type: ["conv2d", "depthwise_conv2d"]
weight_quantize_type: 'abs_max'
activation_quantize_type: 'moving_average_abs_max'
is_full_quantize: False
batch_size: 10
batch_nums: 10