diff --git a/example/auto_compression/pytorch_yolov7/README.md b/example/auto_compression/pytorch_yolov7/README.md
index 768a99eb57c090df377634d171b779809a56a0af..802b79398f30a48d67884f4494f70b7c89b3c60b 100644
--- a/example/auto_compression/pytorch_yolov7/README.md
+++ b/example/auto_compression/pytorch_yolov7/README.md
@@ -14,17 +14,19 @@
 ## 1. 简介
 
-飞桨模型转换工具[X2Paddle](https://github.com/PaddlePaddle/X2Paddle)支持将```Caffe/TensorFlow/ONNX/PyTorch```的模型一键转为飞桨(PaddlePaddle)的预测模型。借助X2Paddle的能力,各种框架的推理模型可以很方便的使用PaddleSlim的自动化压缩功能。
-
-本示例将以[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7)目标检测模型为例,将PyTorch框架模型转换为Paddle框架模型,再使用ACT自动压缩功能进行自动压缩。本示例使用的自动压缩策略为量化训练。
+本示例将以[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7)目标检测模型为例,借助[X2Paddle](https://github.com/PaddlePaddle/X2Paddle)的能力,将PyTorch框架模型转换为Paddle框架模型,再使用ACT自动压缩功能进行模型压缩。压缩后的模型既可使用Paddle Inference部署,也可导出至ONNX后利用TensorRT部署。
 
 ## 2.Benchmark
 
-| 模型 | 策略 | 输入尺寸 | mAPval<br>0.5:0.95 | 预测时延FP32<br>(ms) |预测时延FP16<br>(ms) | 预测时延INT8<br>(ms) | 配置文件 | Inference模型 |
-| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
-| YOLOv7 | Base模型 | 640*640 | 51.1 | 26.84ms | 7.44ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
-| YOLOv7 | KL离线量化 | 640*640 | 50.2 | - | - | 4.55ms | - | - |
-| YOLOv7 | 量化蒸馏训练 | 640*640 | **50.8** | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) | [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.onnx) |
+| 模型 | 策略 | 输入尺寸 | mAPval<br>0.5:0.95 | 模型体积 | 预测时延FP32<br>(ms) | 预测时延FP16<br>(ms) | 预测时延INT8<br>(ms) | 配置文件 | Inference模型 |
+| :-------- | :-------- | :--------: | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
+| YOLOv7 | Base模型 | 640*640 | 51.1 | 141MB | 26.84 | 7.44 | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
+| YOLOv7 | 离线量化 | 640*640 | 50.2 | 36MB | - | - | 4.55 | - | - |
+| YOLOv7 | ACT量化训练 | 640*640 | **50.9** | 36MB | - | - | **4.55** | [config](./configs/yolov7_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.onnx) |
+| | | | | | | | | | |
+| YOLOv7-Tiny | Base模型 | 640*640 | 37.3 | 24MB | 5.06 | 2.32 | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx) |
+| YOLOv7-Tiny | 离线量化 | 640*640 | - | 6.1MB | - | - | 1.68 | - | - |
+| YOLOv7-Tiny | ACT量化训练 | 640*640 | **37.0** | 6.1MB | - | - | **1.68** | [config](./configs/yolov7_tiny_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.tar) [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.onnx) |
 
 说明:
 - mAP的指标均在COCO val2017数据集中评测得到。
 
@@ -33,10 +35,8 @@
 ## 3. 自动压缩流程
 
 #### 3.1 准备环境
-- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装)
-- PaddleSlim > 2.3版本
-- PaddleDet >= 2.4
-- opencv-python
+- PaddlePaddle develop每日版本 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html)下载安装)
+- PaddleSlim develop版本
 
 (1)安装paddlepaddle:
 ```shell
@@ -48,22 +48,40 @@
 pip install paddlepaddle-gpu
 ```
 
 (2)安装paddleslim:
 ```shell
-pip install paddleslim
-```
-
-(3)安装paddledet:
-```shell
-pip install paddledet
+git clone https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
+python setup.py install
 ```
 
-注:安装PaddleDet的目的只是为了直接使用PaddleDetection中的Dataloader组件。
-
 #### 3.2 准备数据集
 
-本案例默认以COCO数据进行自动压缩实验,并且依赖PaddleDetection中数据读取模块,如果自定义COCO数据,或者其他格式数据,请参考[PaddleDetection数据准备文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md) 来准备数据。
+本示例默认以COCO数据进行自动压缩实验,可以从[MS COCO官网](https://cocodataset.org)下载[Train](http://images.cocodataset.org/zips/train2017.zip)、[Val](http://images.cocodataset.org/zips/val2017.zip)、[annotation](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)。
+
+目录格式如下:
+```
+dataset/coco/
+├── annotations
+│   ├── instances_train2017.json
+│   ├── instances_val2017.json
+│   |   ...
+├── train2017
+│   ├── 000000000009.jpg
+│   ├── 000000580008.jpg
+│   |   ...
+├── val2017
+│   ├── 000000000139.jpg
+│   ├── 000000000285.jpg
+```
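+
+可参考以下命令下载并解压COCO数据(仅为示例命令,假设在本示例根目录下执行,路径可按需调整):
+```shell
+mkdir -p dataset/coco && cd dataset/coco
+wget http://images.cocodataset.org/zips/train2017.zip && unzip -q train2017.zip
+wget http://images.cocodataset.org/zips/val2017.zip && unzip -q val2017.zip
+wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip && unzip -q annotations_trainval2017.zip
+```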
 
-如果已经准备好数据集,请直接修改[./configs/yolov7_reader.yml]中`EvalDataset`的`dataset_dir`字段为自己数据集路径即可。
+如果是自定义数据集,请按照如上COCO数据格式准备数据。
 
 #### 3.3 准备预测模型
@@ -73,13 +83,10 @@
 可通过[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7)的导出脚本来准备ONNX模型,具体步骤如下:
 ```shell
 git clone https://github.com/WongKinYiu/yolov7.git
-# 切换分支到u5分支,保持导出的ONNX模型后处理和YOLOv5一致
-git checkout u5
-# 下载好yolov7.pt权重后执行:
-python export.py --weights yolov7.pt --include onnx
+python export.py --weights yolov7-tiny.pt --grid
 ```
 
-也可以直接下载我们已经准备好的[yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx)。
+**注意**:目前ACT支持的是不带NMS的模型,使用如上命令导出即可。也可以直接下载我们已经准备好的[yolov7-tiny.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx)。
 
 #### 3.4 自动压缩并产出模型
 
@@ -88,13 +95,13 @@
 - 单卡训练:
 ```
 export CUDA_VISIBLE_DEVICES=0
-python run.py --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+python run.py --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
 ```
 
 - 多卡训练:
 ```
 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
-          --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+          --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
 ```
 
 #### 3.5 测试模型精度
 
@@ -102,7 +109,7 @@
 - 修改[yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml)中`model_dir`字段为模型存储路径,然后使用eval.py脚本得到模型的mAP:
 ```
 export CUDA_VISIBLE_DEVICES=0
-python eval.py --config_path=./configs/yolov7_qat_dis.yaml
+python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
 ```
 
diff --git a/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml b/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
index 41428206da4e68587b2e29fc1f179a934bd0ede7..d0b68159ebd487c26b95e260a9705c17e83d7176 100644
--- a/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
+++ b/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
@@ -21,7 +21,7 @@ Quantization:
 TrainConfig:
   train_iter: 5000
   eval_iter: 1000
-  learning_rate: 
+  learning_rate:
     type: CosineAnnealingDecay
     learning_rate: 0.00003
     T_max: 8000
diff --git a/example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml b/example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..7940f282f3c2d20ddbfaff2df806d8a0872b2dec
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml
@@ -0,0 +1,31 @@
+Global:
+  model_dir: ./yolov7-tiny.onnx
+  dataset_dir: dataset/coco/
+  train_image_dir: train2017
+  val_image_dir: val2017
+  train_anno_path: annotations/instances_train2017.json
+  val_anno_path: annotations/instances_val2017.json
+  Evaluation: True
+
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+
+Quantization:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+
+TrainConfig:
+  train_iter: 5000
+  eval_iter: 1000
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 0.00004
diff --git a/example/auto_compression/pytorch_yolov7/eval.py b/example/auto_compression/pytorch_yolov7/eval.py
index 1530c71e3ac18efdd24d202ddb3fc27e6fa51762..451301f1056c0233eefa49b13ffa7265f5d6bef2 100644
--- a/example/auto_compression/pytorch_yolov7/eval.py
+++ b/example/auto_compression/pytorch_yolov7/eval.py
@@ -19,7 +19,7 @@ import argparse
 from tqdm import tqdm
 import paddle
 from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
-from paddleslim.common import load_onnx_model
+from paddleslim.auto_compression.utils import load_inference_model
 from post_process import YOLOv7PostProcess, coco_metric
 from dataset import COCOValDataset
 
@@ -46,8 +46,8 @@ def eval():
     place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
     exe = paddle.static.Executor(place)
 
-    val_program, feed_target_names, fetch_targets = load_onnx_model(
-        global_config["model_dir"])
+    val_program, feed_target_names, fetch_targets = load_inference_model(
+        global_config["model_dir"], exe)
 
     bboxes_list, bbox_nums_list, image_id_list = [], [], []
     with tqdm(
diff --git a/example/auto_compression/pytorch_yolov7/onnx_trt_infer.py b/example/auto_compression/pytorch_yolov7/onnx_trt_infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..3540c33d6ed8a86398b620d3cc722dc1a11ca85a
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/onnx_trt_infer.py
@@ -0,0 +1,382 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
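+#
+# TensorRT deployment demo for the exported YOLOv7 ONNX models: it builds (or
+# deserializes) a TensorRT engine, benchmarks the average inference latency,
+# and saves a visualization of the detections to output.jpg.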
+
+import numpy as np
+import cv2
+import tensorrt as trt
+import pycuda.driver as cuda
+import pycuda.autoinit
+import os
+import time
+import random
+import argparse
+
+EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
+EXPLICIT_PRECISION = 1 << (
+    int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)
+
+# load coco labels
+CLASS_LABEL = [
+    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
+    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
+    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
+    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
+    "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
+    "baseball bat", "baseball glove", "skateboard", "surfboard",
+    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
+    "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
+    "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
+    "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
+    "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
+    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
+    "hair drier", "toothbrush"
+]
+
+
+def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
+    if len(image.shape) == 3:
+        padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
+    else:
+        padded_img = np.ones(input_size) * 114.0
+    img = np.array(image)
+    r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
+    resized_img = cv2.resize(
+        img,
+        (int(img.shape[1] * r), int(img.shape[0] * r)),
+        interpolation=cv2.INTER_LINEAR, ).astype(np.float32)
+    padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img
+
+    padded_img = padded_img[:, :, ::-1]
+    padded_img /= 255.0
+    if mean is not None:
+        padded_img -= mean
+    if std is not None:
+        padded_img /= std
+    padded_img = padded_img.transpose(swap)
+    padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
+    return padded_img, r
+
+
+def postprocess(predictions, ratio):
+    boxes = predictions[:, :4]
+    scores = predictions[:, 4:5] * predictions[:, 5:]
+    boxes_xyxy = np.ones_like(boxes)
+    boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
+    boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
+    boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
+    boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
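+    # rescale boxes back to the original image size (undo the preprocess resize)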
+    boxes_xyxy /= ratio
+    dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1)
+    return dets
+
+
+def nms(boxes, scores, nms_thr):
+    """Single class NMS implemented in Numpy."""
+    x1 = boxes[:, 0]
+    y1 = boxes[:, 1]
+    x2 = boxes[:, 2]
+    y2 = boxes[:, 3]
+
+    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
+    order = scores.argsort()[::-1]
+
+    keep = []
+    while order.size > 0:
+        i = order[0]
+        keep.append(i)
+        xx1 = np.maximum(x1[i], x1[order[1:]])
+        yy1 = np.maximum(y1[i], y1[order[1:]])
+        xx2 = np.minimum(x2[i], x2[order[1:]])
+        yy2 = np.minimum(y2[i], y2[order[1:]])
+
+        w = np.maximum(0.0, xx2 - xx1 + 1)
+        h = np.maximum(0.0, yy2 - yy1 + 1)
+        inter = w * h
+        ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+        inds = np.where(ovr <= nms_thr)[0]
+        order = order[inds + 1]
+
+    return keep
+
+
+def multiclass_nms(boxes, scores, nms_thr, score_thr):
+    """Multiclass NMS implemented in Numpy"""
+    final_dets = []
+    num_classes = scores.shape[1]
+    for cls_ind in range(num_classes):
+        cls_scores = scores[:, cls_ind]
+        valid_score_mask = cls_scores > score_thr
+        if valid_score_mask.sum() == 0:
+            continue
+        else:
+            valid_scores = cls_scores[valid_score_mask]
+            valid_boxes = boxes[valid_score_mask]
+            keep = nms(valid_boxes, valid_scores, nms_thr)
+            if len(keep) > 0:
+                cls_inds = np.ones((len(keep), 1)) * cls_ind
+                dets = np.concatenate(
+                    [valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
+                final_dets.append(dets)
+    if len(final_dets) == 0:
+        return None
+    return np.concatenate(final_dets, 0)
+
+
+def get_color_map_list(num_classes):
+    color_map = num_classes * [0, 0, 0]
+    for i in range(0, num_classes):
+        j = 0
+        lab = i
+        while lab:
+            color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+            color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+            color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+            j += 1
+            lab >>= 3
+    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
+    return color_map
+
+
+def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
+    color_list = get_color_map_list(len(class_names))
+    for i in range(len(boxes)):
+        box = boxes[i]
+        cls_id = int(cls_ids[i])
+        color = tuple(color_list[cls_id])
+        score = scores[i]
+        if score < conf:
+            continue
+        x0 = int(box[0])
+        y0 = int(box[1])
+        x1 = int(box[2])
+        y1 = int(box[3])
+
+        text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
+        font = cv2.FONT_HERSHEY_SIMPLEX
+
+        txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
+        cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
+        cv2.rectangle(img, (x0, y0 + 1),
+                      (x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])),
+                      color, -1)
+        cv2.putText(
+            img,
+            text, (x0, y0 + txt_size[1]),
+            font,
+            0.8, (0, 255, 0),
+            thickness=2)
+
+    return img
+
+
+def get_engine(precision, model_file_path):
+    # TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
+    TRT_LOGGER = trt.Logger()
+    builder = trt.Builder(TRT_LOGGER)
+    config = builder.create_builder_config()
+    if precision == 'int8':
+        network = builder.create_network(EXPLICIT_BATCH | EXPLICIT_PRECISION)
+    else:
+        network = builder.create_network(EXPLICIT_BATCH)
+    parser = trt.OnnxParser(network, TRT_LOGGER)
+
+    runtime = trt.Runtime(TRT_LOGGER)
+    if model_file_path.endswith('.trt'):
+        # If a serialized engine exists, use it instead of building an engine.
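+        # (builder/config settings above are not applied in this branch)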
+ print("Reading engine from file {}".format(model_file_path)) + with open(model_file_path, + "rb") as f, trt.Runtime(TRT_LOGGER) as runtime: + engine = runtime.deserialize_cuda_engine(f.read()) + for i in range(network.num_layers): + layer = network.get_layer(i) + print(i, layer.name) + return engine + else: + config.max_workspace_size = 1 << 30 + + if precision == "fp16": + if not builder.platform_has_fast_fp16: + print("FP16 is not supported natively on this platform/device") + else: + config.set_flag(trt.BuilderFlag.FP16) + elif precision == "int8": + if not builder.platform_has_fast_int8: + print("INT8 is not supported natively on this platform/device") + else: + if builder.platform_has_fast_fp16: + # Also enable fp16, as some layers may be even more efficient in fp16 than int8 + config.set_flag(trt.BuilderFlag.FP16) + config.set_flag(trt.BuilderFlag.INT8) + + builder.max_batch_size = 1 + print('Loading ONNX file from path {}...'.format(model_file_path)) + with open(model_file_path, 'rb') as model: + print('Beginning ONNX file parsing') + if not parser.parse(model.read()): + print('ERROR: Failed to parse the ONNX file.') + for error in range(parser.num_errors): + print(parser.get_error(error)) + return None + + print('Completed parsing of ONNX file') + print('Building an engine from file {}; this may take a while...'. + format(model_file_path)) + plan = builder.build_serialized_network(network, config) + engine = runtime.deserialize_cuda_engine(plan) + print("Completed creating Engine") + with open(model_file_path, "wb") as f: + f.write(engine.serialize()) + for i in range(network.num_layers): + layer = network.get_layer(i) + print(i, layer.name) + return engine + + +# Simple helper data class that's a little nicer to use than a 2-tuple. +class HostDeviceMem(object): + def __init__(self, host_mem, device_mem): + self.host = host_mem + self.device = device_mem + + def __str__(self): + return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device) + + def __repr__(self): + return self.__str__() + + +def allocate_buffers(engine): + inputs = [] + outputs = [] + bindings = [] + stream = cuda.Stream() + for binding in engine: + size = trt.volume(engine.get_binding_shape( + binding)) * engine.max_batch_size + dtype = trt.nptype(engine.get_binding_dtype(binding)) + # Allocate host and device buffers + host_mem = cuda.pagelocked_empty(size, dtype) + device_mem = cuda.mem_alloc(host_mem.nbytes) + # Append the device buffer to device bindings. + bindings.append(int(device_mem)) + # Append to the appropriate list. + if engine.binding_is_input(binding): + inputs.append(HostDeviceMem(host_mem, device_mem)) + else: + outputs.append(HostDeviceMem(host_mem, device_mem)) + return inputs, outputs, bindings, stream + + +def run_inference(context, bindings, inputs, outputs, stream): + # Transfer input data to the GPU. + [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs] + # Run inference. + context.execute_async_v2(bindings=bindings, stream_handle=stream.handle) + # Transfer predictions back from the GPU. + [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs] + # Synchronize the stream + stream.synchronize() + # Return only the host outputs. 
+    return [out.host for out in outputs]
+
+
+def main(args):
+    onnx_model = args.model_path
+    img_path = args.image_file
+    num_class = len(CLASS_LABEL)
+    repeat = 1000
+    engine = get_engine(args.precision, onnx_model)
+
+    model_all_names = []
+    for idx in range(engine.num_bindings):
+        is_input = engine.binding_is_input(idx)
+        name = engine.get_binding_name(idx)
+        op_type = engine.get_binding_dtype(idx)
+        model_all_names.append(name)
+        shape = engine.get_binding_shape(idx)
+        print('input id:', idx, ' is input: ', is_input, ' binding name:',
+              name, ' shape:', shape, 'type: ', op_type)
+
+    context = engine.create_execution_context()
+    print('Allocate buffers ...')
+    inputs, outputs, bindings, stream = allocate_buffers(engine)
+    print("TRT set input ...")
+
+    origin_img = cv2.imread(img_path)
+    input_shape = [args.img_shape, args.img_shape]
+    input_image, ratio = preprocess(origin_img, input_shape)
+
+    inputs[0].host = np.expand_dims(input_image, axis=0)
+
+    for _ in range(0, 50):
+        trt_outputs = run_inference(
+            context,
+            bindings=bindings,
+            inputs=inputs,
+            outputs=outputs,
+            stream=stream)
+
+    time1 = time.time()
+    for _ in range(0, repeat):
+        trt_outputs = run_inference(
+            context,
+            bindings=bindings,
+            inputs=inputs,
+            outputs=outputs,
+            stream=stream)
+    time2 = time.time()
+    # total time cost(ms)
+    total_inference_cost = (time2 - time1) * 1000
+    print("model path: ", onnx_model, " precision: ", args.precision)
+    print("In TensorRT, ",
+          "average latency is : {} ms".format(total_inference_cost / repeat))
+    # Do postprocess
+    output = trt_outputs[0]
+    predictions = np.reshape(output, (1, -1, int(5 + num_class)))[0]
+    dets = postprocess(predictions, ratio)
+    # Draw rectangles and labels on the original image
+    if dets is not None:
+        final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
+        origin_img = draw_box(
+            origin_img,
+            final_boxes,
+            final_scores,
+            final_cls_inds,
+            conf=0.5,
+            class_names=CLASS_LABEL)
+    cv2.imwrite('output.jpg', origin_img)
+    print('The prediction results are saved in output.jpg.')
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--model_path',
+        type=str,
+        default="quant_model.onnx",
+        help="inference model filepath")
+    parser.add_argument(
+        '--image_file', type=str, default="bus.jpg", help="image path")
+    parser.add_argument(
+        '--precision', type=str, default='fp32', help="support fp32/fp16/int8.")
+    parser.add_argument('--img_shape', type=int, default=640, help="input_size")
+    args = parser.parse_args()
+    main(args)
diff --git a/example/auto_compression/pytorch_yolov7/post_quant.py b/example/auto_compression/pytorch_yolov7/post_quant.py
index 84db4f989f41ad76c1aa2a2cbd49656b456f8750..a253e671f8fd16f0a8ab3d13dbf1413de6f56d14 100644
--- a/example/auto_compression/pytorch_yolov7/post_quant.py
+++ b/example/auto_compression/pytorch_yolov7/post_quant.py
@@ -22,6 +22,7 @@ from paddleslim.common import load_onnx_model
 from paddleslim.quant import quant_post_static
 from dataset import COCOTrainDataset
 
+
 def argsparser():
     parser = argparse.ArgumentParser(description=__doc__)
     parser.add_argument(