diff --git a/example/auto_compression/pytorch_yolov7/README.md b/example/auto_compression/pytorch_yolov7/README.md
index 768a99eb57c090df377634d171b779809a56a0af..802b79398f30a48d67884f4494f70b7c89b3c60b 100644
--- a/example/auto_compression/pytorch_yolov7/README.md
+++ b/example/auto_compression/pytorch_yolov7/README.md
@@ -14,17 +14,19 @@
 ## 1. 简介
 
-飞桨模型转换工具[X2Paddle](https://github.com/PaddlePaddle/X2Paddle)支持将```Caffe/TensorFlow/ONNX/PyTorch```的模型一键转为飞桨(PaddlePaddle)的预测模型。借助X2Paddle的能力,各种框架的推理模型可以很方便的使用PaddleSlim的自动化压缩功能。
-
-本示例将以[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7)目标检测模型为例,将PyTorch框架模型转换为Paddle框架模型,再使用ACT自动压缩功能进行自动压缩。本示例使用的自动压缩策略为量化训练。
+本示例将以[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7)目标检测模型为例,借助[X2Paddle](https://github.com/PaddlePaddle/X2Paddle)的能力,将PyTorch框架模型转换为Paddle框架模型,再使用ACT自动压缩功能进行模型压缩。压缩后的模型既可使用Paddle Inference部署,也可导出至ONNX后利用TensorRT部署。
 
 ## 2.Benchmark
 
-| 模型 | 策略 | 输入尺寸 | mAPval<br>0.5:0.95 | 预测时延FP32<br>(ms) |预测时延FP16<br>(ms) | 预测时延INT8<br>(ms) | 配置文件 | Inference模型 |
-| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
-| YOLOv7 | Base模型 | 640*640 | 51.1 | 26.84ms | 7.44ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
-| YOLOv7 | KL离线量化 | 640*640 | 50.2 | - | - | 4.55ms | - | - |
-| YOLOv7 | 量化蒸馏训练 | 640*640 | **50.8** | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) | [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.onnx) |
+| 模型 | 策略 | 输入尺寸 | mAPval<br>0.5:0.95 | 模型体积 | 预测时延FP32<br>(ms) | 预测时延FP16<br>(ms) | 预测时延INT8<br>(ms) | 配置文件 | Inference模型 |
+| :-------- | :-------- | :--------: | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
+| YOLOv7 | Base模型 | 640*640 | 51.1 | 141MB | 26.84 | 7.44 | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
+| YOLOv7 | 离线量化 | 640*640 | 50.2 | 36MB | - | - | 4.55 | - | - |
+| YOLOv7 | ACT量化训练 | 640*640 | **50.9** | 36MB | - | - | **4.55** | [config](./configs/yolov7_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.onnx) |
+| | | | | | | | | | |
+| YOLOv7-Tiny | Base模型 | 640*640 | 37.3 | 24MB | 5.06 | 2.32 | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx) |
+| YOLOv7-Tiny | 离线量化 | 640*640 | - | 6.1MB | - | - | 1.68 | - | - |
+| YOLOv7-Tiny | ACT量化训练 | 640*640 | **37.0** | 6.1MB | - | - | **1.68** | [config](./configs/yolov7_tiny_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.tar) [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.onnx) |
 
 说明:
 - mAP的指标均在COCO val2017数据集中评测得到。
 
@@ -33,10 +35,8 @@
 ## 3. 自动压缩流程
 
 #### 3.1 准备环境
-- PaddlePaddle >= 2.3 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载安装)
-- PaddleSlim > 2.3版本
-- PaddleDet >= 2.4
-- opencv-python
+- PaddlePaddle develop每日版本 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html)下载安装)
+- PaddleSlim develop版本
 
 (1)安装paddlepaddle:
 ```shell
@@ -48,22 +48,40 @@
 pip install paddlepaddle-gpu
 ```
 
 (2)安装paddleslim:
 ```shell
-pip install paddleslim
-```
-
-(3)安装paddledet:
-```shell
-pip install paddledet
+git clone https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
+python setup.py install
 ```
 
-注:安装PaddleDet的目的只是为了直接使用PaddleDetection中的Dataloader组件。
-
 #### 3.2 准备数据集
 
-本案例默认以COCO数据进行自动压缩实验,并且依赖PaddleDetection中数据读取模块,如果自定义COCO数据,或者其他格式数据,请参考[PaddleDetection数据准备文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md) 来准备数据。
+本示例默认以COCO数据进行自动压缩实验,可以从[MS COCO官网](https://cocodataset.org)下载[Train](http://images.cocodataset.org/zips/train2017.zip)、[Val](http://images.cocodataset.org/zips/val2017.zip)、[annotation](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)。
+
+目录格式如下:
+```
+dataset/coco/
+├── annotations
+│   ├── instances_train2017.json
+│   ├── instances_val2017.json
+│   |   ...
+├── train2017
+│   ├── 000000000009.jpg
+│   ├── 000000580008.jpg
+│   |   ...
+├── val2017
+│   ├── 000000000139.jpg
+│   ├── 000000000285.jpg
+```
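+
+可参考以下命令下载并解压COCO数据(仅为示例命令,假设在本示例根目录下执行,路径可按需调整):
+```shell
+mkdir -p dataset/coco && cd dataset/coco
+wget http://images.cocodataset.org/zips/train2017.zip && unzip -q train2017.zip
+wget http://images.cocodataset.org/zips/val2017.zip && unzip -q val2017.zip
+wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip && unzip -q annotations_trainval2017.zip
+```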
 
-如果已经准备好数据集,请直接修改[./configs/yolov7_reader.yml]中`EvalDataset`的`dataset_dir`字段为自己数据集路径即可。
+如果是自定义数据集,请按照如上COCO数据格式准备数据。
 
 #### 3.3 准备预测模型
@@ -73,13 +83,10 @@
 可通过[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7)的导出脚本来准备ONNX模型,具体步骤如下:
 ```shell
 git clone https://github.com/WongKinYiu/yolov7.git
-# 切换分支到u5分支,保持导出的ONNX模型后处理和YOLOv5一致
-git checkout u5
-# 下载好yolov7.pt权重后执行:
-python export.py --weights yolov7.pt --include onnx
+python export.py --weights yolov7-tiny.pt --grid
 ```
 
-也可以直接下载我们已经准备好的[yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx)。
+**注意**:目前ACT支持的是不带NMS的模型,使用如上命令导出即可。也可以直接下载我们已经准备好的[yolov7-tiny.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx)。
 
 #### 3.4 自动压缩并产出模型
 
@@ -88,13 +95,13 @@
 - 单卡训练:
 ```
 export CUDA_VISIBLE_DEVICES=0
-python run.py --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+python run.py --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
 ```
 
 - 多卡训练:
 ```
 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
-          --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+          --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
 ```
 
 #### 3.5 测试模型精度
 
@@ -102,7 +109,7 @@
 - 修改[yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml)中`model_dir`字段为模型存储路径,然后使用eval.py脚本得到模型的mAP:
 ```
 export CUDA_VISIBLE_DEVICES=0
-python eval.py --config_path=./configs/yolov7_qat_dis.yaml
+python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
 ```
 
diff --git a/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml b/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
index 41428206da4e68587b2e29fc1f179a934bd0ede7..d0b68159ebd487c26b95e260a9705c17e83d7176 100644
--- a/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
+++ b/example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml
@@ -21,7 +21,7 @@ Quantization:
 TrainConfig:
   train_iter: 5000
   eval_iter: 1000
-  learning_rate: 
+  learning_rate:
     type: CosineAnnealingDecay
     learning_rate: 0.00003
     T_max: 8000
diff --git a/example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml b/example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..7940f282f3c2d20ddbfaff2df806d8a0872b2dec
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml
@@ -0,0 +1,31 @@
+Global:
+  model_dir: ./yolov7-tiny.onnx
+  dataset_dir: dataset/coco/
+  train_image_dir: train2017
+  val_image_dir: val2017
+  train_anno_path: annotations/instances_train2017.json
+  val_anno_path: annotations/instances_val2017.json
+  Evaluation: True
+
+Distillation:
+  alpha: 1.0
+  loss: soft_label
+
+Quantization:
+  onnx_format: true
+  activation_quantize_type: 'moving_average_abs_max'
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+
+TrainConfig:
+  train_iter: 5000
+  eval_iter: 1000
+  learning_rate:
+    type: CosineAnnealingDecay
+    learning_rate: 0.00003
+    T_max: 8000
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 0.00004
diff --git a/example/auto_compression/pytorch_yolov7/eval.py b/example/auto_compression/pytorch_yolov7/eval.py
index 1530c71e3ac18efdd24d202ddb3fc27e6fa51762..451301f1056c0233eefa49b13ffa7265f5d6bef2 100644
--- a/example/auto_compression/pytorch_yolov7/eval.py
+++ b/example/auto_compression/pytorch_yolov7/eval.py
@@ -19,7 +19,7 @@ import argparse
 from tqdm import tqdm
 import paddle
 from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
-from paddleslim.common import load_onnx_model
+from paddleslim.auto_compression.utils import load_inference_model
 from post_process import YOLOv7PostProcess, coco_metric
 from dataset import COCOValDataset
 
@@ -46,8 +46,8 @@ def eval():
     place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
     exe = paddle.static.Executor(place)
 
-    val_program, feed_target_names, fetch_targets = load_onnx_model(
-        global_config["model_dir"])
+    val_program, feed_target_names, fetch_targets = load_inference_model(
+        global_config["model_dir"], exe)
 
     bboxes_list, bbox_nums_list, image_id_list = [], [], []
     with tqdm(
diff --git a/example/auto_compression/pytorch_yolov7/onnx_trt_infer.py b/example/auto_compression/pytorch_yolov7/onnx_trt_infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..3540c33d6ed8a86398b620d3cc722dc1a11ca85a
--- /dev/null
+++ b/example/auto_compression/pytorch_yolov7/onnx_trt_infer.py
@@ -0,0 +1,382 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
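+#
+# TensorRT deployment demo for the exported YOLOv7 ONNX models: it builds (or
+# deserializes) a TensorRT engine, benchmarks the average inference latency,
+# and saves a visualization of the detections to output.jpg.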
+
+import numpy as np
+import cv2
+import tensorrt as trt
+import pycuda.driver as cuda
+import pycuda.autoinit
+import os
+import time
+import random
+import argparse
+
+EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
+EXPLICIT_PRECISION = 1 << (
+    int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)
+
+# load coco labels
+CLASS_LABEL = [
+    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
+    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
+    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
+    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
+    "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
+    "baseball bat", "baseball glove", "skateboard", "surfboard",
+    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
+    "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
+    "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
+    "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
+    "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
+    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
+    "hair drier", "toothbrush"
+]
+
+
+def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
+    if len(image.shape) == 3:
+        padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
+    else:
+        padded_img = np.ones(input_size) * 114.0
+    img = np.array(image)
+    r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
+    resized_img = cv2.resize(
+        img,
+        (int(img.shape[1] * r), int(img.shape[0] * r)),
+        interpolation=cv2.INTER_LINEAR, ).astype(np.float32)
+    padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img
+
+    padded_img = padded_img[:, :, ::-1]
+    padded_img /= 255.0
+    if mean is not None:
+        padded_img -= mean
+    if std is not None:
+        padded_img /= std
+    padded_img = padded_img.transpose(swap)
+    padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
+    return padded_img, r
+
+
+def postprocess(predictions, ratio):
+    boxes = predictions[:, :4]
+    scores = predictions[:, 4:5] * predictions[:, 5:]
+    boxes_xyxy = np.ones_like(boxes)
+    boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
+    boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
+    boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
+    boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
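+    # rescale boxes back to the original image size (undo the preprocess resize)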
+    boxes_xyxy /= ratio
+    dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1)
+    return dets
+
+
+def nms(boxes, scores, nms_thr):
+    """Single class NMS implemented in Numpy."""
+    x1 = boxes[:, 0]
+    y1 = boxes[:, 1]
+    x2 = boxes[:, 2]
+    y2 = boxes[:, 3]
+
+    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
+    order = scores.argsort()[::-1]
+
+    keep = []
+    while order.size > 0:
+        i = order[0]
+        keep.append(i)
+        xx1 = np.maximum(x1[i], x1[order[1:]])
+        yy1 = np.maximum(y1[i], y1[order[1:]])
+        xx2 = np.minimum(x2[i], x2[order[1:]])
+        yy2 = np.minimum(y2[i], y2[order[1:]])
+
+        w = np.maximum(0.0, xx2 - xx1 + 1)
+        h = np.maximum(0.0, yy2 - yy1 + 1)
+        inter = w * h
+        ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+        inds = np.where(ovr <= nms_thr)[0]
+        order = order[inds + 1]
+
+    return keep
+
+
+def multiclass_nms(boxes, scores, nms_thr, score_thr):
+    """Multiclass NMS implemented in Numpy"""
+    final_dets = []
+    num_classes = scores.shape[1]
+    for cls_ind in range(num_classes):
+        cls_scores = scores[:, cls_ind]
+        valid_score_mask = cls_scores > score_thr
+        if valid_score_mask.sum() == 0:
+            continue
+        else:
+            valid_scores = cls_scores[valid_score_mask]
+            valid_boxes = boxes[valid_score_mask]
+            keep = nms(valid_boxes, valid_scores, nms_thr)
+            if len(keep) > 0:
+                cls_inds = np.ones((len(keep), 1)) * cls_ind
+                dets = np.concatenate(
+                    [valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
+                final_dets.append(dets)
+    if len(final_dets) == 0:
+        return None
+    return np.concatenate(final_dets, 0)
+
+
+def get_color_map_list(num_classes):
+    color_map = num_classes * [0, 0, 0]
+    for i in range(0, num_classes):
+        j = 0
+        lab = i
+        while lab:
+            color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+            color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+            color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+            j += 1
+            lab >>= 3
+    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
+    return color_map
+
+
+def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
+    color_list = get_color_map_list(len(class_names))
+    for i in range(len(boxes)):
+        box = boxes[i]
+        cls_id = int(cls_ids[i])
+        color = tuple(color_list[cls_id])
+        score = scores[i]
+        if score < conf:
+            continue
+        x0 = int(box[0])
+        y0 = int(box[1])
+        x1 = int(box[2])
+        y1 = int(box[3])
+
+        text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
+        font = cv2.FONT_HERSHEY_SIMPLEX
+
+        txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
+        cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
+        cv2.rectangle(img, (x0, y0 + 1),
+                      (x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])),
+                      color, -1)
+        cv2.putText(
+            img,
+            text, (x0, y0 + txt_size[1]),
+            font,
+            0.8, (0, 255, 0),
+            thickness=2)
+
+    return img
+
+
+def get_engine(precision, model_file_path):
+    # TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
+    TRT_LOGGER = trt.Logger()
+    builder = trt.Builder(TRT_LOGGER)
+    config = builder.create_builder_config()
+    if precision == 'int8':
+        network = builder.create_network(EXPLICIT_BATCH | EXPLICIT_PRECISION)
+    else:
+        network = builder.create_network(EXPLICIT_BATCH)
+    parser = trt.OnnxParser(network, TRT_LOGGER)
+
+    runtime = trt.Runtime(TRT_LOGGER)
+    if model_file_path.endswith('.trt'):
+        # If a serialized engine exists, use it instead of building an engine.
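+        # (builder/config settings above are not applied in this branch)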
+ print("Reading engine from file {}".format(model_file_path)) + with open(model_file_path, + "rb") as f, trt.Runtime(TRT_LOGGER) as runtime: + engine = runtime.deserialize_cuda_engine(f.read()) + for i in range(network.num_layers): + layer = network.get_layer(i) + print(i, layer.name) + return engine + else: + config.max_workspace_size = 1 << 30 + + if precision == "fp16": + if not builder.platform_has_fast_fp16: + print("FP16 is not supported natively on this platform/device") + else: + config.set_flag(trt.BuilderFlag.FP16) + elif precision == "int8": + if not builder.platform_has_fast_int8: + print("INT8 is not supported natively on this platform/device") + else: + if builder.platform_has_fast_fp16: + # Also enable fp16, as some layers may be even more efficient in fp16 than int8 + config.set_flag(trt.BuilderFlag.FP16) + config.set_flag(trt.BuilderFlag.INT8) + + builder.max_batch_size = 1 + print('Loading ONNX file from path {}...'.format(model_file_path)) + with open(model_file_path, 'rb') as model: + print('Beginning ONNX file parsing') + if not parser.parse(model.read()): + print('ERROR: Failed to parse the ONNX file.') + for error in range(parser.num_errors): + print(parser.get_error(error)) + return None + + print('Completed parsing of ONNX file') + print('Building an engine from file {}; this may take a while...'. + format(model_file_path)) + plan = builder.build_serialized_network(network, config) + engine = runtime.deserialize_cuda_engine(plan) + print("Completed creating Engine") + with open(model_file_path, "wb") as f: + f.write(engine.serialize()) + for i in range(network.num_layers): + layer = network.get_layer(i) + print(i, layer.name) + return engine + + +# Simple helper data class that's a little nicer to use than a 2-tuple. +class HostDeviceMem(object): + def __init__(self, host_mem, device_mem): + self.host = host_mem + self.device = device_mem + + def __str__(self): + return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device) + + def __repr__(self): + return self.__str__() + + +def allocate_buffers(engine): + inputs = [] + outputs = [] + bindings = [] + stream = cuda.Stream() + for binding in engine: + size = trt.volume(engine.get_binding_shape( + binding)) * engine.max_batch_size + dtype = trt.nptype(engine.get_binding_dtype(binding)) + # Allocate host and device buffers + host_mem = cuda.pagelocked_empty(size, dtype) + device_mem = cuda.mem_alloc(host_mem.nbytes) + # Append the device buffer to device bindings. + bindings.append(int(device_mem)) + # Append to the appropriate list. + if engine.binding_is_input(binding): + inputs.append(HostDeviceMem(host_mem, device_mem)) + else: + outputs.append(HostDeviceMem(host_mem, device_mem)) + return inputs, outputs, bindings, stream + + +def run_inference(context, bindings, inputs, outputs, stream): + # Transfer input data to the GPU. + [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs] + # Run inference. + context.execute_async_v2(bindings=bindings, stream_handle=stream.handle) + # Transfer predictions back from the GPU. + [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs] + # Synchronize the stream + stream.synchronize() + # Return only the host outputs. 
+    return [out.host for out in outputs]
+
+
+def main(args):
+    onnx_model = args.model_path
+    img_path = args.image_file
+    num_class = len(CLASS_LABEL)
+    repeat = 1000
+    engine = get_engine(args.precision, onnx_model)
+
+    model_all_names = []
+    for idx in range(engine.num_bindings):
+        is_input = engine.binding_is_input(idx)
+        name = engine.get_binding_name(idx)
+        op_type = engine.get_binding_dtype(idx)
+        model_all_names.append(name)
+        shape = engine.get_binding_shape(idx)
+        print('input id:', idx, ' is input: ', is_input, ' binding name:',
+              name, ' shape:', shape, 'type: ', op_type)
+
+    context = engine.create_execution_context()
+    print('Allocate buffers ...')
+    inputs, outputs, bindings, stream = allocate_buffers(engine)
+    print("TRT set input ...")
+
+    origin_img = cv2.imread(img_path)
+    input_shape = [args.img_shape, args.img_shape]
+    input_image, ratio = preprocess(origin_img, input_shape)
+
+    inputs[0].host = np.expand_dims(input_image, axis=0)
+
+    for _ in range(0, 50):
+        trt_outputs = run_inference(
+            context,
+            bindings=bindings,
+            inputs=inputs,
+            outputs=outputs,
+            stream=stream)
+
+    time1 = time.time()
+    for _ in range(0, repeat):
+        trt_outputs = run_inference(
+            context,
+            bindings=bindings,
+            inputs=inputs,
+            outputs=outputs,
+            stream=stream)
+    time2 = time.time()
+    # total time cost(ms)
+    total_inference_cost = (time2 - time1) * 1000
+    print("model path: ", onnx_model, " precision: ", args.precision)
+    print("In TensorRT, ",
+          "average latency is : {} ms".format(total_inference_cost / repeat))
+    # Do postprocess
+    output = trt_outputs[0]
+    predictions = np.reshape(output, (1, -1, int(5 + num_class)))[0]
+    dets = postprocess(predictions, ratio)
+    # Draw rectangles and labels on the original image
+    if dets is not None:
+        final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
+        origin_img = draw_box(
+            origin_img,
+            final_boxes,
+            final_scores,
+            final_cls_inds,
+            conf=0.5,
+            class_names=CLASS_LABEL)
+    cv2.imwrite('output.jpg', origin_img)
+    print('The prediction results are saved in output.jpg.')
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--model_path',
+        type=str,
+        default="quant_model.onnx",
+        help="inference model filepath")
+    parser.add_argument(
+        '--image_file', type=str, default="bus.jpg", help="image path")
+    parser.add_argument(
+        '--precision', type=str, default='fp32', help="support fp32/fp16/int8.")
+    parser.add_argument('--img_shape', type=int, default=640, help="input_size")
+    args = parser.parse_args()
+    main(args)
diff --git a/example/auto_compression/pytorch_yolov7/post_quant.py b/example/auto_compression/pytorch_yolov7/post_quant.py
index 84db4f989f41ad76c1aa2a2cbd49656b456f8750..a253e671f8fd16f0a8ab3d13dbf1413de6f56d14 100644
--- a/example/auto_compression/pytorch_yolov7/post_quant.py
+++ b/example/auto_compression/pytorch_yolov7/post_quant.py
@@ -22,6 +22,7 @@ from paddleslim.common import load_onnx_model
 from paddleslim.quant import quant_post_static
 from dataset import COCOTrainDataset
 
+
 def argsparser():
     parser = argparse.ArgumentParser(description=__doc__)
     parser.add_argument(