From 2dfb9c016f0e286f2ace5d2d75881043d9554bb1 Mon Sep 17 00:00:00 2001
From: Wenyu
Date: Wed, 29 Jun 2022 14:29:20 +0800
Subject: [PATCH] [cherry-pick] Add PPYOLOE speed test script (#6303)

* add speed testing, model -> onnx -> tensorrt, test=document_fix
* add speed testing, model -> onnx -> tensorrt, test=document_fix

---
 configs/ppyoloe/README.md    | 23 +++++++++++++++++++++++
 configs/ppyoloe/README_cn.md | 23 +++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/configs/ppyoloe/README.md b/configs/ppyoloe/README.md
index 1a749d895..cb065e7dc 100644
--- a/configs/ppyoloe/README.md
+++ b/configs/ppyoloe/README.md
@@ -130,6 +130,29 @@ CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inferenc
 ```
 
+**Using TensorRT inference with ONNX** to test speed, run the following commands:
+
+```bash
+# export the inference model with trt=True
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True
+
+# convert to ONNX
+paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx
+
+# TensorRT inference with fp16 and batch_size=1
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
+
+# TensorRT inference with fp16 and batch_size=32
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16
+
+# With the commands above, on a T4 GPU with TensorRT 7.2, PP-YOLOE-s runs at:
+
+# batch_size=1, 2.80ms, 357fps
+# batch_size=32, 67.69ms, 472fps
+
+```
+
+
 ### Deployment
 
 PP-YOLOE can be deployed by following approaches:

diff --git a/configs/ppyoloe/README_cn.md b/configs/ppyoloe/README_cn.md
index 60138a35a..315063985 100644
--- a/configs/ppyoloe/README_cn.md
+++ b/configs/ppyoloe/README_cn.md
@@ -132,6 +132,29 @@ CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inferenc
 ```
 
+
+**Using ONNX and TensorRT** to test speed, run the following commands:
+
+```bash
+# export the model
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True
+
+# convert to ONNX format
+paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx
+
+# speed test, fp16, batch_size=1
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
+
+# speed test, fp16, batch_size=32
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16
+
+# With the script above, on a T4 GPU with TensorRT 7.2, PP-YOLOE-s runs at:
+# batch_size=1, 2.80ms, 357fps
+# batch_size=32, 67.69ms, 472fps
+```
+
+
+
 ### Deployment
 
 PP-YOLOE can be deployed in the following ways:
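The throughput figures quoted in both hunks follow directly from the mean latency: fps = batch_size / latency in seconds, truncated to an integer (1 / 2.80ms ≈ 357, 32 / 67.69ms ≈ 472). A minimal Python sketch of that arithmetic, not part of the patch itself:

```python
# Convert a trtexec mean latency (in ms) to throughput (images per second).
# Truncating with int() reproduces the figures quoted in the README.

def latency_to_fps(batch_size: int, latency_ms: float) -> int:
    """Images processed per second for one engine execution at this batch size."""
    return int(batch_size * 1000.0 / latency_ms)

print(latency_to_fps(1, 2.80))    # batch_size=1,  2.80 ms -> 357
print(latency_to_fps(32, 67.69))  # batch_size=32, 67.69 ms -> 472
```

Note how the larger batch trades a 24x higher latency for roughly 1.3x higher throughput.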
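trtexec prints its latency statistics in a performance summary at the end of a run; to collect numbers like the ones above automatically, one option is to parse the mean latency out of the captured output. The summary line format below is an assumption (the exact wording varies across TensorRT versions), so this is a sketch rather than a drop-in parser:

```python
import re

# Illustrative trtexec-style summary line; NOT captured from a real run,
# and the exact wording varies by TensorRT version -- adjust the regex
# to match your build's output.
SAMPLE_SUMMARY = "[I] GPU Compute Time: min = 2.61 ms, max = 3.12 ms, mean = 2.80 ms"

def mean_latency_ms(summary: str) -> float:
    """Extract the mean latency in milliseconds from a trtexec summary string."""
    match = re.search(r"mean\s*=\s*([\d.]+)\s*ms", summary)
    if match is None:
        raise ValueError("no 'mean = ... ms' entry found in summary")
    return float(match.group(1))

print(mean_latency_ms(SAMPLE_SUMMARY))  # 2.8
```

In practice you would pipe `trtexec ... | tee log.txt` and run the parser over `log.txt` for each batch size.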