add PPYOLOE speed testing, model -> onnx -> tensorrt (#6300)

* add speed testing, model -> onnx -> tensorrt, test=document_fix
* add speed testing, model -> onnx -> tensorrt, test=document_fix

2a41d1a0 · Wenyu · GitHub · 8378e6db

Showing 2 changed files with 46 additions and 0 deletions

configs/ppyoloe/README.md +23 -0

configs/ppyoloe/README_cn.md +23 -0

--- a/configs/ppyoloe/README.md
+++ b/configs/ppyoloe/README.md
@@ -133,6 +133,29 @@ CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inferenc

 ```

+To test speed with **TensorRT inference on an ONNX model**, run the following commands:
+
+```bash
+# export inference model with trt=True
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True
+
+# convert to onnx
+paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx
+
+# trt inference using fp16 and batch_size=1
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
+
+# trt inference using fp16 and batch_size=32
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16
+
+# With the commands above, on a T4 GPU with TensorRT 7.2, the PPYOLOE-s model runs at the following speeds:
+
+# batch_size=1: 2.80 ms per batch, 357 fps
+# batch_size=32: 67.69 ms per batch, 472 fps
+
+```
+
+
 ### Deployment

 PP-YOLOE can be deployed in the following ways:
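
The fps figures in the speed-test block are consistent with treating each reported latency as the time for one whole batch, so that fps = batch_size × 1000 / latency_ms. A minimal sketch of that arithmetic, assuming per-batch latency; `fps_from_latency` is a hypothetical helper, not part of PaddleDetection or TensorRT:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PaddleDetection or TensorRT):
# convert a per-batch latency in milliseconds into images per second.
# The result is truncated to an integer, matching the README figures.
# Usage: fps_from_latency <batch_size> <latency_ms>
fps_from_latency() {
  awk -v bs="$1" -v ms="$2" 'BEGIN { printf "%d\n", bs * 1000 / ms }'
}

fps_from_latency 1 2.80    # prints 357
fps_from_latency 32 67.69  # prints 472
```

The same helper can be reused for other batch sizes by passing the latency trtexec reports for that run.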

--- a/configs/ppyoloe/README_cn.md
+++ b/configs/ppyoloe/README_cn.md
@@ -133,6 +133,29 @@ CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inferenc

 ```

+
+To test speed with **ONNX and TensorRT**, run the following commands:
+
+```bash
+# export the model
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True
+
+# convert to ONNX format
+paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx
+
+# test speed, half precision (FP16), batch_size=1
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
+
+# test speed, half precision (FP16), batch_size=32
+trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16
+
+# With the script above, on a T4 GPU with TensorRT 7.2, the PPYOLOE-s model runs at the following speeds:
+# batch_size=1, 2.80ms, 357fps
+# batch_size=32, 67.69ms, 472fps
+```
+
+
+
 ### Deployment

 PP-YOLOE can be deployed in the following ways:
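
The two trtexec commands in the speed-test block differ only in the batch size, which appears in the engine file name and in both entries of `--shapes`. A sketch that prints the command for any batch size, assuming the input names `image` and `scale_factor`, the 640x640 input size, and the file names used in the README; `trtexec_cmd` is a hypothetical helper that only prints the command and does not run it:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the trtexec FP16 speed-test command for one batch size.
# Input names (image, scale_factor), the 640x640 size and the file names follow
# the README commands; adjust them if your exported model differs.
# Usage: trtexec_cmd <batch_size> [onnx_file]
trtexec_cmd() {
  local bs="$1"
  local onnx="${2:-./ppyoloe_crn_s_300e_coco.onnx}"
  echo "trtexec --onnx=${onnx}" \
       "--saveEngine=./ppyoloe_s_bs${bs}.engine" \
       "--workspace=1024 --avgRuns=1000" \
       "--shapes=image:${bs}x3x640x640,scale_factor:${bs}x2 --fp16"
}

# Print the two commands used in the README (batch_size=1 and batch_size=32).
for bs in 1 32; do
  trtexec_cmd "$bs"
done
```

To run a generated command instead of printing it, pipe it to a shell on a machine where `trtexec` is on `PATH`, for example `trtexec_cmd 1 | bash`.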