未验证 提交 2a41d1a0 编写于 作者: W Wenyu 提交者: GitHub

add PPYOLOE speed testing, model -> onnx -> tensorrt (#6300)

* add speed testing, model -> onnx -> tensorrt, test=document_fix

* add speed testing, model -> onnx -> tensorrt, test=document_fix
上级 8378e6db
......@@ -133,6 +133,29 @@ CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inferenc
```
**Using TensorRT Inference with ONNX** to test speed, run following command
```bash
# export inference model with trt=True
python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True
# convert to onnx
paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx
# trt inference using fp16 and batch_size=1
trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
# trt inference using fp16 and batch_size=32
trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16
# Using the above script, T4 and tensorrt 7.2 machine, the speed of PPYOLOE-s model is as follows,
# batch_size=1, 2.80ms, 357fps
# batch_size=32, 67.69ms, 472fps
```
### Deployment
PP-YOLOE can be deployed by following approches:
......
......@@ -133,6 +133,29 @@ CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inferenc
```
**使用 ONNX 和 TensorRT** 进行测速,执行以下命令:
```bash
# 导出模型
python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams exclude_nms=True trt=True
# 转化成ONNX格式
paddle2onnx --model_dir output_inference/ppyoloe_crn_s_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_crn_s_300e_coco.onnx
# 测试速度,半精度,batch_size=1
trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
# 测试速度,半精度,batch_size=32
trtexec --onnx=./ppyoloe_crn_s_300e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16
# 使用上边的脚本, 在T4 和 TensorRT 7.2的环境下,PPYOLOE-s模型速度如下
# batch_size=1, 2.80ms, 357fps
# batch_size=32, 67.69ms, 472fps
```
### 部署
PP-YOLOE可以使用以下方式进行部署:
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册