diff --git a/configs/ppyoloe/README.md b/configs/ppyoloe/README.md
index aa52b37f049110fe11bb2cf80e31bc19f68c1118..fb7f562ca2b054716d63d6646498ac26e366eaf8 100644
--- a/configs/ppyoloe/README.md
+++ b/configs/ppyoloe/README.md
@@ -34,14 +34,14 @@ PP-YOLOE is composed of following methods:
 **Notes:**
 - PP-YOLOE is trained on COCO train2017 dataset and evaluated on val2017 & test-dev2017 dataset,Box APtest is evaluation results of `mAP(IoU=0.5:0.95)`.
-- PP-YOLOE used 8 GPUs for mixed precision training, if GPU number and mini-batch size is changed, learning rate and iteration times should be adjusted according [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/FAQ).
-- PP-YOLOE inference speed is tesed on single Tesla V100 with batch size as 1, CUDA 10.2, CUDNN 7.6.5, TensorRT 6.0.1.8 in TensorRT mode.
-- PP-YOLOE inference speed testing uses inference model exported by `tools/export_model.py` with `-o exclude_nms=True` and benchmarked by running `depoly/python/infer.py` with `--run_benchmark`. All testing results do not contains the time cost of data reading and post-processing(NMS), which is same as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet) in testing method.
+- PP-YOLOE uses 8 GPUs for mixed precision training. If the number of GPUs or the mini-batch size is changed, the learning rate and number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/FAQ).
+- PP-YOLOE inference speed is tested on a single Tesla V100 with batch size 1, **CUDA 10.2**, **CUDNN 7.6.5**, **TensorRT 6.0.1.8** in TensorRT mode.
+- Refer to [Speed testing](#Speed-testing) to reproduce the speed testing results of PP-YOLOE.
 - If you set `--run_benchmark=True`,you should install these dependencies at first, `pip install pynvml psutil GPUtil`.
 ## Getting Start
-### 1. Training
+### Training
 Training PP-YOLOE with mixed precision on 8 GPUs with following command
@@ -51,7 +51,7 @@ python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c con
 ** Notes: ** use `--amp` to train with default config to avoid out of memeory.
-### 2. Evaluation
+### Evaluation
 Evaluating PP-YOLOE on COCO val2017 dataset in single GPU with following commands:
@@ -61,7 +61,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyoloe/ppyoloe_crn_l_300
 For evaluation on COCO test-dev2017 dataset, please download COCO test-dev2017 dataset from [COCO dataset download](https://cocodataset.org/#download) and decompress to COCO dataset directory and configure `EvalDataset` like `configs/ppyolo/ppyolo_test.yml`.
-### 3. Inference
+### Inference
 Inference images in single GPU with following commands, use `--infer_img` to inference a single image and `--infer_dir` to inference all images in the directory.
@@ -73,56 +73,96 @@ CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_30
 CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams --infer_dir=demo
 ```
-### 4. Deployment
+### Exporting models
-- Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp)
-- [Paddle-TensorRT](../../deploy/TENSOR_RT.md)
-- [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)
-- [PaddleServing](https://github.com/PaddlePaddle/Serving)
-
+For deployment on GPU or speed testing, the model should first be exported to an inference model using `tools/export_model.py`.
-For deployment on GPU or benchmarked, model should be first exported to inference model using `tools/export_model.py`.
-
-Exporting PP-YOLOE for Paddle Inference **without TensorRT**, use following command.
+**Exporting PP-YOLOE for Paddle Inference without TensorRT**, use the following command:
 ```bash
 python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
 ```
-Exporting PP-YOLOE for Paddle Inference **with TensorRT** for better performance, use following command with extra `-o trt=True` setting.
+**Exporting PP-YOLOE for Paddle Inference with TensorRT** for better performance, use the following command with the extra `-o trt=True` setting:
 ```bash
 python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True
 ```
-`deploy/python/infer.py` is used to load exported paddle inference model above for inference and benchmark through Paddle Inference.
+If you want to export the PP-YOLOE model to **ONNX format**, refer to the [PaddleDetection Model Export as ONNX Format Tutorial](../../deploy/EXPORT_ONNX_MODEL_en.md) and use the following commands:
 ```bash
-# inference single image
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu
+# export inference model
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
-# inference all images in the directory
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_dir=demo/ --device=gpu
+# install paddle2onnx
+pip install paddle2onnx
+
+# convert to onnx
+paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx
-# benchmark
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True
 ```
-If you want to export PP-YOLOE model to **ONNX format**, use following command refer to [PaddleDetection Model Export as ONNX Format Tutorial](../../deploy/EXPORT_ONNX_MODEL_en.md).
+**Notes:** The exported ONNX model only supports batch_size=1 for now.
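+
+To quickly sanity-check the exported ONNX file, a minimal sketch such as the following can be used (this assumes ONNX Runtime is installed separately with `pip install onnxruntime`, which is not a PaddleDetection dependency; the file name matches the `--save_file` used above):
+
+```python
+import onnxruntime as ort
+
+# load the exported model and list its inputs/outputs to confirm the conversion succeeded
+sess = ort.InferenceSession("ppyoloe_crn_l_300e_coco.onnx")
+for inp in sess.get_inputs():
+    print("input :", inp.name, inp.shape, inp.type)
+for out in sess.get_outputs():
+    print("output:", out.name, out.shape, out.type)
+```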
+
+### Speed testing
+
+For a fair comparison, the speeds in the [Model Zoo](#Model-Zoo) do not contain the time cost of data reading and post-processing (NMS), which is the same testing method as [YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet). Thus, you should export the model with the extra `-o exclude_nms=True` setting.
+
+**Using Paddle Inference without TensorRT** to test speed, run the following commands:
 ```bash
 # export inference model
-python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True
-# install paddle2onnx
-pip install paddle2onnx
+# speed testing with run_benchmark=True
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True
+```
-# convert to onnx
-paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx
+**Using Paddle Inference with TensorRT** to test speed, run the following commands:
+
+```bash
+# export inference model with trt=True
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True trt=True
+
+# speed testing with run_benchmark=True, run_mode=trt_fp32/trt_fp16
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=trt_fp16 --device=gpu --run_benchmark=True
+
+```
+
+### Deployment
+
+PP-YOLOE can be deployed by the following approaches:
+ - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp)
+ - [Paddle-TensorRT](../../deploy/TENSOR_RT.md)
+ - [PaddleServing](https://github.com/PaddlePaddle/Serving)
+
+Next, we will introduce how to use Paddle Inference to deploy PP-YOLOE models in TensorRT FP16 mode.
+
+First, refer to the [Paddle Inference Docs](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python) to download and install the package corresponding to your CUDA, CUDNN and TensorRT versions.
+
+Then, export PP-YOLOE for Paddle Inference **with TensorRT** using the following command:
+
+```bash
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True
+```
+
+Finally, run inference in TensorRT FP16 mode:
+
+```bash
+# inference single image
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16
+
+# inference all images in the directory
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_dir=demo/ --device=gpu --run_mode=trt_fp16
 ```
-### 5. Other Datasets
+**Notes:**
+- TensorRT optimizes the network for the current hardware platform according to its definition, generates an inference engine and serializes it into a file. This inference engine is only applicable to the current hardware and software platform. If your hardware and software platform has not changed, you can set `use_static=True` in [enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L660) (see the sketch after these notes). In this way, the generated serialized file will be saved in the `output_inference` folder and loaded the next time TensorRT is executed.
+- PaddleDetection release/2.4 and later versions will support calling TensorRT for NMS, which requires PaddlePaddle release/2.3 or later.
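+
+A minimal standalone sketch (not part of `deploy/python/infer.py`; the paths and memory/workspace values below are assumptions) showing where `use_static` is passed when configuring Paddle Inference with TensorRT FP16:
+
+```python
+import paddle.inference as paddle_infer
+
+model_dir = "output_inference/ppyoloe_crn_l_300e_coco"  # exported with trt=True
+config = paddle_infer.Config(model_dir + "/model.pdmodel", model_dir + "/model.pdiparams")
+config.enable_use_gpu(200, 0)  # initial GPU memory pool in MB, GPU id
+config.enable_tensorrt_engine(
+    workspace_size=1 << 30,
+    max_batch_size=1,
+    min_subgraph_size=3,
+    precision_mode=paddle_infer.PrecisionType.Half,  # TensorRT FP16
+    use_static=True,       # serialize the built engine and reload it on later runs
+    use_calib_mode=False,
+)
+predictor = paddle_infer.create_predictor(config)  # the engine is built (or reloaded) here
+```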
+
+### Other Datasets
 Model | AP | AP50
 ---|---|---
diff --git a/configs/ppyoloe/README_cn.md b/configs/ppyoloe/README_cn.md
index d770e1cf0407a5b318eec53584e3a6b791a1f42c..fb609f9c7e6b87f2b2096c7a804bbecb0681350d 100644
--- a/configs/ppyoloe/README_cn.md
+++ b/configs/ppyoloe/README_cn.md
@@ -34,14 +34,14 @@ PP-YOLOE由以下方法组成
 **注意:**
 - PP-YOLOE模型使用COCO数据集中train2017作为训练集,使用val2017和test-dev2017作为测试集,Box APtest为`mAP(IoU=0.5:0.95)`评估结果。
-- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果训练GPU数和batch size不使用上述配置,须参考[FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/FAQ)调整学习率和迭代次数。
-- PP-YOLOE模型推理速度测试采用单卡V100,batch size=1进行测试,使用CUDA 10.2, CUDNN 7.6.5,TensorRT推理速度测试使用TensorRT 6.0.1.8。
-- PP-YOLOE推理速度测试使用`tools/export_model.py`并设置`-o exclude_nms=True`脚本导出的模型,并用`deploy/python/infer.py`设置`--run_benchnark`参数得到。测试结果均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)测试方法一致)。
+- PP-YOLOE模型训练过程中使用8 GPUs进行混合精度训练,如果训练GPU数和batch size不使用上述配置,须参考[FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/FAQ)调整学习率和迭代次数。
+- PP-YOLOE模型推理速度测试采用单卡V100,batch size=1进行测试,使用**CUDA 10.2**, **CUDNN 7.6.5**,TensorRT推理速度测试使用**TensorRT 6.0.1.8**。
+- 参考[速度测试](#速度测试)以复现PP-YOLOE推理速度测试结果。
 - 如果你设置了`--run_benchnark=True`, 你首先需要安装以下依赖`pip install pynvml psutil GPUtil`。
 ## 使用教程
-### 1. 训练
+### 训练
 执行以下指令使用混合精度训练PP-YOLOE
@@ -51,7 +51,7 @@ python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c con
 ** 注意: ** 使用默认配置训练需要设置`--amp`以避免显存溢出.
-### 2. 评估
+### 评估
 执行以下命令在单个GPU上评估COCO val2017数据集
@@ -61,7 +61,7 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyoloe/ppyoloe_crn_l_300
 在coco test-dev2017上评估,请先从[COCO数据集下载](https://cocodataset.org/#download)下载COCO test-dev2017数据集,然后解压到COCO数据集文件夹并像`configs/ppyolo/ppyolo_test.yml`一样配置`EvalDataset`。
-### 3. 推理
+### 推理
 使用以下命令在单张GPU上预测图片,使用`--infer_img`推理单张图片以及使用`--infer_dir`推理文件中的所有图片。
@@ -74,59 +74,97 @@ CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_30
 CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams --infer_dir=demo
 ```
-### 4. 部署
+### 模型导出
-- Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp)
-- [Paddle-TensorRT](../../deploy/TENSOR_RT.md)
-- [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)
-- [PaddleServing](https://github.com/PaddlePaddle/Serving)
+PP-YOLOE在GPU上部署或者速度测试需要通过`tools/export_model.py`导出模型。
-
-PP-YOLOE在GPU上部署或者推理benchmark需要通过`tools/export_model.py`导出模型。
-
-当你使用Paddle Inference但不使用TensorRT时,运行以下的命令进行导出
+当你**使用Paddle Inference但不使用TensorRT**时,运行以下的命令导出模型:
 ```bash
 python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
 ```
-当你使用Paddle Inference的TensorRT时,需要指定`-o trt=True`进行导出
+当你**使用Paddle Inference且使用TensorRT**时,需要指定`-o trt=True`来导出模型:
 ```bash
 python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True
 ```
-`deploy/python/infer.py`使用上述导出后的Paddle Inference模型用于推理和benchnark.
+如果你想将PP-YOLOE模型导出为**ONNX格式**,参考
+[PaddleDetection模型导出为ONNX格式教程](../../deploy/EXPORT_ONNX_MODEL.md),运行以下命令:
 ```bash
-# 推理单张图片
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu
-# 推理文件夹下的所有图片
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_dir=demo/ --device=gpu
+# 导出推理模型
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
-# benchmark
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True
+# 安装paddle2onnx
+pip install paddle2onnx
+
+# 转换成onnx格式
+paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx
 ```
-如果你想将PP-YOLOE模型导出为**ONNX格式**,参考
-[PaddleDetection模型导出为ONNX格式教程](../../deploy/EXPORT_ONNX_MODEL.md)
+**注意:**ONNX模型目前只支持batch_size=1
+
+### 速度测试
+为了公平起见,在[模型库](#模型库)中的速度测试结果均为不包含数据预处理和模型输出后处理(NMS)的数据(与[YOLOv4(AlexeyAB)](https://github.com/AlexeyAB/darknet)测试方法一致),需要在导出模型时指定`-o exclude_nms=True`。
+
+**使用Paddle Inference但不使用TensorRT**进行测速,执行以下命令:
 ```bash
-# 导出推理模型
-python tools/export_model.py configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
+# 导出模型
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True
-# 安装paddle2onnx
-pip install paddle2onnx
+# 速度测试,使用run_benchmark=True
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True
+```
-# 转换成onnx格式
-paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx
+**使用Paddle Inference且使用TensorRT**进行测速,执行以下命令:
+
+```bash
+# 导出模型,使用trt=True
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams exclude_nms=True trt=True
+
+# 速度测试,使用run_benchmark=True, run_mode=trt_fp32/trt_fp16
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=trt_fp16 --device=gpu --run_benchmark=True
+
+```
+
+### 部署
+
+PP-YOLOE可以使用以下方式进行部署:
+ - Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp)
+ - [Paddle-TensorRT](../../deploy/TENSOR_RT.md)
+ - [PaddleServing](https://github.com/PaddlePaddle/Serving)
+
+接下来,我们将介绍PP-YOLOE如何使用Paddle Inference在TensorRT FP16模式下部署。
+
+首先,参考[Paddle Inference文档](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python),下载并安装与你的CUDA, CUDNN和TensorRT相应的wheel包。
+
+然后,运行以下命令导出模型:
+
+```bash
+python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True
+```
+
+最后,使用TensorRT FP16进行推理:
+
+```bash
+# 推理单张图片
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16
+
+# 推理文件夹下的所有图片
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_dir=demo/ --device=gpu --run_mode=trt_fp16
 ```
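+
+如果你想直接通过Paddle Inference的Python API加载上面导出的模型并开启TensorRT FP16,可参考如下示意代码(仅为示例,路径与各项参数均为假设值,并非`deploy/python/infer.py`的实际实现):
+
+```python
+import paddle.inference as paddle_infer
+
+model_dir = "output_inference/ppyoloe_crn_l_300e_coco"  # 使用trt=True导出的模型目录
+config = paddle_infer.Config(model_dir + "/model.pdmodel", model_dir + "/model.pdiparams")
+config.enable_use_gpu(200, 0)  # 初始显存池大小(MB), GPU卡号
+config.enable_tensorrt_engine(
+    workspace_size=1 << 30,
+    max_batch_size=1,
+    min_subgraph_size=3,
+    precision_mode=paddle_infer.PrecisionType.Half,  # TensorRT FP16
+    use_static=True,       # 序列化生成的引擎,下次运行时直接加载
+    use_calib_mode=False,
+)
+predictor = paddle_infer.create_predictor(config)  # 在这里构建(或加载)TensorRT引擎
+```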
+**注意:**
+- TensorRT会根据网络的定义,执行针对当前硬件平台的优化,生成推理引擎并序列化为文件。该推理引擎只适用于当前软硬件平台。如果你的软硬件平台没有发生变化,你可以设置[enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L660)的参数`use_static=True`(见上文示意代码),这样生成的序列化文件将会保存在`output_inference`文件夹下,下次执行TensorRT时将加载保存的序列化文件。
+- PaddleDetection release/2.4及其之后的版本将支持NMS调用TensorRT,需要依赖PaddlePaddle release/2.3及其之后的版本。
-### 5. 泛化性验证
+### 泛化性验证
 模型 | AP | AP50
 ---|---|---