English | [简体中文](


## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Getting Start](#Getting-Start)
- [Appendix](#Appendix)

## Introduction
PP-YOLOE is a single-stage anchor-free detector built on PP-YOLOv2 that surpasses a variety of popular YOLO models. It comes in a series of sizes, s/m/l/x, which are configured through a width multiplier and a depth multiplier. PP-YOLOE avoids special operators such as deformable convolution and Matrix NMS, so it can be deployed easily on a variety of hardware. For more details, please refer to our [report](
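
As a rough illustration of how a width multiplier and a depth multiplier can turn one base network into an s/m/l/x family, the sketch below scales channel counts and block counts. The base values and the multiplier pairs are assumptions chosen for illustration and are not taken from this README; the config files hold the real settings.

```python
# Hypothetical base configuration (assumed here to correspond to the "l" size).
BASE_CHANNELS = [64, 128, 256, 512, 1024]
BASE_LAYERS = [3, 6, 6, 3]

# Assumed (depth_mult, width_mult) pairs, for illustration only.
MULTIPLIERS = {
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def scale_model(variant):
    """Return (channels, layers) for a variant by scaling the base config."""
    depth_mult, width_mult = MULTIPLIERS[variant]
    # Width multiplier scales the number of channels per stage.
    channels = [max(round(c * width_mult), 1) for c in BASE_CHANNELS]
    # Depth multiplier scales the number of blocks per stage.
    layers = [max(round(n * depth_mult), 1) for n in BASE_LAYERS]
    return channels, layers


for name in MULTIPLIERS:
    print(name, scale_model(name))
```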

<div align="center">
  <img src="../../docs/images/ppyoloe_map_fps.png" width=500 />
</div>

PP-YOLOE-l achieves 51.4 mAP on the COCO test-dev2017 dataset at 78.1 FPS on a Tesla V100. With TensorRT FP16, PP-YOLOE-l can be further accelerated to 149.2 FPS. PP-YOLOE-s/m/x also offer strong accuracy and speed; their results are listed in the [Model Zoo](#Model-Zoo).
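
To put the throughput figures in perspective, FPS can be converted to per-image latency and the TensorRT speedup computed directly. This is a small arithmetic sketch that uses only the two PP-YOLOE-l numbers quoted above; the helper name is illustrative.

```python
def fps_to_latency_ms(fps):
    """Convert frames per second to milliseconds per image (batch size 1)."""
    return 1000.0 / fps


fp32_fps = 78.1   # PP-YOLOE-l, V100 FP32
trt_fps = 149.2   # PP-YOLOE-l, V100 TensorRT FP16

fp32_latency = fps_to_latency_ms(fp32_fps)
trt_latency = fps_to_latency_ms(trt_fps)
speedup = trt_fps / fp32_fps

print(f"FP32: {fp32_latency:.1f} ms/image")
print(f"TensorRT FP16: {trt_latency:.1f} ms/image")
print(f"Speedup: {speedup:.2f}x")
```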

PP-YOLOE combines the following methods:
- Scalable backbone and neck
- [Task Alignment Learning](
- Efficient Task-aligned head with [DFL]( and [VFL](
- [SiLU activation function](
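
Two of the components listed above are easy to show in a few lines. The sketch below gives the standard SiLU definition, `x * sigmoid(x)`, and the expectation step that DFL-style heads use to turn a discrete distribution over distance bins into a single box offset. It is a plain-Python illustration with hypothetical function names, not the PaddleDetection implementation.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_expected_distance(logits):
    """Decode a distance as the expected bin index under softmax(logits)."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


print(silu(1.0))
print(dfl_expected_distance([0.0, 0.0, 0.0, 0.0, 0.0]))
```

A uniform distribution over five bins (indices 0 to 4) decodes to the middle bin, while a sharply peaked distribution decodes to roughly the index of its peak.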

## Model Zoo
|          Model           | GPU number | images/GPU |  backbone  | input shape | Box AP<sup>val</sup> | Box AP<sup>test</sup> | Params(M) | FLOPs(G) | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config  |
|:------------------------:|:-------:|:----------:|:----------:| :-------:| :------------------: | :-------------------: |:---------:|:--------:| :------------: | :---------------------: | :------: | :------: |
| PP-YOLOE-s                  |     8      |     32     | cspresnet-s |     640     |       42.7        |        43.1         |   7.93    |  17.36   |      208.3      |          333.3          | [model]( | [config](                   |
| PP-YOLOE-m                  |     8      |     28     | cspresnet-m |     640     |       48.6        |        48.9         |   23.43   |  49.91   |      123.4      |          208.3          | [model]( | [config](                   |
| PP-YOLOE-l                  |     8      |     20      | cspresnet-l |     640     |       50.9        |        51.4         |   52.20   |  110.07  |      78.1      |          149.2          | [model]( | [config](                   |
| PP-YOLOE-x                  |     8      |     16     | cspresnet-x |     640     |       51.9        |        52.2         |   98.42   |  206.59  |      45.0      |          95.2          | [model]( | [config](                   |


- PP-YOLOE is trained on the COCO train2017 dataset and evaluated on the val2017 and test-dev2017 datasets. Box AP<sup>test</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`.
- PP-YOLOE is trained with mixed precision on 8 GPUs. If the number of GPUs or the mini-batch size is changed, the learning rate and the number of iterations should be adjusted according to the [FAQ](
- PP-YOLOE inference speed is tested on a single Tesla V100 with batch size 1, CUDA 10.2, cuDNN 7.6.5, and TensorRT in TensorRT mode.
- Inference speed is measured on the inference model exported by `tools/` with `-o exclude_nms=True` and benchmarked by running `deploy/python/` with `--run_benchmark`. The reported results do not include the time spent on data reading and post-processing (NMS), which is the same as [YOLOv4(AlexeyAB)]( in testing method.
- If you set `--run_benchmark=True`, install these dependencies first: `pip install pynvml psutil GPUtil`.
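
For the note above about changing the number of GPUs or the mini-batch size, the sketch below shows one common way to adjust the schedule. It assumes the linear scaling rule (learning rate proportional to total batch size) and a fixed total number of images seen; the FAQ remains the authoritative guidance. The `base_lr` and `base_iters` values are placeholders, not values from the PP-YOLOE configs, and the function names are hypothetical.

```python
import math


def scale_learning_rate(base_lr, base_gpus, base_images_per_gpu,
                        new_gpus, new_images_per_gpu):
    """Scale the learning rate linearly with the total batch size (assumed rule)."""
    base_total = base_gpus * base_images_per_gpu
    new_total = new_gpus * new_images_per_gpu
    return base_lr * new_total / base_total


def scale_iterations(base_iters, base_gpus, base_images_per_gpu,
                     new_gpus, new_images_per_gpu):
    """Keep the total number of images seen roughly constant."""
    base_total = base_gpus * base_images_per_gpu
    new_total = new_gpus * new_images_per_gpu
    return math.ceil(base_iters * base_total / new_total)


# PP-YOLOE-l in the table uses 8 GPUs x 20 images/GPU.
# Example: moving to 4 GPUs x 20 images/GPU halves the total batch size.
base_lr = 0.01      # placeholder value
base_iters = 1000   # placeholder value
print(scale_learning_rate(base_lr, 8, 20, 4, 20))
print(scale_iterations(base_iters, 8, 20, 4, 20))
```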

## Getting Start

### 1. Training

Train PP-YOLOE with mixed precision on 8 GPUs using the following command:

    python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/ -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --amp

**Notes:** use `--amp` when training with the default config to avoid running out of memory.

### 2. Evaluation

Evaluate PP-YOLOE on the COCO val2017 dataset on a single GPU with the following command:

    CUDA_VISIBLE_DEVICES=0 python tools/ -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=

To evaluate on the COCO test-dev2017 dataset, download it from [COCO dataset download]( and decompress it into the COCO dataset directory, then configure `EvalDataset` as in `configs/ppyolo/ppyolo_test.yml`.

### 3. Inference

Run inference on a single GPU with the following commands. Use `--infer_img` to run inference on a single image, or `--infer_dir` to run inference on all images in a directory.

    # inference single image
    CUDA_VISIBLE_DEVICES=0 python tools/ -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights= --infer_img=demo/000000014439_640x640.jpg

    # inference all images in the directory
    CUDA_VISIBLE_DEVICES=0 python tools/ -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights= --infer_dir=demo

### 4. Deployment

- Paddle Inference [Python](../../deploy/python) & [C++](../../deploy/cpp)
- [Paddle-TensorRT](../../deploy/
- [Paddle2ONNX](
- [PaddleServing](
<!-- - [Paddle-Lite]( -->

For deployment on GPU or for benchmarking, the model must first be exported to an inference model using `tools/`.

To export PP-YOLOE for Paddle Inference **without TensorRT**, use the following command:

    python tools/ -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=

To export PP-YOLOE for Paddle Inference **with TensorRT** for better performance, use the following command with the extra `-o trt=True` setting:

    python tools/ -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights= trt=True

`deploy/python/` loads the exported Paddle Inference model and runs inference and benchmarking through Paddle Inference.

    # inference single image
    CUDA_VISIBLE_DEVICES=0 python deploy/python/ --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu

    # inference all images in the directory
    CUDA_VISIBLE_DEVICES=0 python deploy/python/ --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_dir=demo/ --device=gpu

    # benchmark
    CUDA_VISIBLE_DEVICES=0 python deploy/python/ --model_dir=output_inference/ppyoloe_crn_l_300e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_benchmark=True

To export a PP-YOLOE model to **ONNX format**, use the following commands, which follow the [PaddleDetection Model Export as ONNX Format Tutorial](../../deploy/

    # export inference model
    python tools/ -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --output_dir=output_inference -o weights=

    # install paddle2onnx
    pip install paddle2onnx

    # convert to onnx
    paddle2onnx --model_dir output_inference/ppyoloe_crn_l_300e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_crn_l_300e_coco.onnx


### 5. Other Datasets

| Model | AP | AP<sub>50</sub> |
| :---: | :---: | :---: |
| [YOLOX]( | 22.6 | 37.5 |
| [YOLOv5]( | 26.0 | 42.7 |
| **PP-YOLOE** | **30.5** | **46.4** |

- Here, we use the [VisDrone]( dataset to detect 9 object classes: `person, bicycles, car, van, truck, tricycle, awning-tricycle, bus, motor`.
- The models above are trained with their official default configs and load parameters pretrained on the COCO dataset.
- *Due to limited time, more verification results will be added in the future. Contributions to PP-YOLOE are welcome.*

## Appendix

Ablation experiments of PP-YOLOE.

| NO.  |        Model                 | Box AP<sup>val</sup> | Params(M) | FLOPs(G) | V100 FP32 FPS |
| :--: | :---------------------------: | :------------------: | :-------: | :------: | :-----------: |
|  A   | PP-YOLOv2          |         49.1         |   54.58   |  115.77   |     68.9     |
|  B   | A + Anchor-free    |         48.8         |   54.27   |  114.78   |     69.8     |
|  C   | B + CSPRepResNet   |         49.5         |   47.42   |  101.87   |     85.5     |
|  D   | C + TAL            |         50.4         |   48.32   |  104.75   |     84.0     |
|  E   | D + ET-Head        |         50.9         |   52.20   |  110.07   |     78.1     |
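
The effect of each step in the ablation can be read off by differencing adjacent rows. The sketch below does that arithmetic using only the Box AP<sup>val</sup> and V100 FP32 FPS values from the table above; the function name is illustrative.

```python
# (id, model, Box AP val, V100 FP32 FPS), copied from the ablation table.
ABLATION = [
    ("A", "PP-YOLOv2", 49.1, 68.9),
    ("B", "A + Anchor-free", 48.8, 69.8),
    ("C", "B + CSPRepResNet", 49.5, 85.5),
    ("D", "C + TAL", 50.4, 84.0),
    ("E", "D + ET-Head", 50.9, 78.1),
]


def step_deltas(rows):
    """Return (id, AP change, FPS change) for each row relative to the previous one."""
    deltas = []
    for prev, cur in zip(rows, rows[1:]):
        d_ap = round(cur[2] - prev[2], 1)
        d_fps = round(cur[3] - prev[3], 1)
        deltas.append((cur[0], d_ap, d_fps))
    return deltas


for row_id, d_ap, d_fps in step_deltas(ABLATION):
    print(f"{row_id}: AP {d_ap:+.1f}, FPS {d_fps:+.1f}")

total_ap = round(ABLATION[-1][2] - ABLATION[0][2], 1)
total_fps = round(ABLATION[-1][3] - ABLATION[0][3], 1)
print(f"A -> E: AP {total_ap:+.1f}, FPS {total_fps:+.1f}")
```

Read this way, the switch to CSPRepResNet accounts for most of the speed gain, while TAL and ET-Head trade some speed for accuracy.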