README.md 11.3 KB
Newer Older
W
Wenyu 已提交
1 2
# DETRs Beat YOLOs on Real-time Object Detection

W
Wenyu 已提交
3
## 最新动态
W
Wenyu 已提交
4

W
Wenyu 已提交
5
- 发布RT-DETR-R50和RT-DETR-R101的代码和预训练模型
W
Wenyu 已提交
6
- 发布RT-DETR-L和RT-DETR-X的代码和预训练模型
W
Wenyu 已提交
7 8 9
- 发布RT-DETR-R50-m模型(scale模型的范例)
- 发布RT-DETR-R34模型
- 发布RT-DETR-R18模型
W
Wenyu 已提交
10
- 发布RT-DETR-Swin和RT-DETR-FocalNet模型
W
Wenyu 已提交
11

W
Wenyu 已提交
12 13 14 15
## 简介
<!-- We propose a **R**eal-**T**ime **DE**tection **TR**ansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge. Specifically, we design an efficient hybrid encoder to efficiently process multi-scale features by decoupling the intra-scale interaction and cross-scale fusion, and propose IoU-aware query selection to improve the initialization of object queries. In addition, our proposed detector supports flexibly adjustment of the inference speed by using different decoder layers without the need for retraining, which facilitates the practical application of real-time object detectors. Our RT-DETR-L achieves 53.0% AP on COCO val2017 and 114 FPS on T4 GPU, while RT-DETR-X achieves 54.8% AP and 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. Furthermore, our RT-DETR-R50 achieves 53.1% AP and 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP in accuracy and by about 21 times in FPS.  -->
RT-DETR是第一个实时端到端目标检测器。具体而言,我们设计了一个高效的混合编码器,通过解耦尺度内交互和跨尺度融合来高效处理多尺度特征,并提出了IoU感知的查询选择机制,以优化解码器查询的初始化。此外,RT-DETR支持通过使用不同的解码器层来灵活调整推理速度,而不需要重新训练,这有助于实时目标检测器的实际应用。RT-DETR-L在COCO val2017上实现了53.0%的AP,在T4 GPU上实现了114FPS,RT-DETR-X实现了54.8%的AP和74FPS,在速度和精度方面都优于相同规模的所有YOLO检测器。RT-DETR-R50实现了53.1%的AP和108FPS,RT-DETR-R101实现了54.3%的AP和74FPS,在精度上超过了全部使用相同骨干网络的DETR检测器。
若要了解更多细节,请参考我们的论文[paper](https://arxiv.org/abs/2304.08069).
W
Wenyu 已提交
16

W
Wenyu 已提交
17
<div align="center">
W
Wenyu 已提交
18
  <img src="https://github.com/PaddlePaddle/PaddleDetection/assets/17582080/196b0a10-d2e8-401c-9132-54b9126e0a33" width=500 />
W
Wenyu 已提交
19
</div>
W
Wenyu 已提交
20

21
## 基础模型
W
Wenyu 已提交
22

W
Wenyu 已提交
23
| Model | Epoch | Backbone  | Input shape | $AP^{val}$ | $AP^{val}_{50}$| Params(M) | FLOPs(G) |  T4 TensorRT FP16(FPS) | Pretrained Model | config |
W
Wenyu 已提交
24
|:--------------:|:-----:|:----------:| :-------:|:--------------------------:|:---------------------------:|:---------:|:--------:| :---------------------: |:------------------------------------------------------------------------------------:|:-------------------------------------------:|
W
Wenyu 已提交
25 26 27
| RT-DETR-R18 | 6x |  ResNet-18 | 640 | 46.5 | 63.8 | 20 | 60 | 217 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r18vd_dec3_6x_coco.pdparams) | [config](./rtdetr_r18vd_6x_coco.yml)
| RT-DETR-R34 | 6x |  ResNet-34 | 640 | 48.9 | 66.8 | 31 | 92 | 161 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r34vd_dec4_6x_coco.pdparams) | [config](./rtdetr_r34vd_6x_coco.yml)
| RT-DETR-R50-m | 6x |  ResNet-50 | 640 | 51.3 | 69.6 | 36 | 100 | 145 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_m_6x_coco.pdparams) | [config](./rtdetr_r50vd_m_6x_coco.yml)
W
Wenyu 已提交
28 29
| RT-DETR-R50 | 6x |  ResNet-50 | 640 | 53.1 | 71.3 | 42 | 136 | 108 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams) | [config](./rtdetr_r50vd_6x_coco.yml)
| RT-DETR-R101 | 6x |  ResNet-101 | 640 | 54.3 | 72.7 | 76 | 259 | 74 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r101vd_6x_coco.pdparams) | [config](./rtdetr_r101vd_6x_coco.yml)
W
Wenyu 已提交
30 31
| RT-DETR-L | 6x |  HGNetv2 | 640 | 53.0 | 71.6 | 32 | 110 | 114 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_hgnetv2_l_6x_coco.pdparams) | [config](rtdetr_hgnetv2_l_6x_coco.yml)
| RT-DETR-X | 6x |  HGNetv2 | 640 | 54.8 | 73.1 | 67 | 234 | 74 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_hgnetv2_x_6x_coco.pdparams) | [config](rtdetr_hgnetv2_x_6x_coco.yml)
W
Wenyu 已提交
32

33 34 35 36 37 38
## 高精度模型

| Model | Epoch | backbone  | input shape | $AP^{val}$ | $AP^{val}_{50}$ | Pretrained Model | config |
|:-----:|:-----:|:---------:| :---------:|:-----------:|:---------------:|:----------------:|:------:|
| RT-DETR-Swin | 3x |  Swin_L_384 | 640 | 56.2 | 73.5 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_swin_L_384_3x_coco.pdparams) | [config](./rtdetr_swin_L_384_3x_coco.yml)
| RT-DETR-FocalNet | 3x |  FocalNet_L_384  | 640 | 56.9 | 74.3 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_focalnet_L_384_3x_coco.pdparams) | [config](./rtdetr_focalnet_L_384_3x_coco.yml)
W
Wenyu 已提交
39

W
Wenyu 已提交
40 41 42 43 44 45 46 47 48 49 50 51

## Objects365预训练模型
| Model | Epoch | Dataset | Input shape | $AP^{val}$ | $AP^{val}_{50}$ | T4 TensorRT FP16(FPS) | Weight | Logs
|:---:|:---:|:---:| :---:|:---:|:---:|:---:|:---:|:---:|
RT-DETR-R50 | 1x | Objects365 | 640 | 35.1 | 46.2 | 108 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_1x_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8)
RT-DETR-R50 | 2x | COCO + Objects365 | 640 | 55.3 | 73.4 | 108 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_2x_coco_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8)

**Notes:**
- `COCO + Objects365` 代表使用Objects365预训练权重,在COCO上finetune的结果



W
Wenyu 已提交
52
**注意事项:**
53
- RT-DETR 基础模型均使用4个GPU训练。
W
Wenyu 已提交
54
- RT-DETR 在COCO train2017上训练,并在val2017上评估。
55
- 高精度模型RT-DETR-Swin和RT-DETR-FocalNet使用8个GPU训练,显存需求较高。
W
Wenyu 已提交
56 57 58 59 60 61

## 快速开始

<details open>
<summary>依赖包:</summary>

W
Wenyu 已提交
62
- PaddlePaddle >= 2.4.1
W
Wenyu 已提交
63 64 65 66 67

</details>

<details>
<summary>安装</summary>
W
Wenyu 已提交
68

W
Wenyu 已提交
69
- [安装指导文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md)
W
Wenyu 已提交
70

W
Wenyu 已提交
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
</details>

<details>
<summary>训练&评估</summary>

- 单卡GPU上训练:

```shell
# training on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml --eval
```

- 多卡GPU上训练:

```shell
# training on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3
W
Wenyu 已提交
89 90 91
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml --fleet --eval
```

W
Wenyu 已提交
92 93 94 95 96 97 98 99 100 101 102
- 评估:

```shell
python tools/eval.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams
```

- 测试:

```shell
python tools/infer.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
W
Wenyu 已提交
103 104
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams \
              --infer_img=./demo/000000570688.jpg
W
Wenyu 已提交
105 106 107 108 109 110 111 112 113
```

详情请参考[快速开始文档](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED.md).

</details>

## 部署

<details open>
W
Wenyu 已提交
114
<summary>1. 导出模型 </summary>
W
Wenyu 已提交
115 116 117 118 119 120 121 122 123 124 125

```shell
cd PaddleDetection
python tools/export_model.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams trt=True \
              --output_dir=output_inference
```

</details>

<details>
W
Wenyu 已提交
126
<summary>2. 转换模型至ONNX </summary>
W
Wenyu 已提交
127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143

- 安装[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) 和 ONNX

```shell
pip install onnx==1.13.0
pip install paddle2onnx==1.0.5
```

- 转换模型:

```shell
paddle2onnx --model_dir=./output_inference/rtdetr_r50vd_6x_coco/ \
            --model_filename model.pdmodel  \
            --params_filename model.pdiparams \
            --opset_version 16 \
            --save_file rtdetr_r50vd_6x_coco.onnx
```
W
Wenyu 已提交
144
</details>
W
Wenyu 已提交
145

W
Wenyu 已提交
146 147 148 149
<details>
<summary>3. 转换成TensorRT(可选) </summary>

- 确保TensorRT的版本>=8.5.1
W
Wenyu 已提交
150
- TRT推理可以参考[RT-DETR](https://github.com/lyuwenyu/RT-DETR)的部分代码或者其他网络资源
W
Wenyu 已提交
151 152 153 154 155 156 157 158 159 160

```shell
trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx \
        --workspace=4096 \
        --shapes=image:1x3x640x640 \
        --saveEngine=rtdetr_r50vd_6x_coco.trt \
        --avgRuns=100 \
        --fp16
```

W
Wenyu 已提交
161 162 163
-
</details>

164 165 166 167 168 169 170 171 172 173 174 175 176
## 量化压缩

详细步骤请参考:[RT-DETR自动化量化压缩](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/deploy/auto_compression#rt-detr)

| 模型              | Base mAP | ACT量化mAP | TRT-FP32 | TRT-FP16 |  TRT-INT8  |                           配置文件                           |                           量化模型                           |
| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| RT-DETR-R50       | 53.1     |    53.0    | 32.05ms  |  9.12ms  | **6.96ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
| RT-DETR-R101      | 54.3     |    54.1    | 54.13ms  | 12.68ms  | **9.20ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-L | 53.0     |    52.9    | 26.16ms  |  8.54ms  | **6.65ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-X | 54.8     |    54.6    | 49.22ms  | 12.50ms  | **9.24ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |

- 上表测试环境:Tesla T4,TensorRT 8.6.0,CUDA 11.7,batch_size=1。
- 也可直接参考:[PaddleSlim自动化压缩示例](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection)
W
Wenyu 已提交
177 178 179 180 181 182 183 184 185 186 187 188

## 其他

<details>
<summary>1. 参数量和计算量统计 </summary>
可以使用以下代码片段实现参数量和计算量的统计

```
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create

W
Wenyu 已提交
189
cfg_path = './configs/rtdetr/rtdetr_r50vd_6x_coco.yml'
W
Wenyu 已提交
190 191 192 193 194 195 196 197 198 199 200 201 202
cfg = load_config(cfg_path)
model = create(cfg.architecture)

blob = {
    'image': paddle.randn([1, 3, 640, 640]),
    'im_shape': paddle.to_tensor([[640], [640]]),
    'scale_factor': paddle.to_tensor([[1.], [1.]])
}
paddle.flops(model, None, blob, custom_ops=None, print_detail=False)
```
</details>


W
Wenyu 已提交
203
<details open>
W
Wenyu 已提交
204 205
<summary>2. YOLOs端到端速度测速 </summary>

W
Wenyu 已提交
206
- 可以参考[RT-DETR](https://github.com/lyuwenyu/RT-DETR) benchmark部分或者其他网络资源
W
Wenyu 已提交
207

W
Wenyu 已提交
208 209
</details>

W
Wenyu 已提交
210 211


W
Wenyu 已提交
212 213
## 引用RT-DETR
如果需要在你的研究中使用RT-DETR,请通过以下方式引用我们的论文:
W
Wenyu 已提交
214 215 216 217 218 219 220 221 222 223
```
@misc{lv2023detrs,
      title={DETRs Beat YOLOs on Real-time Object Detection},
      author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
      year={2023},
      eprint={2304.08069},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```