Unverified · Commit c6c10032 authored by W wangxinxin08, committed by GitHub

add ppyoloe_r (#7105)

* add ppyoloe_r

* modify code of ops.py

* add ppyoloe_r docs and modify rotate docs

* modify docs and refine configs

* fix some problems

* refine docs, add nms_rotated ext_op and fix some problems

* add image and inference_benchmark.py

* modify docs

* fix some problems

* modify code according to review

Co-authored-by: wangxinxin08 <>
Parent 233b3641
metric: RBOX
num_classes: 15
TrainDataset:
!COCODataSet
image_dir: trainval1024/images
anno_path: trainval1024/DOTA_trainval1024.json
dataset_dir: dataset/dota_ms/
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']
EvalDataset:
!COCODataSet
image_dir: trainval1024/images
anno_path: trainval1024/DOTA_trainval1024.json
dataset_dir: dataset/dota_ms/
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']
TestDataset:
!ImageFolder
anno_path: test1024/DOTA_test1024.json
dataset_dir: dataset/dota_ms/
@@ -16,7 +16,15 @@
| Model | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| [S2ANet](./s2anet/README.md) | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
| [FCOSR](./fcosr/README.md) | 76.62 | 3x | oc | - | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| [FCOSR](./fcosr/README.md) | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README.md) | 73.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README.md) | 79.42 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README.md) | 77.64 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README.md) | 79.71 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README.md) | 78.14 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README.md) | 80.02 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README.md) | 78.28 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README.md) | 80.73 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |
**Notes:**
......
@@ -15,7 +15,15 @@ Rotated object detection is used to detect rectangular bounding boxes with angle
| Model | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| [S2ANet](./s2anet/README_en.md) | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
| [FCOSR](./fcosr/README_en.md) | 76.62 | 3x | oc | - | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| [FCOSR](./fcosr/README_en.md) | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README_en.md) | 73.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README_en.md) | 79.42 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README_en.md) | 77.64 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README_en.md) | 79.71 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README_en.md) | 78.14 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README_en.md) | 80.02 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README_en.md) | 78.28 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README_en.md) | 80.73 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |
**Notes:**
......
@@ -17,7 +17,7 @@
| Model | Backbone | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | - | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
**Notes:**
......
@@ -17,7 +17,7 @@ English | [简体中文](README.md)
| Model | Backbone | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | - | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
**Notes:**
......
Simplified Chinese | [English](README_en.md)
# PP-YOLOE-R
## Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
- [Usage](#usage)
- [Deployment](#deployment)
- [Appendix](#appendix)
- [Citations](#citations)
## Introduction
PP-YOLOE-R is an efficient single-stage anchor-free rotated object detector. Based on PP-YOLOE, PP-YOLOE-R introduces a bag of useful tricks to improve detection precision at the cost of marginal extra parameters and computation. On the DOTA 1.0 dataset, PP-YOLOE-R-l and PP-YOLOE-R-x achieve 78.14 and 78.28 mAP respectively with single-scale training and testing, outperforming almost all other rotated object detectors. With multi-scale training and testing, the detection precision of PP-YOLOE-R-l and PP-YOLOE-R-x further improves to 80.02 and 80.73 mAP; in this setting, PP-YOLOE-R-x surpasses all anchor-free methods and is nearly on par with state-of-the-art anchor-based two-stage models. Moreover, PP-YOLOE-R-s and PP-YOLOE-R-m reach 79.42 and 79.71 mAP with multi-scale training and testing, which is also excellent considering the parameters and computation of these two models. While maintaining high precision, PP-YOLOE-R avoids special operators such as Deformable Convolution and Rotated RoI Align, so it can be deployed easily on a wide range of hardware. At an input resolution of 1024x1024, PP-YOLOE-R-s/m/l/x reach 69.8/55.1/48.3/37.1 FPS on an RTX 2080 Ti and 114.5/86.8/69.7/50.7 FPS on a Tesla V100 with TensorRT and FP16 precision. For more details, please refer to our technical report.
<div align="center">
<img src="../../../docs/images/ppyoloe_r_map_fps.png" width=500 />
</div>
Compared with PP-YOLOE, PP-YOLOE-R makes the following changes:
- Rotated Task Alignment Learning
- Decoupled Angle Prediction Head
- Angle Prediction with DFL
- Learnable Gating Unit for RepVGG
- [ProbIoU Loss](https://arxiv.org/abs/2106.06072)
## Model Zoo
| Model | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:--------:|:----:|:--------------------:|:------------:|:--------------------:|:-----:|:--------:|:-------:|:------:|:-----------:|:------:|
| PP-YOLOE-R-s | CRN-s | 73.82 | 114.5 | 69.8 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| PP-YOLOE-R-s | CRN-s | 79.42 | 114.5 | 69.8 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| PP-YOLOE-R-m | CRN-m | 77.64 | 86.8 | 55.1 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| PP-YOLOE-R-m | CRN-m | 79.71 | 86.8 | 55.1 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| PP-YOLOE-R-l | CRN-l | 78.14 | 69.7 | 48.3 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| PP-YOLOE-R-x | CRN-x | 78.28 | 50.7 | 37.1 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| PP-YOLOE-R-x | CRN-x | 80.73 | 50.7 | 37.1 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |
**Notes:**
- If the **number of GPUs** or the **batch size** changes, the learning rate should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**, as shown in the sketch after these notes.
- Models in the model zoo are trained and tested at a single scale by default. `MS` in the Aug column means multi-scale training and multi-scale testing; `RR` means RandomRotate data augmentation is used during training.
- CRN denotes the CSPRepResNet proposed in PP-YOLOE.
- Speed is measured with TensorRT 8.2.3 by averaging over 2000 images of the DOTA test set. Refer to [Speed Testing](#speed-testing) to reproduce the results.
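The learning-rate rule above is linear in the total batch size. A minimal sketch, assuming this config's defaults (4 GPUs, 2 images per GPU) and the `base_lr: 0.008` from `_base_/optimizer_3x.yml`:

``` python
def scale_lr(lr_default=0.008, batch_size_default=2, gpu_number_default=4,
             batch_size_new=2, gpu_number_new=8):
    """Linear scaling rule: lr grows proportionally to the total batch size."""
    return lr_default * (batch_size_new * gpu_number_new) / (
        batch_size_default * gpu_number_default)

# e.g. doubling the GPU count at the same per-GPU batch size doubles the lr
print(scale_lr(gpu_number_new=8))  # 0.016
```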
## Usage
Refer to [Data Preparation](../README.md#数据准备) to prepare the data.
### Training
Single-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```
Multi-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```
### Inference
Run the following command to run inference on a single image; the result is saved in the `output` directory by default.
``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
```
### Evaluation on the DOTA Dataset
Following the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), evaluating on the DOTA dataset requires a zip file containing all detection results; the detections of each category are stored in one txt file, each line of which has the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. Submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can run the following command to get the predictions on the test dataset:
``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_ppyoloe_r --visualize=False --save_results=True
```
Convert the predictions into the format required by the official evaluation server:
``` bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output_ppyoloe_r/ --output_dir=submit/ --data_type=dota10
zip -r submit.zip submit
```
### Speed Testing
Speed testing requires **TensorRT 8.2 or higher and PaddlePaddle 2.4.0rc0 or higher**. To benchmark with Paddle Inference and TensorRT, run the following commands:
``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True
# speed testing
CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode trt_fp16
```
## Deployment
To deploy **using Paddle Inference without TensorRT**, run the following commands:
``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams
# run inference on an image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=paddle --device=gpu
```
To deploy **using Paddle Inference with TensorRT**, run the following commands:
``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True
# run inference on an image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=trt_fp16 --device=gpu
```
**Notes:**
- To use Paddle-TRT, make sure the PaddlePaddle version is 2.4.0rc or higher and the TensorRT version is 8.2 or higher.
## Appendix
Ablation experiments of PP-YOLOE-R:
| Model | mAP | Params (M) | FLOPs (G) |
| :-: | :-: | :------: | :------: |
| Baseline | 75.61 | 50.65 | 269.09 |
| +Rotated Task Alignment Learning | 77.24 | 50.65 | 269.09 |
| +Decoupled Angle Prediction Head | 77.78 | 52.20 | 272.72 |
| +Angle Prediction with DFL | 78.01 | 53.29 | 281.65 |
| +Learnable Gating Unit for RepVGG | 78.14 | 53.29 | 281.65 |
## Citations
```
@article{xu2022pp,
title={PP-YOLOE: An evolved version of YOLO},
author={Xu, Shangliang and Wang, Xinxin and Lv, Wenyu and Chang, Qinyao and Cui, Cheng and Deng, Kaipeng and Wang, Guanzhong and Dang, Qingqing and Wei, Shengyu and Du, Yuning and others},
journal={arXiv preprint arXiv:2203.16250},
year={2022}
}
@article{llerena2021gaussian,
title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
journal={arXiv preprint arXiv:2106.06072},
year={2021}
}
```
English | [简体中文](README.md)
# PP-YOLOE-R
## Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
- [Getting Started](#getting-started)
- [Deployment](#deployment)
- [Appendix](#appendix)
- [Citations](#citations)
## Introduction
PP-YOLOE-R is an efficient anchor-free rotated object detector. Based on PP-YOLOE, PP-YOLOE-R introduces a bag of useful tricks to improve detection precision at the expense of marginal extra parameters and computation. PP-YOLOE-R-l and PP-YOLOE-R-x achieve 78.14 and 78.28 mAP respectively on the DOTA 1.0 dataset with single-scale training and testing, which outperforms almost all other rotated object detectors. With multi-scale training and testing, the detection precision of PP-YOLOE-R-l and PP-YOLOE-R-x is further improved to 80.02 and 80.73 mAP. In this case, PP-YOLOE-R-x surpasses all anchor-free methods and is competitive with state-of-the-art anchor-based two-stage models. Moreover, PP-YOLOE-R-s and PP-YOLOE-R-m can achieve 79.42 and 79.71 mAP with multi-scale training and testing, which is an excellent result considering the parameters and FLOPS of these two models. While maintaining high precision, PP-YOLOE-R avoids special operators such as Deformable Convolution and Rotated RoI Align, so it can be deployed easily on a wide range of hardware. At an input resolution of 1024x1024, PP-YOLOE-R-s/m/l/x can reach 69.8/55.1/48.3/37.1 FPS on an RTX 2080 Ti and 114.5/86.8/69.7/50.7 FPS on a Tesla V100 with TensorRT and FP16 precision. For more details, please refer to our technical report.
<div align="center">
<img src="../../../docs/images/ppyoloe_r_map_fps.png" width=500 />
</div>
Compared with PP-YOLOE, PP-YOLOE-R has made the following changes:
- Rotated Task Alignment Learning
- Decoupled Angle Prediction Head
- Angle Prediction with DFL
- Learnable Gating Unit for RepVGG
- [ProbIoU Loss](https://arxiv.org/abs/2106.06072)
## Model Zoo
| Model | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:--------:|:----:|:--------------------:|:------------:|:--------------------:|:-----:|:--------:|:-------:|:------:|:-----------:|:------:|
| PP-YOLOE-R-s | CRN-s | 73.82 | 114.5 | 69.8 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| PP-YOLOE-R-s | CRN-s | 79.42 | 114.5 | 69.8 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| PP-YOLOE-R-m | CRN-m | 77.64 | 86.8 | 55.1 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| PP-YOLOE-R-m | CRN-m | 79.71 | 86.8 | 55.1 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| PP-YOLOE-R-l | CRN-l | 78.14 | 69.7 | 48.3 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| PP-YOLOE-R-x | CRN-x | 78.28 | 50.7 | 37.1 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| PP-YOLOE-R-x | CRN-x | 80.73 | 50.7 | 37.1 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |
**Notes:**
- If the **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
- Models in the model zoo are trained and tested at a single scale by default. If `MS` is indicated in the data augmentation column, multi-scale training and multi-scale testing are used. If `RR` is indicated, RandomRotate data augmentation is used for training.
- CRN denotes the CSPRepResNet proposed in PP-YOLOE.
- Speed is calculated by averaging over 2000 images of the DOTA test dataset. Refer to [Speed testing](#speed-testing) to reproduce the results.
## Getting Started
Refer to [Data-Preparation](../README_en.md#Data-Preparation) to prepare data.
### Training
Single GPU Training
``` bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```
Multi-GPU Training
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```
### Inference
Run the following command to run inference on a single image; the result will be saved in the `output` directory by default.
``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
```
### Evaluation on DOTA Dataset
Referring to the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a txt file, each line of which has the following format:
`image_id score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can run the following command to get the inference results on the test dataset:
``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_ppyoloe_r --visualize=False --save_results=True
```
Process the prediction results into the format required for the official website evaluation:
``` bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output_ppyoloe_r/ --output_dir=submit/ --data_type=dota10
zip -r submit.zip submit
```
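For reference, a minimal sketch of what one line of a per-category result file looks like after this step. The file name `Task1_plane.txt` follows the DOTA Task1 per-category naming, and the image id, score and corner coordinates are made-up values for illustration:

``` python
# One detection per line: image id, confidence score, then the four corner
# points (x1 y1 x2 y2 x3 y3 x4 y4) of the rotated box.
det = ["P0006", 0.95, 553.0, 482.0, 607.0, 482.0, 607.0, 523.0, 553.0, 523.0]
with open("submit/Task1_plane.txt", "a") as f:
    f.write(" ".join(str(v) for v in det) + "\n")
```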
### Speed testing
To test speed, make sure that **the TensorRT version is 8.2 or higher and the PaddlePaddle version is 2.4.0rc or higher**. To benchmark with Paddle Inference and TensorRT, run the following commands:
``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True
# speed testing
CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode trt_fp16
```
## Deployment
To deploy **using Paddle Inference without TensorRT**, run the following commands:
``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams
# inference single image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=paddle --device=gpu
```
To deploy **using Paddle Inference with TensorRT**, run the following commands:
``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True
# inference single image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=trt_fp16 --device=gpu
```
## Appendix
Ablation experiments of PP-YOLOE-R:
| Model | mAP | Params(M) | FLOPs(G) |
| :-: | :-: | :------: | :------: |
| Baseline | 75.61 | 50.65 | 269.09 |
| +Rotated Task Alignment Learning | 77.24 | 50.65 | 269.09 |
| +Decoupled Angle Prediction Head | 77.78 | 52.20 | 272.72 |
| +Angle Prediction with DFL | 78.01 | 53.29 | 281.65 |
| +Learnable Gating Unit for RepVGG | 78.14 | 53.29 | 281.65 |
## Citations
```
@article{xu2022pp,
title={PP-YOLOE: An evolved version of YOLO},
author={Xu, Shangliang and Wang, Xinxin and Lv, Wenyu and Chang, Qinyao and Cui, Cheng and Deng, Kaipeng and Wang, Guanzhong and Dang, Qingqing and Wei, Shengyu and Du, Yuning and others},
journal={arXiv preprint arXiv:2203.16250},
year={2022}
}
@article{llerena2021gaussian,
title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
journal={arXiv preprint arXiv:2106.06072},
year={2021}
}
```
epoch: 36
LearningRate:
base_lr: 0.008
schedulers:
- !CosineDecay
max_epochs: 44
- !LinearWarmup
start_factor: 0.
steps: 1000
OptimizerBuilder:
clip_grad_by_norm: 35.
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
YOLOv3:
backbone: CSPResNet
neck: CustomCSPPAN
yolo_head: PPYOLOERHead
post_process: ~
CSPResNet:
layers: [3, 6, 6, 3]
channels: [64, 128, 256, 512, 1024]
return_idx: [1, 2, 3]
use_large_stem: True
use_alpha: True
CustomCSPPAN:
out_channels: [768, 384, 192]
stage_num: 1
block_num: 3
act: 'swish'
spp: true
use_alpha: True
PPYOLOERHead:
fpn_strides: [32, 16, 8]
grid_cell_offset: 0.5
use_varifocal_loss: true
static_assigner_epoch: -1
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.05}
static_assigner:
name: FCOSRAssigner
factor: 12
threshold: 0.23
boundary: [[512, 10000], [256, 512], [-1, 256]]
assigner:
name: RotatedTaskAlignedAssigner
topk: 13
alpha: 1.0
beta: 6.0
nms:
name: MultiClassNMS
nms_top_k: 2000
keep_top_k: -1
score_threshold: 0.1
nms_threshold: 0.1
normalized: False
worker_num: 4
image_height: &image_height 1024
image_width: &image_width 1024
image_size: &image_size [*image_height, *image_width]
TrainReader:
sample_transforms:
- Decode: {}
- Poly2Array: {}
- RandomRFlip: {}
- RandomRRotate: {angle_mode: 'value', angle: [0, 90, 180, -90]}
- RandomRRotate: {angle_mode: 'value', angle: [30, 60], rotate_prob: 0.5}
- RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
- Poly2RBox: {filter_threshold: 2, filter_mode: 'edge', rbox_type: 'oc'}
batch_transforms:
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
- PadRGT: {}
- PadBatch: {pad_to_stride: 32}
batch_size: 2
shuffle: true
drop_last: true
use_shared_memory: true
collate_batch: true
EvalReader:
sample_transforms:
- Decode: {}
- Poly2Array: {}
- RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 2
TestReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: *image_size, keep_ratio: True, interp: 2}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 8
_BASE_: [
'../../datasets/dota.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_l_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams
depth_mult: 1.0
width_mult: 1.0
_BASE_: [
'../../datasets/dota_ms.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_l_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams
depth_mult: 1.0
width_mult: 1.0
_BASE_: [
'../../datasets/dota.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_m_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams
depth_mult: 0.67
width_mult: 0.75
_BASE_: [
'../../datasets/dota_ms.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_m_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams
depth_mult: 0.67
width_mult: 0.75
_BASE_: [
'../../datasets/dota.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_s_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams
depth_mult: 0.33
width_mult: 0.50
_BASE_: [
'../../datasets/dota_ms.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_s_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams
depth_mult: 0.33
width_mult: 0.50
_BASE_: [
'../../datasets/dota.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_x_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams
depth_mult: 1.33
width_mult: 1.25
_BASE_: [
'../../datasets/dota_ms.yml',
'../../runtime.yml',
'_base_/optimizer_3x.yml',
'_base_/ppyoloe_r_reader.yml',
'_base_/ppyoloe_r_crn.yml'
]
log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_x_3x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams
depth_mult: 1.33
width_mult: 1.25
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import six
import glob
import time
import yaml
import argparse
import cv2
import numpy as np
import paddle
import paddle.version as paddle_version
from paddle.inference import Config, create_predictor, PrecisionType, get_trt_runtime_version
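# Architectures that need tuned dynamic-shape info collected before building the TRT engine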
TUNED_TRT_DYNAMIC_MODELS = {'DETR'}
def check_version(version='2.2'):
err = "PaddlePaddle version {} or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code.".format(version)
version_installed = [
paddle_version.major, paddle_version.minor, paddle_version.patch,
paddle_version.rc
]
if version_installed == ['0', '0', '0', '0']:
return
version_split = version.split('.')
length = min(len(version_installed), len(version_split))
for i in six.moves.range(length):
if version_installed[i] > version_split[i]:
return
if version_installed[i] < version_split[i]:
raise Exception(err)
def check_trt_version(version='8.2'):
err = "TensorRT version {} or higher is required," \
"Please make sure the version is good with your code.".format(version)
version_split = list(map(int, version.split('.')))
version_installed = get_trt_runtime_version()
length = min(len(version_installed), len(version_split))
for i in six.moves.range(length):
if version_installed[i] > version_split[i]:
return
if version_installed[i] < version_split[i]:
raise Exception(err)
# preprocess ops
def decode_image(im_file, im_info):
if isinstance(im_file, str):
with open(im_file, 'rb') as f:
im_read = f.read()
data = np.frombuffer(im_read, dtype='uint8')
im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
else:
im = im_file
im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32)
return im, im_info
class Resize(object):
def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
if isinstance(target_size, int):
target_size = [target_size, target_size]
self.target_size = target_size
self.keep_ratio = keep_ratio
self.interp = interp
def __call__(self, im, im_info):
assert len(self.target_size) == 2
assert self.target_size[0] > 0 and self.target_size[1] > 0
im_channel = im.shape[2]
im_scale_y, im_scale_x = self.generate_scale(im)
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
im_info['scale_factor'] = np.array(
[im_scale_y, im_scale_x]).astype('float32')
return im, im_info
def generate_scale(self, im):
origin_shape = im.shape[:2]
im_c = im.shape[2]
if self.keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(self.target_size)
target_size_max = np.max(self.target_size)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = self.target_size
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
class Permute(object):
def __init__(self, ):
super(Permute, self).__init__()
def __call__(self, im, im_info):
im = im.transpose((2, 0, 1))
return im, im_info
class NormalizeImage(object):
def __init__(self, mean, std, is_scale=True, norm_type='mean_std'):
self.mean = mean
self.std = std
self.is_scale = is_scale
self.norm_type = norm_type
def __call__(self, im, im_info):
im = im.astype(np.float32, copy=False)
if self.is_scale:
scale = 1.0 / 255.0
im *= scale
if self.norm_type == 'mean_std':
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
im -= mean
im /= std
return im, im_info
class PadStride(object):
def __init__(self, stride=0):
self.coarsest_stride = stride
def __call__(self, im, im_info):
coarsest_stride = self.coarsest_stride
if coarsest_stride <= 0:
return im, im_info
im_c, im_h, im_w = im.shape
pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
padding_im[:, :im_h, :im_w] = im
return padding_im, im_info
def preprocess(im, preprocess_ops):
# process image by preprocess_ops
im_info = {
'scale_factor': np.array(
[1., 1.], dtype=np.float32),
'im_shape': None,
}
im, im_info = decode_image(im, im_info)
for operator in preprocess_ops:
im, im_info = operator(im, im_info)
return im, im_info
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--model_dir', type=str, help='directory of inference model')
parser.add_argument('--run_mode', type=str, default='paddle', help='running mode')
parser.add_argument('--batch_size', type=int, default=1, help='batch size')
parser.add_argument('--image_dir', type=str, default='/paddle/data/DOTA_1024_ss/test1024/images', help='directory of test images')
parser.add_argument('--warmup_iter', type=int, default=5, help='num of warmup iters')
parser.add_argument('--total_iter', type=int, default=2000, help='num of total iters')
parser.add_argument('--log_iter', type=int, default=50, help='num of log interval')
parser.add_argument('--tuned_trt_shape_file', type=str, default='shape_range_info.pbtxt', help='dynamic shape range info')
args = parser.parse_args()
return args
def init_predictor(FLAGS):
model_dir, run_mode, batch_size = FLAGS.model_dir, FLAGS.run_mode, FLAGS.batch_size
yaml_file = os.path.join(model_dir, 'infer_cfg.yml')
with open(yaml_file) as f:
yml_conf = yaml.safe_load(f)
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
arch = yml_conf['arch']
tuned_trt_shape_file = os.path.join(model_dir, FLAGS.tuned_trt_shape_file)
if run_mode in precision_map.keys():
if arch in TUNED_TRT_DYNAMIC_MODELS and not os.path.exists(tuned_trt_shape_file):
print('Dynamic shape range info will be saved in {}; rerun the script afterwards to use it.'.format(tuned_trt_shape_file))
config.collect_shape_range_info(tuned_trt_shape_file)
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=yml_conf['min_subgraph_size'],
precision_mode=precision_map[run_mode],
use_static=True,
use_calib_mode=False)
if yml_conf['use_dynamic_shape']:
if arch in TUNED_TRT_DYNAMIC_MODELS and os.path.exists(tuned_trt_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(tuned_trt_shape_file, True)
else:
min_input_shape = {
'image': [batch_size, 3, 640, 640],
'scale_factor': [batch_size, 2]
}
max_input_shape = {
'image': [batch_size, 3, 1280, 1280],
'scale_factor': [batch_size, 2]
}
opt_input_shape = {
'image': [batch_size, 3, 1024, 1024],
'scale_factor': [batch_size, 2]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
# disable print log when predict
config.disable_glog_info()
# enable memory optimization
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
predictor = create_predictor(config)
return predictor, yml_conf
def create_preprocess_ops(yml_conf):
preprocess_ops = []
for op_info in yml_conf['Preprocess']:
new_op_info = op_info.copy()
op_type = new_op_info.pop('type')
preprocess_ops.append(eval(op_type)(**new_op_info))
return preprocess_ops
def get_test_images(image_dir):
images = set()
infer_dir = os.path.abspath(image_dir)
exts = ['jpg', 'jpeg', 'png', 'bmp']
exts += [ext.upper() for ext in exts]
for ext in exts:
images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
images = list(images)
return images
def create_inputs(image_files, preprocess_ops):
inputs = dict()
im_list, im_info_list = [], []
for im_path in image_files:
im, im_info = preprocess(im_path, preprocess_ops)
im_list.append(im)
im_info_list.append(im_info)
inputs['im_shape'] = np.stack([e['im_shape'] for e in im_info_list], axis=0).astype('float32')
inputs['scale_factor'] = np.stack([e['scale_factor'] for e in im_info_list], axis=0).astype('float32')
inputs['image'] = np.stack(im_list, axis=0).astype('float32')
return inputs
def measure_speed(FLAGS):
predictor, yml_conf = init_predictor(FLAGS)
input_names = predictor.get_input_names()
preprocess_ops = create_preprocess_ops(yml_conf)
image_files = get_test_images(FLAGS.image_dir)
batch_size = FLAGS.batch_size
warmup_iter, log_iter, total_iter = FLAGS.warmup_iter, FLAGS.log_iter, FLAGS.total_iter
total_time = 0
fps = 0
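    # Iterate over the test images in steps of batch_size; iterations before
    # warmup_iter are excluded from the timing.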
for i in range(0, total_iter, batch_size):
# make data ready
inputs = create_inputs(image_files[i:i + batch_size], preprocess_ops)
for name in input_names:
input_tensor = predictor.get_input_handle(name)
input_tensor.copy_from_cpu(inputs[name])
paddle.device.cuda.synchronize()
# start running
start_time = time.perf_counter()
predictor.run()
paddle.device.cuda.synchronize()
if i >= warmup_iter:
total_time += time.perf_counter() - start_time
if (i + 1) % log_iter == 0:
fps = (i + 1 - warmup_iter) / total_time
print(
f'Done image [{i + 1:<3}/ {total_iter}], '
f'fps: {fps:.1f} img / s, '
f'times per image: {1000 / fps:.1f} ms / img',
flush=True)
if (i + 1) == total_iter:
fps = (i + 1 - warmup_iter) / total_time
print(
f'Overall fps: {fps:.1f} img / s, '
f'times per image: {1000 / fps:.1f} ms / img',
flush=True)
break
if __name__ == '__main__':
FLAGS = parse_args()
check_version('2.4')
check_trt_version('8.2')
measure_speed(FLAGS)
@@ -110,9 +110,7 @@ Paddle provides ImageNet-pretrained backbone models. All pretrained models
## Rotated Object Detection
### S2ANet
Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/)
[Model Zoo for Rotated Object Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate)
## Keypoint Detection
......
@@ -107,12 +107,9 @@ Please refer to [YOLOv6](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov6)
Please refer to [YOLOv7](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov7)
## Rotating frame detection
### S2ANet
Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/)
## Rotated Object Detection
[Model Zoo for Rotated Object Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate)
## KeyPoint Detection
......
@@ -13,10 +13,10 @@
// limitations under the License.
//
// The code is based on
// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated
// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/
#include "../rbox_iou/rbox_iou_utils.h"
#include "paddle/extension.h"
#include "rbox_iou_op.h"
template <typename T>
void matched_rbox_iou_cpu_kernel(const int rbox_num, const T *rbox1_data_ptr,
@@ -30,29 +30,30 @@ void matched_rbox_iou_cpu_kernel(const int rbox_num, const T *rbox1_data_ptr,
}
#define CHECK_INPUT_CPU(x) \
PD_CHECK(x.place() == paddle::PlaceType::kCPU, #x " must be a CPU Tensor.")
PD_CHECK(x.is_cpu(), #x " must be a CPU Tensor.")
std::vector<paddle::Tensor> MatchedRboxIouCPUForward(const paddle::Tensor &rbox1,
std::vector<paddle::Tensor>
MatchedRboxIouCPUForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2) {
CHECK_INPUT_CPU(rbox1);
CHECK_INPUT_CPU(rbox2);
PD_CHECK(rbox1.shape()[0] == rbox2.shape()[0], "inputs must have the same dim");
auto rbox_num = rbox1.shape()[0];
auto output = paddle::Tensor(paddle::PlaceType::kCPU, {rbox_num});
auto output = paddle::empty({rbox_num}, rbox1.dtype(), paddle::CPUPlace());
PD_DISPATCH_FLOATING_TYPES(rbox1.type(), "rotated_iou_cpu_kernel", ([&] {
PD_DISPATCH_FLOATING_TYPES(rbox1.type(), "matched_rbox_iou_cpu_kernel", ([&] {
matched_rbox_iou_cpu_kernel<data_t>(
rbox_num, rbox1.data<data_t>(),
rbox2.data<data_t>(),
output.mutable_data<data_t>());
rbox2.data<data_t>(), output.data<data_t>());
}));
return {output};
}
#ifdef PADDLE_WITH_CUDA
std::vector<paddle::Tensor> MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1,
std::vector<paddle::Tensor>
MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2);
#endif
@@ -62,10 +63,10 @@ std::vector<paddle::Tensor> MatchedRboxIouCUDAForward(const paddle::Tensor &rbox
std::vector<paddle::Tensor> MatchedRboxIouForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2) {
CHECK_INPUT_SAME(rbox1, rbox2);
if (rbox1.place() == paddle::PlaceType::kCPU) {
if (rbox1.is_cpu()) {
return MatchedRboxIouCPUForward(rbox1, rbox2);
#ifdef PADDLE_WITH_CUDA
} else if (rbox1.place() == paddle::PlaceType::kGPU) {
} else if (rbox1.is_gpu()) {
return MatchedRboxIouCUDAForward(rbox1, rbox2);
#endif
}
......
@@ -13,16 +13,10 @@
// limitations under the License.
//
// The code is based on
// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated
// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/
#include "../rbox_iou/rbox_iou_utils.h"
#include "paddle/extension.h"
#include "rbox_iou_op.h"
/**
Computes ceil(a / b)
*/
static inline int CeilDiv(const int a, const int b) { return (a + b - 1) / b; }
template <typename T>
__global__ void
@@ -36,9 +30,10 @@ matched_rbox_iou_cuda_kernel(const int rbox_num, const T *rbox1_data_ptr,
}
#define CHECK_INPUT_GPU(x) \
PD_CHECK(x.place() == paddle::PlaceType::kGPU, #x " must be a GPU Tensor.")
PD_CHECK(x.is_gpu(), #x " must be a GPU Tensor.")
std::vector<paddle::Tensor> MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1,
std::vector<paddle::Tensor>
MatchedRboxIouCUDAForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2) {
CHECK_INPUT_GPU(rbox1);
CHECK_INPUT_GPU(rbox2);
@@ -46,7 +41,7 @@ std::vector<paddle::Tensor> MatchedRboxIouCUDAForward(const paddle::Tensor &rbox
auto rbox_num = rbox1.shape()[0];
auto output = paddle::Tensor(paddle::PlaceType::kGPU, {rbox_num});
auto output = paddle::empty({rbox_num}, rbox1.dtype(), paddle::GPUPlace());
const int thread_per_block = 512;
const int block_per_grid = CeilDiv(rbox_num, thread_per_block);
@@ -56,7 +51,7 @@ std::vector<paddle::Tensor> MatchedRboxIouCUDAForward(const paddle::Tensor &rbox
matched_rbox_iou_cuda_kernel<
data_t><<<block_per_grid, thread_per_block, 0, rbox1.stream()>>>(
rbox_num, rbox1.data<data_t>(), rbox2.data<data_t>(),
output.mutable_data<data_t>());
output.data<data_t>());
}));
return {output};
......
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "../rbox_iou/rbox_iou_utils.h"
#include "paddle/extension.h"
template <typename T>
void nms_rotated_cpu_kernel(const T *boxes_data, const float threshold,
const int64_t num_boxes, int64_t *num_keep_boxes,
int64_t *output_data) {
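  // Greedy NMS over boxes pre-sorted by score: `masks` packs one bit per box
  // (64 boxes per int64_t); a set bit marks a box suppressed by an earlier,
  // higher-scoring box.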
int num_masks = CeilDiv(num_boxes, 64);
std::vector<int64_t> masks(num_masks, 0);
for (int64_t i = 0; i < num_boxes; ++i) {
if (masks[i / 64] & 1ULL << (i % 64))
continue;
T box_1[5];
for (int k = 0; k < 5; ++k) {
box_1[k] = boxes_data[i * 5 + k];
}
for (int64_t j = i + 1; j < num_boxes; ++j) {
if (masks[j / 64] & 1ULL << (j % 64))
continue;
T box_2[5];
for (int k = 0; k < 5; ++k) {
box_2[k] = boxes_data[j * 5 + k];
}
if (rbox_iou_single<T>(box_1, box_2) > threshold) {
masks[j / 64] |= 1ULL << (j % 64);
}
}
}
int64_t output_data_idx = 0;
for (int64_t i = 0; i < num_boxes; ++i) {
if (masks[i / 64] & 1ULL << (i % 64))
continue;
output_data[output_data_idx++] = i;
}
*num_keep_boxes = output_data_idx;
for (; output_data_idx < num_boxes; ++output_data_idx) {
output_data[output_data_idx] = 0;
}
}
#define CHECK_INPUT_CPU(x) \
PD_CHECK(x.is_cpu(), #x " must be a CPU Tensor.")
std::vector<paddle::Tensor> NMSRotatedCPUForward(const paddle::Tensor &boxes,
const paddle::Tensor &scores,
float threshold) {
CHECK_INPUT_CPU(boxes);
CHECK_INPUT_CPU(scores);
auto num_boxes = boxes.shape()[0];
auto order_t =
std::get<1>(paddle::argsort(scores, /* axis=*/0, /* descending=*/true));
auto boxes_sorted = paddle::gather(boxes, order_t, /* axis=*/0);
auto keep =
paddle::empty({num_boxes}, paddle::DataType::INT64, paddle::CPUPlace());
int64_t num_keep_boxes = 0;
PD_DISPATCH_FLOATING_TYPES(boxes.type(), "nms_rotated_cpu_kernel", ([&] {
nms_rotated_cpu_kernel<data_t>(
boxes_sorted.data<data_t>(), threshold,
num_boxes, &num_keep_boxes,
keep.data<int64_t>());
}));
keep = keep.slice(0, num_keep_boxes);
return {paddle::gather(order_t, keep, /* axis=*/0)};
}
#ifdef PADDLE_WITH_CUDA
std::vector<paddle::Tensor> NMSRotatedCUDAForward(const paddle::Tensor &boxes,
const paddle::Tensor &scores,
float threshold);
#endif
std::vector<paddle::Tensor> NMSRotatedForward(const paddle::Tensor &boxes,
const paddle::Tensor &scores,
float threshold) {
if (boxes.is_cpu()) {
return NMSRotatedCPUForward(boxes, scores, threshold);
#ifdef PADDLE_WITH_CUDA
} else if (boxes.is_gpu()) {
return NMSRotatedCUDAForward(boxes, scores, threshold);
#endif
}
}
std::vector<std::vector<int64_t>>
NMSRotatedInferShape(std::vector<int64_t> boxes_shape,
std::vector<int64_t> scores_shape) {
return {{-1}};
}
std::vector<paddle::DataType> NMSRotatedInferDtype(paddle::DataType t1,
paddle::DataType t2) {
return {paddle::DataType::INT64};
}
PD_BUILD_OP(nms_rotated)
.Inputs({"Boxes", "Scores"})
.Outputs({"Output"})
.Attrs({"threshold: float"})
.SetKernelFn(PD_KERNEL(NMSRotatedForward))
.SetInferShapeFn(PD_INFER_SHAPE(NMSRotatedInferShape))
.SetInferDtypeFn(PD_INFER_DTYPE(NMSRotatedInferDtype));
\ No newline at end of file
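For context, once this extension is compiled, the op registered by `PD_BUILD_OP(nms_rotated)` becomes callable from Python. A minimal sketch using `paddle.utils.cpp_extension.load` for JIT compilation; the source paths and the random inputs are illustrative assumptions, not part of this PR:

``` python
import paddle
from paddle.utils.cpp_extension import load

# JIT-compile the custom op; source paths are assumptions for illustration
ext = load(name="ext_op", sources=["nms_rotated.cc", "nms_rotated.cu"])

boxes = paddle.rand([100, 5])  # one rotated box per row: [cx, cy, w, h, angle]
scores = paddle.rand([100])
keep = ext.nms_rotated(boxes, scores, 0.1)  # indices of the kept boxes
```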
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "../rbox_iou/rbox_iou_utils.h"
#include "paddle/extension.h"
static const int64_t threadsPerBlock = sizeof(int64_t) * 8;
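// Tiled pairwise suppression: the grid enumerates (column tile, row tile)
// pairs of 64-box tiles; each thread owns one box of its row tile and writes
// a 64-bit mask of the column-tile boxes whose IoU with it exceeds the
// threshold.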
template <typename T>
__global__ void
nms_rotated_cuda_kernel(const T *boxes_data, const float threshold,
const int64_t num_boxes, int64_t *masks) {
auto raw_start = blockIdx.y;
auto col_start = blockIdx.x;
if (raw_start > col_start)
return;
const int raw_last_storage =
min(num_boxes - raw_start * threadsPerBlock, threadsPerBlock);
const int col_last_storage =
min(num_boxes - col_start * threadsPerBlock, threadsPerBlock);
if (threadIdx.x < raw_last_storage) {
int64_t mask = 0;
auto current_box_idx = raw_start * threadsPerBlock + threadIdx.x;
const T *current_box = boxes_data + current_box_idx * 5;
for (int i = 0; i < col_last_storage; ++i) {
const T *target_box = boxes_data + (col_start * threadsPerBlock + i) * 5;
if (rbox_iou_single<T>(current_box, target_box) > threshold) {
mask |= 1ULL << i;
}
}
const int blocks_per_line = CeilDiv(num_boxes, threadsPerBlock);
masks[current_box_idx * blocks_per_line + col_start] = mask;
}
}
#define CHECK_INPUT_GPU(x) \
PD_CHECK(x.is_gpu(), #x " must be a GPU Tensor.")
std::vector<paddle::Tensor> NMSRotatedCUDAForward(const paddle::Tensor &boxes,
const paddle::Tensor &scores,
float threshold) {
CHECK_INPUT_GPU(boxes);
CHECK_INPUT_GPU(scores);
auto num_boxes = boxes.shape()[0];
auto order_t =
std::get<1>(paddle::argsort(scores, /* axis=*/0, /* descending=*/true));
auto boxes_sorted = paddle::gather(boxes, order_t, /* axis=*/0);
const auto blocks_per_line = CeilDiv(num_boxes, threadsPerBlock);
dim3 block(threadsPerBlock);
dim3 grid(blocks_per_line, blocks_per_line);
auto mask_dev = paddle::empty({num_boxes * blocks_per_line},
paddle::DataType::INT64, paddle::GPUPlace());
PD_DISPATCH_FLOATING_TYPES(
boxes.type(), "nms_rotated_cuda_kernel", ([&] {
nms_rotated_cuda_kernel<data_t><<<grid, block, 0, boxes.stream()>>>(
boxes_sorted.data<data_t>(), threshold, num_boxes,
mask_dev.data<int64_t>());
}));
auto mask_host = mask_dev.copy_to(paddle::CPUPlace(), true);
auto keep_host =
paddle::empty({num_boxes}, paddle::DataType::INT64, paddle::CPUPlace());
int64_t *keep_host_ptr = keep_host.data<int64_t>();
int64_t *mask_host_ptr = mask_host.data<int64_t>();
std::vector<int64_t> remv(blocks_per_line);
int64_t last_box_num = 0;
for (int64_t i = 0; i < num_boxes; ++i) {
auto remv_element_id = i / threadsPerBlock;
auto remv_bit_id = i % threadsPerBlock;
if (!(remv[remv_element_id] & 1ULL << remv_bit_id)) {
keep_host_ptr[last_box_num++] = i;
int64_t *current_mask = mask_host_ptr + i * blocks_per_line;
for (auto j = remv_element_id; j < blocks_per_line; ++j) {
remv[j] |= current_mask[j];
}
}
}
keep_host = keep_host.slice(0, last_box_num);
auto keep_dev = keep_host.copy_to(paddle::GPUPlace(), true);
return {paddle::gather(order_t, keep_dev, /* axis=*/0)};
}
\ No newline at end of file
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// The code is based on
// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/
#include "paddle/extension.h"
#include "rbox_iou_utils.h"
template <typename T>
void rbox_iou_cpu_kernel(const int rbox1_num, const int rbox2_num,
const T *rbox1_data_ptr, const T *rbox2_data_ptr,
T *output_data_ptr) {
int i, j;
for (i = 0; i < rbox1_num; i++) {
for (j = 0; j < rbox2_num; j++) {
int offset = i * rbox2_num + j;
output_data_ptr[offset] =
rbox_iou_single<T>(rbox1_data_ptr + i * 5, rbox2_data_ptr + j * 5);
}
}
}
#define CHECK_INPUT_CPU(x) \
PD_CHECK(x.is_cpu(), #x " must be a CPU Tensor.")
std::vector<paddle::Tensor> RboxIouCPUForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2) {
CHECK_INPUT_CPU(rbox1);
CHECK_INPUT_CPU(rbox2);
auto rbox1_num = rbox1.shape()[0];
auto rbox2_num = rbox2.shape()[0];
auto output =
paddle::empty({rbox1_num, rbox2_num}, rbox1.dtype(), paddle::CPUPlace());
PD_DISPATCH_FLOATING_TYPES(rbox1.type(), "rbox_iou_cpu_kernel", ([&] {
rbox_iou_cpu_kernel<data_t>(
rbox1_num, rbox2_num, rbox1.data<data_t>(),
rbox2.data<data_t>(), output.data<data_t>());
}));
return {output};
}
#ifdef PADDLE_WITH_CUDA
std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2);
#endif
#define CHECK_INPUT_SAME(x1, x2) \
PD_CHECK(x1.place() == x2.place(), "inputs must be on the same place.")
std::vector<paddle::Tensor> RboxIouForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2) {
CHECK_INPUT_SAME(rbox1, rbox2);
if (rbox1.is_cpu()) {
return RboxIouCPUForward(rbox1, rbox2);
#ifdef PADDLE_WITH_CUDA
} else if (rbox1.is_gpu()) {
return RboxIouCUDAForward(rbox1, rbox2);
#endif
}
}
std::vector<std::vector<int64_t>>
RboxIouInferShape(std::vector<int64_t> rbox1_shape,
std::vector<int64_t> rbox2_shape) {
return {{rbox1_shape[0], rbox2_shape[0]}};
}
std::vector<paddle::DataType> RboxIouInferDtype(paddle::DataType t1,
paddle::DataType t2) {
return {t1};
}
PD_BUILD_OP(rbox_iou)
.Inputs({"RBox1", "RBox2"})
.Outputs({"Output"})
.SetKernelFn(PD_KERNEL(RboxIouForward))
.SetInferShapeFn(PD_INFER_SHAPE(RboxIouInferShape))
.SetInferDtypeFn(PD_INFER_DTYPE(RboxIouInferDtype));
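For reference, a sketch of how the op registered above is typically called from Python once the extension is built (the `ext_op` module name matches the import used in rbox_utils.py below; the exact build step may differ by setup):

import paddle
from ext_op import rbox_iou  # available after compiling the custom op

rbox1 = paddle.rand([100, 5])  # [N1, 5]: x, y, w, h, angle
rbox2 = paddle.rand([50, 5])   # [N2, 5]
iou = rbox_iou(rbox1, rbox2)   # [N1, N2] pairwise rotated IoU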
@@ -13,21 +13,15 @@
// limitations under the License.
//
// The code is based on
// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated
// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/
#include "paddle/extension.h"
#include "rbox_iou_op.h"
#include "rbox_iou_utils.h"
// 2D block with 32 * 16 = 512 threads per block
const int BLOCK_DIM_X = 32;
const int BLOCK_DIM_Y = 16;
/**
Computes ceil(a / b)
*/
static inline int CeilDiv(const int a, const int b) { return (a + b - 1) / b; }
template <typename T>
__global__ void rbox_iou_cuda_kernel(const int rbox1_num, const int rbox2_num,
const T *rbox1_data_ptr,
@@ -85,7 +79,7 @@ __global__ void rbox_iou_cuda_kernel(const int rbox1_num, const int rbox2_num,
}
#define CHECK_INPUT_GPU(x) \
PD_CHECK(x.place() == paddle::PlaceType::kGPU, #x " must be a GPU Tensor.")
PD_CHECK(x.is_gpu(), #x " must be a GPU Tensor.")
std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor &rbox1,
const paddle::Tensor &rbox2) {
@@ -95,7 +89,8 @@ std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor &rbox1,
auto rbox1_num = rbox1.shape()[0];
auto rbox2_num = rbox2.shape()[0];
auto output = paddle::Tensor(paddle::PlaceType::kGPU, {rbox1_num, rbox2_num});
auto output =
paddle::empty({rbox1_num, rbox2_num}, rbox1.dtype(), paddle::GPUPlace());
const int blocks_x = CeilDiv(rbox1_num, BLOCK_DIM_X);
const int blocks_y = CeilDiv(rbox2_num, BLOCK_DIM_Y);
@@ -107,7 +102,7 @@ std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor &rbox1,
rbox1.type(), "rbox_iou_cuda_kernel", ([&] {
rbox_iou_cuda_kernel<data_t><<<blocks, threads, 0, rbox1.stream()>>>(
rbox1_num, rbox2_num, rbox1.data<data_t>(), rbox2.data<data_t>(),
output.mutable_data<data_t>());
output.data<data_t>());
}));
return {output};
......
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// The code is based on https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated
#include "rbox_iou_op.h"
#include "paddle/extension.h"
template <typename T>
void rbox_iou_cpu_kernel(
const int rbox1_num,
const int rbox2_num,
const T* rbox1_data_ptr,
const T* rbox2_data_ptr,
T* output_data_ptr) {
int i, j;
for (i = 0; i < rbox1_num; i++) {
for (j = 0; j < rbox2_num; j++) {
int offset = i * rbox2_num + j;
output_data_ptr[offset] = rbox_iou_single<T>(rbox1_data_ptr + i * 5, rbox2_data_ptr + j * 5);
}
}
}
#define CHECK_INPUT_CPU(x) PD_CHECK(x.place() == paddle::PlaceType::kCPU, #x " must be a CPU Tensor.")
std::vector<paddle::Tensor> RboxIouCPUForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2) {
CHECK_INPUT_CPU(rbox1);
CHECK_INPUT_CPU(rbox2);
auto rbox1_num = rbox1.shape()[0];
auto rbox2_num = rbox2.shape()[0];
auto output = paddle::Tensor(paddle::PlaceType::kCPU, {rbox1_num, rbox2_num});
PD_DISPATCH_FLOATING_TYPES(
rbox1.type(),
"rbox_iou_cpu_kernel",
([&] {
rbox_iou_cpu_kernel<data_t>(
rbox1_num,
rbox2_num,
rbox1.data<data_t>(),
rbox2.data<data_t>(),
output.mutable_data<data_t>());
}));
return {output};
}
#ifdef PADDLE_WITH_CUDA
std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2);
#endif
#define CHECK_INPUT_SAME(x1, x2) PD_CHECK(x1.place() == x2.place(), "inputs must be on the same place.")
std::vector<paddle::Tensor> RboxIouForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2) {
CHECK_INPUT_SAME(rbox1, rbox2);
if (rbox1.place() == paddle::PlaceType::kCPU) {
return RboxIouCPUForward(rbox1, rbox2);
#ifdef PADDLE_WITH_CUDA
} else if (rbox1.place() == paddle::PlaceType::kGPU) {
return RboxIouCUDAForward(rbox1, rbox2);
#endif
}
}
std::vector<std::vector<int64_t>> InferShape(std::vector<int64_t> rbox1_shape, std::vector<int64_t> rbox2_shape) {
return {{rbox1_shape[0], rbox2_shape[0]}};
}
std::vector<paddle::DataType> InferDtype(paddle::DataType t1, paddle::DataType t2) {
return {t1};
}
PD_BUILD_OP(rbox_iou)
.Inputs({"RBOX1", "RBOX2"})
.Outputs({"Output"})
.SetKernelFn(PD_KERNEL(RboxIouForward))
.SetInferShapeFn(PD_INFER_SHAPE(InferShape))
.SetInferDtypeFn(PD_INFER_DTYPE(InferDtype));
@@ -13,7 +13,7 @@
// limitations under the License.
//
// The code is based on
// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/box_iou_rotated
// https://github.com/facebookresearch/detectron2/blob/main/detectron2/layers/csrc/box_iou_rotated/
#pragma once
@@ -336,13 +336,21 @@ HOST_DEVICE_INLINE T rbox_iou_single(T const *const box1_raw,
box2.h = box2_raw[3];
box2.a = box2_raw[4];
const T area1 = box1.w * box1.h;
const T area2 = box2.w * box2.h;
if (area1 < 1e-14 || area2 < 1e-14) {
if (box1.w < 1e-2 || box1.h < 1e-2 || box2.w < 1e-2 || box2.h < 1e-2) {
return 0.f;
}
const T area1 = box1.w * box1.h;
const T area2 = box2.w * box2.h;
const T intersection = rboxes_intersection<T>(box1, box2);
const T iou = intersection / (area1 + area2 - intersection);
return iou;
}
/**
Computes ceil(a / b)
*/
HOST_DEVICE inline int CeilDiv(const int a, const int b) {
return (a + b - 1) / b;
}
\ No newline at end of file
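The rbox_iou_single helper above implements the standard IoU for rotated boxes: with areas $A_1 = w_1 h_1$, $A_2 = w_2 h_2$ and $I$ the polygon intersection area returned by rboxes_intersection,

$$\mathrm{IoU}(B_1, B_2) = \frac{I}{A_1 + A_2 - I}$$

and any box with a side shorter than $10^{-2}$ is treated as degenerate, returning an IoU of 0.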
@@ -18,6 +18,7 @@ from . import atss_assigner
from . import simota_assigner
from . import max_iou_assigner
from . import fcosr_assigner
from . import rotated_task_aligned_assigner
from .utils import *
from .task_aligned_assigner import *
@@ -25,3 +26,4 @@ from .atss_assigner import *
from .simota_assigner import *
from .max_iou_assigner import *
from .fcosr_assigner import *
from .rotated_task_aligned_assigner import *
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from ppdet.core.workspace import register
from ..rbox_utils import rotated_iou_similarity, check_points_in_rotated_boxes
from .utils import gather_topk_anchors, compute_max_iou_anchor
__all__ = ['RotatedTaskAlignedAssigner']
@register
class RotatedTaskAlignedAssigner(nn.Layer):
"""TOOD: Task-aligned One-stage Object Detection
"""
def __init__(self, topk=13, alpha=1.0, beta=6.0, eps=1e-9):
super(RotatedTaskAlignedAssigner, self).__init__()
self.topk = topk
self.alpha = alpha
self.beta = beta
self.eps = eps
@paddle.no_grad()
def forward(self,
pred_scores,
pred_bboxes,
anchor_points,
num_anchors_list,
gt_labels,
gt_bboxes,
pad_gt_mask,
bg_index,
gt_scores=None):
r"""This code is based on
https://github.com/fcjian/TOOD/blob/master/mmdet/core/bbox/assigners/task_aligned_assigner.py
        The assignment is done in the following steps:
        1. compute the alignment metric between all pred bboxes (of all pyramid
           levels) and each gt
        2. select the top-k bboxes as candidates for each gt
        3. restrict the positive samples' centers to lie inside the gt boxes
           (an anchor-free detector can only predict positive distances)
        4. if an anchor box is assigned to multiple gts, keep the one with the
           highest iou
Args:
pred_scores (Tensor, float32): predicted class probability, shape(B, L, C)
pred_bboxes (Tensor, float32): predicted bounding boxes, shape(B, L, 5)
anchor_points (Tensor, float32): pre-defined anchors, shape(1, L, 2), "cxcy" format
num_anchors_list (List): num of anchors in each level, shape(L)
gt_labels (Tensor, int64|int32): Label of gt_bboxes, shape(B, n, 1)
gt_bboxes (Tensor, float32): Ground truth bboxes, shape(B, n, 5)
pad_gt_mask (Tensor, float32): 1 means bbox, 0 means no bbox, shape(B, n, 1)
bg_index (int): background index
            gt_scores (Tensor|None, float32): Score of gt_bboxes, shape(B, n, 1)
Returns:
assigned_labels (Tensor): (B, L)
assigned_bboxes (Tensor): (B, L, 5)
assigned_scores (Tensor): (B, L, C)
"""
assert pred_scores.ndim == pred_bboxes.ndim
assert gt_labels.ndim == gt_bboxes.ndim and \
gt_bboxes.ndim == 3
batch_size, num_anchors, num_classes = pred_scores.shape
_, num_max_boxes, _ = gt_bboxes.shape
        # negative batch: no gt boxes, assign every anchor to background
if num_max_boxes == 0:
assigned_labels = paddle.full(
[batch_size, num_anchors], bg_index, dtype=gt_labels.dtype)
assigned_bboxes = paddle.zeros([batch_size, num_anchors, 5])
assigned_scores = paddle.zeros(
[batch_size, num_anchors, num_classes])
return assigned_labels, assigned_bboxes, assigned_scores
# compute iou between gt and pred bbox, [B, n, L]
ious = rotated_iou_similarity(gt_bboxes, pred_bboxes)
ious = paddle.where(ious > 1 + self.eps, paddle.zeros_like(ious), ious)
ious.stop_gradient = True
# gather pred bboxes class score
pred_scores = pred_scores.transpose([0, 2, 1])
batch_ind = paddle.arange(
end=batch_size, dtype=gt_labels.dtype).unsqueeze(-1)
gt_labels_ind = paddle.stack(
[batch_ind.tile([1, num_max_boxes]), gt_labels.squeeze(-1)],
axis=-1)
bbox_cls_scores = paddle.gather_nd(pred_scores, gt_labels_ind)
# compute alignment metrics, [B, n, L]
alignment_metrics = bbox_cls_scores.pow(self.alpha) * ious.pow(
self.beta)
# check the positive sample's center in gt, [B, n, L]
is_in_gts = check_points_in_rotated_boxes(anchor_points, gt_bboxes)
# select topk largest alignment metrics pred bbox as candidates
# for each gt, [B, n, L]
is_in_topk = gather_topk_anchors(
alignment_metrics * is_in_gts, self.topk, topk_mask=pad_gt_mask)
# select positive sample, [B, n, L]
mask_positive = is_in_topk * is_in_gts * pad_gt_mask
# if an anchor box is assigned to multiple gts,
# the one with the highest iou will be selected, [B, n, L]
mask_positive_sum = mask_positive.sum(axis=-2)
if mask_positive_sum.max() > 1:
mask_multiple_gts = (mask_positive_sum.unsqueeze(1) > 1).tile(
[1, num_max_boxes, 1])
is_max_iou = compute_max_iou_anchor(ious)
mask_positive = paddle.where(mask_multiple_gts, is_max_iou,
mask_positive)
mask_positive_sum = mask_positive.sum(axis=-2)
assigned_gt_index = mask_positive.argmax(axis=-2)
# assigned target
assigned_gt_index = assigned_gt_index + batch_ind * num_max_boxes
assigned_labels = paddle.gather(
gt_labels.flatten(), assigned_gt_index.flatten(), axis=0)
assigned_labels = assigned_labels.reshape([batch_size, num_anchors])
assigned_labels = paddle.where(
mask_positive_sum > 0, assigned_labels,
paddle.full_like(assigned_labels, bg_index))
assigned_bboxes = paddle.gather(
gt_bboxes.reshape([-1, 5]), assigned_gt_index.flatten(), axis=0)
assigned_bboxes = assigned_bboxes.reshape([batch_size, num_anchors, 5])
assigned_scores = F.one_hot(assigned_labels, num_classes + 1)
ind = list(range(num_classes + 1))
ind.remove(bg_index)
assigned_scores = paddle.index_select(
assigned_scores, paddle.to_tensor(ind), axis=-1)
# rescale alignment metrics
alignment_metrics *= mask_positive
max_metrics_per_instance = alignment_metrics.max(axis=-1, keepdim=True)
max_ious_per_instance = (ious * mask_positive).max(axis=-1,
keepdim=True)
alignment_metrics = alignment_metrics / (
max_metrics_per_instance + self.eps) * max_ious_per_instance
alignment_metrics = alignment_metrics.max(-2).unsqueeze(-1)
assigned_scores = assigned_scores * alignment_metrics
assigned_bboxes.stop_gradient = True
assigned_scores.stop_gradient = True
assigned_labels.stop_gradient = True
return assigned_labels, assigned_bboxes, assigned_scores
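As in the original TOOD assigner, the alignment metric computed above combines the predicted classification score $s$ for the gt class with the rotated IoU $u$, using the defaults $\alpha = 1.0$ and $\beta = 6.0$:

$$t = s^{\alpha} \cdot u^{\beta}$$

so the top-k candidates per gt are anchors that are simultaneously well classified and well localized; the assigned soft scores are then rescaled by each instance's maximum IoU.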
@@ -34,6 +34,7 @@ from . import tood_head
from . import retina_head
from . import ppyoloe_head
from . import fcosr_head
from . import ppyoloe_r_head
from . import ld_gfl_head
from .bbox_head import *
@@ -59,3 +60,4 @@ from .retina_head import *
from .ppyoloe_head import *
from .fcosr_head import *
from .ld_gfl_head import *
from .ppyoloe_r_head import *
@@ -205,8 +205,8 @@ class FCOSRHead(nn.Layer):
anchor_points = []
stride_tensor = []
num_anchors_list = []
for i, stride in enumerate(self.fpn_strides):
_, _, h, w = feats[i].shape
for feat, stride in zip(feats, self.fpn_strides):
_, _, h, w = paddle.shape(feat)
shift_x = (paddle.arange(end=w) + 0.5) * stride
shift_y = (paddle.arange(end=h) + 0.5) * stride
shift_y, shift_x = paddle.meshgrid(shift_y, shift_x)
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from ppdet.core.workspace import register
from ..losses import ProbIoULoss
from ..initializer import bias_init_with_prob, constant_, normal_, vector_
from ppdet.modeling.backbones.cspresnet import ConvBNLayer
from ppdet.modeling.ops import get_static_shape, get_act_fn, anchor_generator
from ppdet.modeling.layers import MultiClassNMS
__all__ = ['PPYOLOERHead']
class ESEAttn(nn.Layer):
def __init__(self, feat_channels, act='swish'):
super(ESEAttn, self).__init__()
self.fc = nn.Conv2D(feat_channels, feat_channels, 1)
self.conv = ConvBNLayer(feat_channels, feat_channels, 1, act=act)
self._init_weights()
def _init_weights(self):
normal_(self.fc.weight, std=0.01)
def forward(self, feat, avg_feat):
weight = F.sigmoid(self.fc(avg_feat))
return self.conv(feat * weight)
@register
class PPYOLOERHead(nn.Layer):
__shared__ = ['num_classes', 'trt']
__inject__ = ['static_assigner', 'assigner', 'nms']
def __init__(self,
in_channels=[1024, 512, 256],
num_classes=15,
act='swish',
fpn_strides=(32, 16, 8),
grid_cell_offset=0.5,
angle_max=90,
use_varifocal_loss=True,
static_assigner_epoch=4,
trt=False,
static_assigner='ATSSAssigner',
assigner='TaskAlignedAssigner',
nms='MultiClassNMS',
loss_weight={'class': 1.0,
'iou': 2.5,
'dfl': 0.05}):
super(PPYOLOERHead, self).__init__()
        assert len(in_channels) > 0, "len(in_channels) should be > 0"
self.in_channels = in_channels
self.num_classes = num_classes
self.fpn_strides = fpn_strides
self.grid_cell_offset = grid_cell_offset
self.angle_max = angle_max
self.loss_weight = loss_weight
self.use_varifocal_loss = use_varifocal_loss
self.half_pi = paddle.to_tensor(
[1.5707963267948966], dtype=paddle.float32)
self.half_pi_bin = self.half_pi / angle_max
self.iou_loss = ProbIoULoss()
self.static_assigner_epoch = static_assigner_epoch
self.static_assigner = static_assigner
self.assigner = assigner
self.nms = nms
# stem
self.stem_cls = nn.LayerList()
self.stem_reg = nn.LayerList()
self.stem_angle = nn.LayerList()
act = get_act_fn(
act, trt=trt) if act is None or isinstance(act,
(str, dict)) else act
self.trt = trt
for in_c in self.in_channels:
self.stem_cls.append(ESEAttn(in_c, act=act))
self.stem_reg.append(ESEAttn(in_c, act=act))
self.stem_angle.append(ESEAttn(in_c, act=act))
# pred head
self.pred_cls = nn.LayerList()
self.pred_reg = nn.LayerList()
self.pred_angle = nn.LayerList()
for in_c in self.in_channels:
self.pred_cls.append(
nn.Conv2D(
in_c, self.num_classes, 3, padding=1))
self.pred_reg.append(nn.Conv2D(in_c, 4, 3, padding=1))
self.pred_angle.append(
nn.Conv2D(
in_c, self.angle_max + 1, 3, padding=1))
self.angle_proj_conv = nn.Conv2D(
self.angle_max + 1, 1, 1, bias_attr=False)
self._init_weights()
@classmethod
def from_config(cls, cfg, input_shape):
return {'in_channels': [i.channels for i in input_shape], }
def _init_weights(self):
bias_cls = bias_init_with_prob(0.01)
bias_angle = [10.] + [1.] * self.angle_max
for cls_, reg_, angle_ in zip(self.pred_cls, self.pred_reg,
self.pred_angle):
normal_(cls_.weight, std=0.01)
constant_(cls_.bias, bias_cls)
normal_(reg_.weight, std=0.01)
constant_(reg_.bias)
constant_(angle_.weight)
vector_(angle_.bias, bias_angle)
angle_proj = paddle.linspace(0, self.angle_max, self.angle_max + 1)
self.angle_proj = angle_proj * self.half_pi_bin
self.angle_proj_conv.weight.set_value(
self.angle_proj.reshape([1, self.angle_max + 1, 1, 1]))
self.angle_proj_conv.weight.stop_gradient = True
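    # Note on the frozen projection above: pred_angle emits angle_max + 1
    # logits per location; softmax turns them into a distribution over the
    # bins 0..angle_max, and the fixed 1x1 conv (weights i * (pi/2) /
    # angle_max) takes its expectation. The decoded angle is therefore a
    # soft interpolation over [0, pi/2], mirroring the DFL-style decoding
    # supervised by _df_loss below.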
def _generate_anchors(self, feats):
if self.trt:
anchor_points = []
for feat, stride in zip(feats, self.fpn_strides):
_, _, h, w = paddle.shape(feat)
anchor, _ = anchor_generator(
feat,
stride * 4,
1.0, [1.0, 1.0, 1.0, 1.0], [stride, stride],
offset=0.5)
x1, y1, x2, y2 = paddle.split(anchor, 4, axis=-1)
xc = (x1 + x2 + 1) / 2
yc = (y1 + y2 + 1) / 2
anchor_point = paddle.concat(
[xc, yc], axis=-1).reshape((1, h * w, 2))
anchor_points.append(anchor_point)
anchor_points = paddle.concat(anchor_points, axis=1)
return anchor_points, None, None
else:
anchor_points = []
stride_tensor = []
num_anchors_list = []
for feat, stride in zip(feats, self.fpn_strides):
_, _, h, w = paddle.shape(feat)
shift_x = (paddle.arange(end=w) + 0.5) * stride
shift_y = (paddle.arange(end=h) + 0.5) * stride
shift_y, shift_x = paddle.meshgrid(shift_y, shift_x)
anchor_point = paddle.cast(
paddle.stack(
[shift_x, shift_y], axis=-1), dtype='float32')
anchor_points.append(anchor_point.reshape([1, -1, 2]))
stride_tensor.append(
paddle.full(
[1, h * w, 1], stride, dtype='float32'))
num_anchors_list.append(h * w)
anchor_points = paddle.concat(anchor_points, axis=1)
stride_tensor = paddle.concat(stride_tensor, axis=1)
return anchor_points, stride_tensor, num_anchors_list
def forward(self, feats, targets=None):
assert len(feats) == len(self.fpn_strides), \
"The size of feats is not equal to size of fpn_strides"
if self.training:
return self.forward_train(feats, targets)
else:
return self.forward_eval(feats)
def forward_train(self, feats, targets):
anchor_points, stride_tensor, num_anchors_list = self._generate_anchors(
feats)
cls_score_list, reg_dist_list, reg_angle_list = [], [], []
for i, feat in enumerate(feats):
avg_feat = F.adaptive_avg_pool2d(feat, (1, 1))
cls_logit = self.pred_cls[i](self.stem_cls[i](feat, avg_feat) +
feat)
reg_dist = self.pred_reg[i](self.stem_reg[i](feat, avg_feat))
reg_angle = self.pred_angle[i](self.stem_angle[i](feat, avg_feat))
# cls and reg
cls_score = F.sigmoid(cls_logit)
cls_score_list.append(cls_score.flatten(2).transpose([0, 2, 1]))
reg_dist_list.append(reg_dist.flatten(2).transpose([0, 2, 1]))
reg_angle_list.append(reg_angle.flatten(2).transpose([0, 2, 1]))
cls_score_list = paddle.concat(cls_score_list, axis=1)
reg_dist_list = paddle.concat(reg_dist_list, axis=1)
reg_angle_list = paddle.concat(reg_angle_list, axis=1)
return self.get_loss([
cls_score_list, reg_dist_list, reg_angle_list, anchor_points,
num_anchors_list, stride_tensor
], targets)
def forward_eval(self, feats):
cls_score_list, reg_box_list = [], []
anchor_points, _, _ = self._generate_anchors(feats)
for i, (feat, stride) in enumerate(zip(feats, self.fpn_strides)):
b, _, h, w = paddle.shape(feat)
l = h * w
# cls
avg_feat = F.adaptive_avg_pool2d(feat, (1, 1))
cls_logit = self.pred_cls[i](self.stem_cls[i](feat, avg_feat) +
feat)
# reg
reg_dist = self.pred_reg[i](self.stem_reg[i](feat, avg_feat))
reg_xy, reg_wh = paddle.split(reg_dist, 2, axis=1)
reg_xy = reg_xy * stride
reg_wh = (F.elu(reg_wh) + 1.) * stride
reg_angle = self.pred_angle[i](self.stem_angle[i](feat, avg_feat))
reg_angle = self.angle_proj_conv(F.softmax(reg_angle, axis=1))
reg_box = paddle.concat([reg_xy, reg_wh, reg_angle], axis=1)
# cls and reg
cls_score = F.sigmoid(cls_logit)
cls_score_list.append(cls_score.reshape([b, self.num_classes, l]))
reg_box_list.append(reg_box.reshape([b, 5, l]))
cls_score_list = paddle.concat(cls_score_list, axis=-1)
reg_box_list = paddle.concat(reg_box_list, axis=-1).transpose([0, 2, 1])
reg_xy, reg_wha = paddle.split(reg_box_list, [2, 3], axis=-1)
reg_xy = reg_xy + anchor_points
reg_box_list = paddle.concat([reg_xy, reg_wha], axis=-1)
return cls_score_list, reg_box_list
def _bbox_decode(self, points, pred_dist, pred_angle, stride_tensor):
# predict vector to x, y, w, h, angle
b, l = pred_angle.shape[:2]
xy, wh = paddle.split(pred_dist, 2, axis=-1)
xy = xy * stride_tensor + points
wh = (F.elu(wh) + 1.) * stride_tensor
angle = F.softmax(pred_angle.reshape([b, l, 1, self.angle_max + 1
])).matmul(self.angle_proj)
return paddle.concat([xy, wh, angle], axis=-1)
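    # Decoding recap: xy offsets are scaled by the stride and added to the
    # anchor centers; wh goes through elu(x) + 1 so sizes stay positive;
    # the angle is the expectation of the softmax distribution under
    # self.angle_proj. Toy example at stride 8: raw wh = (0.0, 1.0) decodes
    # to ((elu(0) + 1) * 8, (elu(1) + 1) * 8) = (8.0, 16.0).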
def get_loss(self, head_outs, gt_meta):
pred_scores, pred_dist, pred_angle, \
anchor_points, num_anchors_list, stride_tensor = head_outs
        # decode [B, L, 4] distances and [B, L, angle_max + 1] angle logits into [B, L, 5] rboxes
pred_bboxes = self._bbox_decode(anchor_points, pred_dist, pred_angle,
stride_tensor)
gt_labels = gt_meta['gt_class']
# [B, N, 5]
gt_bboxes = gt_meta['gt_rbox']
pad_gt_mask = gt_meta['pad_gt_mask']
# label assignment
if gt_meta['epoch_id'] < self.static_assigner_epoch:
assigned_labels, assigned_bboxes, assigned_scores = \
self.static_assigner(
anchor_points,
stride_tensor,
num_anchors_list,
gt_labels,
gt_meta['gt_bbox'],
gt_bboxes,
pad_gt_mask,
self.num_classes,
pred_bboxes.detach()
)
else:
assigned_labels, assigned_bboxes, assigned_scores = \
self.assigner(
pred_scores.detach(),
pred_bboxes.detach(),
anchor_points,
num_anchors_list,
gt_labels,
gt_bboxes,
pad_gt_mask,
bg_index=self.num_classes)
alpha_l = -1
# cls loss
if self.use_varifocal_loss:
one_hot_label = F.one_hot(assigned_labels,
self.num_classes + 1)[..., :-1]
loss_cls = self._varifocal_loss(pred_scores, assigned_scores,
one_hot_label)
else:
loss_cls = self._focal_loss(pred_scores, assigned_scores, alpha_l)
assigned_scores_sum = assigned_scores.sum()
if paddle.distributed.get_world_size() > 1:
paddle.distributed.all_reduce(assigned_scores_sum)
assigned_scores_sum = paddle.clip(
assigned_scores_sum / paddle.distributed.get_world_size(),
min=1.)
else:
assigned_scores_sum = paddle.clip(assigned_scores_sum, min=1.)
loss_cls /= assigned_scores_sum
loss_iou, loss_dfl = self._bbox_loss(pred_angle, pred_bboxes,
anchor_points, assigned_labels,
assigned_bboxes, assigned_scores,
assigned_scores_sum, stride_tensor)
loss = self.loss_weight['class'] * loss_cls + \
self.loss_weight['iou'] * loss_iou + \
self.loss_weight['dfl'] * loss_dfl
out_dict = {
'loss': loss,
'loss_cls': loss_cls,
'loss_iou': loss_iou,
'loss_dfl': loss_dfl
}
return out_dict
@staticmethod
def _focal_loss(score, label, alpha=0.25, gamma=2.0):
weight = (score - label).pow(gamma)
if alpha > 0:
alpha_t = alpha * label + (1 - alpha) * (1 - label)
weight *= alpha_t
loss = F.binary_cross_entropy(
score, label, weight=weight, reduction='sum')
return loss
@staticmethod
def _varifocal_loss(pred_score, gt_score, label, alpha=0.75, gamma=2.0):
weight = alpha * pred_score.pow(gamma) * (1 - label) + gt_score * label
loss = F.binary_cross_entropy(
pred_score, gt_score, weight=weight, reduction='sum')
return loss
@staticmethod
def _df_loss(pred_dist, target):
target_left = paddle.cast(target, 'int64')
target_right = target_left + 1
weight_left = target_right.astype('float32') - target
weight_right = 1 - weight_left
loss_left = F.cross_entropy(
pred_dist, target_left, reduction='none') * weight_left
loss_right = F.cross_entropy(
pred_dist, target_right, reduction='none') * weight_right
return (loss_left + loss_right).mean(-1, keepdim=True)
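    # _df_loss is the Distribution Focal Loss from Generalized Focal Loss,
    # applied here to the angle bins: a continuous target t is supervised
    # through its two neighbouring integer bins with linear weights, e.g.
    # t = 3.3 trains bin 3 with weight 0.7 and bin 4 with weight 0.3, so
    # the expected bin index reproduces t exactly.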
def _bbox_loss(self, pred_angle, pred_bboxes, anchor_points,
assigned_labels, assigned_bboxes, assigned_scores,
assigned_scores_sum, stride_tensor):
# select positive samples mask
mask_positive = (assigned_labels != self.num_classes)
num_pos = mask_positive.sum()
# pos/neg loss
if num_pos > 0:
# iou
bbox_mask = mask_positive.unsqueeze(-1).tile([1, 1, 5])
pred_bboxes_pos = paddle.masked_select(pred_bboxes,
bbox_mask).reshape([-1, 5])
assigned_bboxes_pos = paddle.masked_select(
assigned_bboxes, bbox_mask).reshape([-1, 5])
bbox_weight = paddle.masked_select(
assigned_scores.sum(-1), mask_positive).reshape([-1])
loss_iou = self.iou_loss(pred_bboxes_pos,
assigned_bboxes_pos) * bbox_weight
loss_iou = loss_iou.sum() / assigned_scores_sum
# dfl
angle_mask = mask_positive.unsqueeze(-1).tile(
[1, 1, self.angle_max + 1])
pred_angle_pos = paddle.masked_select(
pred_angle, angle_mask).reshape([-1, self.angle_max + 1])
assigned_angle_pos = (
assigned_bboxes_pos[:, 4] /
self.half_pi_bin).clip(0, self.angle_max - 0.01)
loss_dfl = self._df_loss(pred_angle_pos, assigned_angle_pos)
else:
loss_iou = pred_bboxes.sum() * 0.
loss_dfl = paddle.zeros([1])
return loss_iou, loss_dfl
def _box2corners(self, pred_bboxes):
""" convert (x, y, w, h, angle) to (x1, y1, x2, y2, x3, y3, x4, y4)
Args:
pred_bboxes (Tensor): [B, N, 5]
Returns:
polys (Tensor): [B, N, 8]
"""
x, y, w, h, angle = paddle.split(pred_bboxes, 5, axis=-1)
cos_a_half = paddle.cos(angle) * 0.5
sin_a_half = paddle.sin(angle) * 0.5
w_x = cos_a_half * w
w_y = sin_a_half * w
h_x = -sin_a_half * h
h_y = cos_a_half * h
return paddle.concat(
[
x + w_x + h_x, y + w_y + h_y, x - w_x + h_x, y - w_y + h_y,
x - w_x - h_x, y - w_y - h_y, x + w_x - h_x, y + w_y - h_y
],
axis=-1)
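    # Geometry note: (w_x, w_y) and (h_x, h_y) are the half width/height
    # vectors of the box rotated by `angle`, i.e. (w/2)(cos a, sin a) and
    # (h/2)(-sin a, cos a); the four corners are center +- half_w +- half_h,
    # enumerated in a consistent winding order.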
def post_process(self, head_outs, scale_factor):
pred_scores, pred_bboxes = head_outs
# [B, N, 5] -> [B, N, 8]
pred_bboxes = self._box2corners(pred_bboxes)
# scale bbox to origin
scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1)
scale_factor = paddle.concat(
[
scale_x, scale_y, scale_x, scale_y, scale_x, scale_y, scale_x,
scale_y
],
axis=-1).reshape([-1, 1, 8])
pred_bboxes /= scale_factor
bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores)
return bbox_pred, bbox_num
@@ -118,6 +118,12 @@ def zeros_(tensor):
return _no_grad_fill_(tensor, 0)
def vector_(tensor, vector):
with paddle.no_grad():
tensor.set_value(paddle.to_tensor(vector, dtype=tensor.dtype))
return tensor
def _calculate_fan_in_and_fan_out(tensor, reverse=False):
"""
Calculate (fan_in, _fan_out) for tensor
......
@@ -61,7 +61,14 @@ class SPP(nn.Layer):
class CSPStage(nn.Layer):
def __init__(self, block_fn, ch_in, ch_out, n, act='swish', spp=False):
def __init__(self,
block_fn,
ch_in,
ch_out,
n,
act='swish',
spp=False,
use_alpha=False):
super(CSPStage, self).__init__()
ch_mid = int(ch_out // 2)
@@ -72,7 +79,11 @@ class CSPStage(nn.Layer):
for i in range(n):
self.convs.add_sublayer(
str(i),
eval(block_fn)(next_ch_in, ch_mid, act=act, shortcut=False))
eval(block_fn)(next_ch_in,
ch_mid,
act=act,
shortcut=False,
use_alpha=use_alpha))
if i == (n - 1) // 2 and spp:
self.convs.add_sublayer(
'spp', SPP(ch_mid * 4, ch_mid, 1, [5, 9, 13], act=act))
@@ -109,6 +120,7 @@ class CustomCSPPAN(nn.Layer):
data_format='NCHW',
width_mult=1.0,
depth_mult=1.0,
use_alpha=False,
trt=False):
super(CustomCSPPAN, self).__init__()
@@ -136,7 +148,8 @@ class CustomCSPPAN(nn.Layer):
ch_out,
block_num,
act=act,
spp=(spp and i == 0)))
spp=(spp and i == 0),
use_alpha=use_alpha))
if drop_block:
stage.add_sublayer('drop', DropBlock(block_size, keep_prob))
@@ -181,7 +194,8 @@ class CustomCSPPAN(nn.Layer):
ch_out,
block_num,
act=act,
spp=False))
spp=False,
use_alpha=use_alpha))
if drop_block:
stage.add_sublayer('drop', DropBlock(block_size, keep_prob))
......
@@ -26,18 +26,9 @@ from paddle import in_dynamic_mode
from paddle.common_ops_import import Variable, LayerHelper, check_variable_and_dtype, check_type, check_dtype
__all__ = [
'prior_box',
'generate_proposals',
'box_coder',
'multiclass_nms',
'distribute_fpn_proposals',
'matrix_nms',
'batch_norm',
'mish',
'silu',
'swish',
'identity',
'anchor_generator'
'prior_box', 'generate_proposals', 'box_coder', 'multiclass_nms',
'distribute_fpn_proposals', 'matrix_nms', 'batch_norm', 'mish', 'silu',
'swish', 'identity', 'anchor_generator'
]
@@ -118,6 +109,7 @@ def batch_norm(ch,
return norm_layer
@paddle.jit.not_to_static
def anchor_generator(input,
anchor_sizes=None,
......
@@ -239,3 +239,57 @@ def check_points_in_polys(points, polys):
is_in_polys = (ap_dot_ab >= 0) & (ap_dot_ab <= norm_ab) & (
ap_dot_ad >= 0) & (ap_dot_ad <= norm_ad)
return is_in_polys
def check_points_in_rotated_boxes(points, boxes):
"""Check whether point is in rotated boxes
Args:
points (tensor): (1, L, 2) anchor points
boxes (tensor): [B, N, 5] gt_bboxes
eps (float): default 1e-9
Returns:
is_in_box (tensor): (B, N, L)
"""
# [B, N, 5] -> [B, N, 4, 2]
corners = box2corners(boxes)
# [1, L, 2] -> [1, 1, L, 2]
points = points.unsqueeze(0)
# [B, N, 4, 2] -> [B, N, 1, 2]
a, b, c, d = corners.split(4, axis=2)
ab = b - a
ad = d - a
# [B, N, L, 2]
ap = points - a
# [B, N, L]
norm_ab = paddle.sum(ab * ab, axis=-1)
# [B, N, L]
norm_ad = paddle.sum(ad * ad, axis=-1)
# [B, N, L] dot product
ap_dot_ab = paddle.sum(ap * ab, axis=-1)
# [B, N, L] dot product
ap_dot_ad = paddle.sum(ap * ad, axis=-1)
# [B, N, L] <A, B> = |A|*|B|*cos(theta)
is_in_box = (ap_dot_ab >= 0) & (ap_dot_ab <= norm_ab) & (ap_dot_ad >= 0) & (
ap_dot_ad <= norm_ad)
return is_in_box
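# Test used above: a point P lies inside rectangle ABCD iff its offset AP
# projects inside both edges from corner A, i.e. 0 <= AP.AB <= |AB|^2 and
# 0 <= AP.AD <= |AD|^2 (norm_ab / norm_ad hold the squared edge lengths).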
def rotated_iou_similarity(box1, box2, eps=1e-9, func=''):
"""Calculate iou of box1 and box2
Args:
box1 (Tensor): box with the shape [N, M1, 5]
box2 (Tensor): box with the shape [N, M2, 5]
Return:
iou (Tensor): iou between box1 and box2 with the shape [N, M1, M2]
"""
    # rbox_iou is the custom C++/CUDA op registered earlier in this change;
    # the ext_op extension must be compiled before this import succeeds.
    from ext_op import rbox_iou
rotated_ious = []
for b1, b2 in zip(box1, box2):
rotated_ious.append(rbox_iou(b1, b2))
return paddle.stack(rotated_ious, axis=0)
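A sketch of the batched wrapper in use, assuming the ext_op extension has been compiled so that rbox_iou is importable:

import paddle
from ppdet.modeling.rbox_utils import rotated_iou_similarity

gt = paddle.rand([2, 3, 5])    # [B, M1, 5] ground-truth rboxes
pred = paddle.rand([2, 7, 5])  # [B, M2, 5] predicted rboxes
ious = rotated_iou_similarity(gt, pred)  # [B, 3, 7]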