Unverified · Commit e55e4194 · authored by wangxinxin08, committed by GitHub

Refactor rbox (#6704)

* refactor rbox

* modify the code of save results

* fix some problems

* add .gitignore in dataset/dota

* fix test anno path
Parent: fd949c73
@@ -3,19 +3,19 @@ num_classes: 15
 TrainDataset:
   !COCODataSet
-    image_dir: trainval_split/images
-    anno_path: trainval_split/s2anet_trainval_paddle_coco.json
-    dataset_dir: dataset/DOTA_1024_s2anet
-    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox']
+    image_dir: trainval1024/images
+    anno_path: trainval1024/DOTA_trainval1024.json
+    dataset_dir: dataset/dota/
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']

 EvalDataset:
   !COCODataSet
-    image_dir: trainval_split/images
-    anno_path: trainval_split/s2anet_trainval_paddle_coco.json
-    dataset_dir: dataset/DOTA_1024_s2anet/
-    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox']
+    image_dir: trainval1024/images
+    anno_path: trainval1024/DOTA_trainval1024.json
+    dataset_dir: dataset/dota/
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']

 TestDataset:
   !ImageFolder
-    anno_path: trainval_split/s2anet_trainval_paddle_coco.json
-    dataset_dir: dataset/DOTA_1024_s2anet/
+    anno_path: test1024/DOTA_test1024.json
+    dataset_dir: dataset/dota/
@@ -6,14 +6,14 @@ TrainDataset:
     image_dir: images
     anno_path: annotations/train.json
     dataset_dir: dataset/spine_coco
-    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox']
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']

 EvalDataset:
   !COCODataSet
     image_dir: images
     anno_path: annotations/valid.json
     dataset_dir: dataset/spine_coco
-    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox']
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']

 TestDataset:
   !ImageFolder
...
# S2ANet Model

## Contents
- [Introduction](#introduction)
- [Prepare Data](#prepare-data)
- [Start Training](#start-training)
- [Model Zoo](#model-zoo)
- [Deployment](#deployment)

## Introduction
[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting rotated (oriented) bounding boxes. It requires PaddlePaddle 2.1.1 (installable via pip) or a suitable [develop version](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release).

## Prepare Data

### DOTA Data
The [DOTA Dataset](https://captain-whu.github.io/DOTA/) is a dataset for object detection in aerial images. It contains 2,806 images whose sizes range from about 800×800 to 4,000×4,000 pixels.

| Data version | Categories | Images | Image size | Instances | Annotation |
|:--------:|:-------:|:---------:|:---------:| :---------:| :------------: |
|   v1.0   |   15    |   2806    |  800~4000 |   118282   |   OBB + HBB    |
|   v1.5   |   16    |   2806    |  800~4000 |   400000   |   OBB + HBB    |

Note: An OBB (oriented bounding box) annotation is an arbitrary quadrilateral with vertices arranged in clockwise order; an HBB (horizontal bounding box) annotation is the axis-aligned bounding rectangle of the instance; a small sketch follows.
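A rough illustration (a sketch only, not code from this PR): the HBB of an OBB quadrilateral is simply the axis-aligned min/max of its vertices, mirroring the min/max logic used by the conversion tools later in this commit.

```python
# Minimal sketch: horizontal bounding box (HBB) of an OBB quadrilateral
# given as [x1, y1, x2, y2, x3, y3, x4, y4].
def obb_to_hbb(poly):
    xs, ys = poly[0::2], poly[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]  # [xmin, ymin, xmax, ymax]

print(obb_to_hbb([10, 0, 20, 10, 10, 20, 0, 10]))  # -> [0, 0, 20, 20]
```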
The DOTA dataset contains 2,806 images in total: 1,411 for training, 458 for evaluation, and the remaining 937 for testing.

If you need to slice the images, please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit). Slicing with `crop_size=1024, stride=824, gap=200` yields 15,749 training images, 5,297 evaluation images, and 10,833 test images.

### Custom Data
There are two ways to annotate data:
- Annotate rotated rectangles directly, e.g. with the rotated-rectangle annotation tool [roLabelImg](https://github.com/cgvict/roLabelImg).
- Annotate quadrilaterals and convert them into enclosing rotated rectangles with a script; the resulting boxes may deviate slightly from the true object boxes.

Then convert the annotations into COCO format, where each `bbox` has the form `[x_center, y_center, width, height, angle]` with the angle expressed in radians; a conversion sketch is shown below.
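A minimal conversion sketch, assuming OpenCV is available (`cv2.minAreaRect` returns the angle in degrees, so it is converted to radians here). This only illustrates the idea; PaddleDetection ships its own converters (e.g. `poly2rbox` in `ppdet.modeling.bbox_utils`).

```python
import math
import numpy as np
import cv2

def quad_to_rbox(poly):
    """Convert a quadrilateral [x1, y1, ..., x4, y4] to
    [x_center, y_center, width, height, angle(rad)] via its
    minimum-area enclosing rotated rectangle."""
    pts = np.array(poly, dtype=np.float32).reshape(4, 2)
    (cx, cy), (w, h), angle_deg = cv2.minAreaRect(pts)
    return [cx, cy, w, h, math.radians(angle_deg)]

print(quad_to_rbox([0, 0, 40, 0, 40, 20, 0, 20]))
```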
Following the [spinal disc dataset](https://aistudio.baidu.com/aistudio/datasetdetail/85885), we split the data into a training set (230 images) and a test set (57 images); the data is available at [spine_coco](https://paddledet.bj.bcebos.com/data/spine_coco.tar). Since this dataset contains few images, it can be used to train the S2ANet model quickly.
## Start Training

### 1. Install the rotated-box IoU op
The rotated-box IoU op [ext_op](../../ppdet/ext_op) is implemented following Paddle's [custom external operator](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html) mechanism.
Using the op requires:
- PaddlePaddle >= 2.1.1
- GCC == 8.2

The docker image paddle:2.1.1-gpu-cuda10.1-cudnn7 is recommended. Run the following command to pull the image and start a container:
```
sudo nvidia-docker run -it --name paddle_s2anet -v $PWD:/paddle --network=host registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7 /bin/bash
```
Paddle is pre-installed in the image. Start python3.7 and run the following code to check that the installation works:
```
import paddle
print(paddle.__version__)
paddle.utils.run_check()
```
Enter the `ppdet/ext_op` directory and install the op:
```
python3.7 setup.py install
```
On Windows, install it as follows:
(1) Prepare Visual Studio (version >= Visual Studio 2015 Update 3); VS2017 is used as the example here;
(2) Open Start --> Visual Studio 2017 --> "x64 Native Tools Command Prompt for VS 2017";
(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;
(4) Enter the `PaddleDetection/ppdet/ext_op` directory and install with `python3.7 setup.py install`.

After installation, check that the custom op compiles and computes correctly:
```
cd PaddleDetection/ppdet/ext_op
python3.7 test.py
```
### 2. Train
**Note:** The learning rate in the config files is set for 8-GPU training. If you train on a single GPU, reduce the learning rate to 1/8 of the original value.

Single-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py -c configs/dota/s2anet_1x_spine.yml
```
Multi-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/dota/s2anet_1x_spine.yml
```
You can pass `--eval` to evaluate during training.

### 3. Evaluate
```bash
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams

# evaluate with the provided trained model
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```
**Note:**
(1) The DOTA models are trained on the train and val splits together, so a custom evaluation dataset configuration is required to evaluate on the DOTA dataset.
(2) The spine dataset was converted from segmentation data. Because the disc categories differ little for the detection task and the scores produced by S2ANet are low, a low mAP at the default evaluation threshold of 0.5 is normal; inspecting the detections visually is recommended.
### 4. Predict
Run the following command to save the image prediction results under the `output` folder.
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Predict with the provided trained model:
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```

### 5. DOTA Evaluation
Run the following command to save each image's predictions as a txt file of the same name under the `output` folder.
```
python3.7 tools/infer.py -c configs/dota/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=dota_test_images --draw_threshold=0.05 --save_txt=True --output_dir=output
```
Please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit) to generate the evaluation files; see [DOTA Test](http://captain.whu.edu.cn/DOTAweb/tasks.html) for the file format. Build a zip file with one txt file per class, where each line has the format `image_id score x1 y1 x2 y2 x3 y3 x4 y4`, and submit it to the server for evaluation. You can also generate the evaluation files with the `dataset/dota_coco/dota_generate_test_result.py` script and submit them; a sketch of the line format follows.
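A hedged sketch of one line in a per-class result file; the image id, score, coordinates, and file name below are all made-up values (the devkit defines the exact per-class file naming).

```python
# One detection per line: `image_id score x1 y1 x2 y2 x3 y3 x4 y4`;
# every value here is hypothetical.
det = {'image_id': 'P0006__1__0___0', 'score': 0.87,
       'poly': [312.0, 208.5, 402.3, 211.0, 400.1, 302.7, 310.2, 300.0]}
line = '{} {:.3f} '.format(det['image_id'], det['score']) + \
       ' '.join('{:.1f}'.format(v) for v in det['poly'])
with open('plane.txt', 'a') as f:  # one txt file per class
    f.write(line + '\n')
```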
## Model Zoo

### S2ANet Models
| Model | Conv type | mAP | Download | Config |
|:-----------:|:----------:|:--------:| :----------:| :---------: |
|   S2ANet    |   Conv     |  71.42   | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_conv_2x_dota.yml) |
|   S2ANet    |  AlignConv |  74.0    | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) |

**Note:** `multiclass_nms` is used here, which differs slightly from the NMS used by the original authors.

## Deployment
The `multiclass_nms` operator in Paddle accepts quadrilateral inputs, so deployment does not depend on the rotated-box IoU operator.
For deployment, please refer to the [deployment tutorial](../../deploy/README.md).
## Citations
```
@article{han2021align,
author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Align Deep Features for Oriented Object Detection},
year={2021},
pages={1-11},
doi={10.1109/TGRS.2021.3062048}}
@inproceedings{xia2018dota,
title={DOTA: A large-scale dataset for object detection in aerial images},
author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={3974--3983},
year={2018}
}
```
# S2ANet Model
## Content
- [S2ANet Model](#s2anet-model)
- [Content](#content)
- [Introduction](#introduction)
- [Prepare Data](#prepare-data)
- [DOTA data](#dota-data)
- [Customize Data](#customize-data)
- [Start Training](#start-training)
- [1. Install the rotated-box IoU op](#1-install-the-rotated-box-iou-op)
- [2. Train](#2-train)
- [3. Evaluation](#3-evaluation)
- [4. Prediction](#4-prediction)
- [5. DOTA Data evaluation](#5-dota-data-evaluation)
- [Model Library](#model-library)
- [S2ANet Model](#s2anet-model-1)
- [Predict Deployment](#predict-deployment)
- [Citations](#citations)
## Introduction
[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting rotated (oriented) bounding boxes. It requires PaddlePaddle 2.1.1 (installable via pip) or a suitable [develop version](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release).
## Prepare Data
### DOTA data
The [DOTA Dataset](https://captain-whu.github.io/DOTA/) is a dataset for object detection in aerial images, containing 2,806 images whose sizes range from about 800×800 to 4,000×4,000 pixels.
| Data version | categories | images | size | instances | annotation method |
|:--------:|:-------:|:---------:|:---------:| :---------:| :------------: |
| v1.0 | 15 | 2806 | 800~4000 | 118282 | OBB + HBB |
| v1.5 | 16 | 2806 | 800~4000 | 400000 | OBB + HBB |
Note: An OBB (oriented bounding box) annotation is an arbitrary quadrilateral with vertices arranged in clockwise order; an HBB (horizontal bounding box) annotation is the axis-aligned bounding rectangle of the instance.
The DOTA dataset contains 2,806 images in total: 1,411 for training, 458 for evaluation, and the remaining 937 for testing.
If you need to slice the image data, please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit).
After slicing with `crop_size=1024, stride=824, gap=200`, there are 15,749 images in the training set, 5,297 in the evaluation set, and 10,833 in the test set.
### Customize Data
There are two ways to annotate data:
- The first is to annotate rotated rectangles directly, e.g. with the rotated-rectangle annotation tool [roLabelImg](https://github.com/cgvict/roLabelImg).
- The second is to annotate quadrilaterals and convert them into enclosing rotated rectangles with a script; the resulting boxes may deviate slightly from the true object boxes.
Then convert the annotations into COCO format, where each `bbox` has the form `[x_center, y_center, width, height, angle]` with the angle expressed in radians.
Following the [spinal disc dataset](https://aistudio.baidu.com/aistudio/datasetdetail/85885), we split the data into a training set (230 images) and a test set (57 images); the data is available at [spine_coco](https://paddledet.bj.bcebos.com/data/spine_coco.tar). Since the dataset contains few images, it can be used to train the S2ANet model quickly.
## Start Training
### 1. Install the rotated-box IoU op
The rotated-box IoU op [ext_op](../../ppdet/ext_op) is implemented following PaddlePaddle's [custom external operator](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html) mechanism.
Using the op requires:
- PaddlePaddle >= 2.1.1
- GCC == 8.2
The docker image paddle:2.1.1-gpu-cuda10.1-cudnn7 is recommended. Run the following command to pull the image and start a container:
```
sudo nvidia-docker run -it --name paddle_s2anet -v $PWD:/paddle --network=host registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7 /bin/bash
```
Paddle is pre-installed in the image. Start python3.7 and run the following code to check that the installation works:
```
import paddle
print(paddle.__version__)
paddle.utils.run_check()
```
Enter the `ppdet/ext_op` directory and install the op:
```
python3.7 setup.py install
```
In Windows, perform the following steps to install it:
(1) Prepare Visual Studio (version >= Visual Studio 2015 Update 3);
(2) Go to Start --> Visual Studio 2017 --> "x64 Native Tools Command Prompt for VS 2017";
(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;
(4) Enter the `PaddleDetection/ppdet/ext_op` directory and install with `python3.7 setup.py install`.
After the installation, test whether the custom op compiles and computes correctly:
```
cd PaddleDetection/ppdet/ext_op
python3.7 test.py
```
### 2. Train
**Note:**
The learning rate in the configuration files is set for 8-GPU training. If you train on a single GPU, set the learning rate to 1/8 of the original value.
Single GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py -c configs/dota/s2anet_1x_spine.yml
```
Multiple GPUs Training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/dota/s2anet_1x_spine.yml
```
You can pass `--eval` to evaluate during training.
### 3. Evaluation
```bash
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams
# Use a trained model to evaluate
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```
**Note:**
(1) The DOTA models are trained on the train and val splits together, so a custom evaluation dataset configuration is required when evaluating on the DOTA dataset.
(2) The spine dataset was converted from segmentation data. Because the disc categories differ little for the detection task and the scores produced by S2ANet are low, a low mAP at the default evaluation threshold of 0.5 is normal; inspecting the detection results visually is recommended.
### 4. Prediction
Executing the following command will save the image prediction results to the `output` folder.
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Predict with the provided trained model:
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
### 5. DOTA Data Evaluation
Run the following command to save each image's predictions as a txt file of the same name under the `output` folder.
```
python3.7 tools/infer.py -c configs/dota/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=dota_test_images --draw_threshold=0.05 --save_txt=True --output_dir=output
```
Please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit) to generate the evaluation files; see [DOTA Test](http://captain.whu.edu.cn/DOTAweb/tasks.html) for the file format. Build a zip file with one txt file per class, where each line has the format `image_id score x1 y1 x2 y2 x3 y3 x4 y4`, and submit it to the server for evaluation. You can also generate the evaluation files with the `dataset/dota_coco/dota_generate_test_result.py` script and submit them to the server.
## Model Library
### S2ANet Model
| Model | Conv Type | mAP | Model Download | Configuration File |
|:-----------:|:----------:|:--------:| :----------:| :---------: |
| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_conv_2x_dota.yml) |
| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) |
**Note:** `multiclass_nms` is used here, which differs slightly from the NMS used by the original authors.
## Predict Deployment
The `multiclass_nms` operator in Paddle accepts quadrilateral inputs, so deployment does not depend on the rotated-box IoU operator.
For deployment, please refer to the [deployment tutorial](../../deploy/README_en.md).
## Citations
```
@article{han2021align,
author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Align Deep Features for Oriented Object Detection},
year={2021},
pages={1-11},
doi={10.1109/TGRS.2021.3062048}}
@inproceedings{xia2018dota,
title={DOTA: A large-scale dataset for object detection in aerial images},
author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={3974--3983},
year={2018}
}
```
Simplified Chinese | [English](README_en.md)

# Rotated Object Detection

## Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
- [Data Preparation](#data-preparation)
- [Installation](#installation)

## Introduction
Rotated bounding boxes are used to detect objects with angle information, i.e. the width and height of the box are no longer parallel to the image coordinate axes. Compared with horizontal boxes, rotated boxes generally include less background. Rotated box detection is commonly used in remote sensing and similar scenarios.
## Model Zoo
| Model | mAP | Lr Scheduler | Angle | Aug | GPUs | Images/GPU | Download | Config |
|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| [S2ANet](./s2anet/README.md) | 74.0 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) |
**Notes:**
- If the **number of GPUs** or the **batch size** changes, adjust the learning rate according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**; see the sketch below.
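A worked example of the rule above (a sketch only; the 4 GPUs × 2 images/GPU defaults come from the table above, and `base_lr: 0.01` from the optimizer configs in this commit):

```python
# Linear scaling rule from the note above.
def scale_lr(lr_default, bs_default, gpus_default, bs_new, gpus_new):
    return lr_default * (bs_new * gpus_new) / (bs_default * gpus_default)

# Moving from 4 GPUs x 2 images/GPU at lr 0.01 to a single GPU:
print(scale_lr(0.01, bs_default=2, gpus_default=4, bs_new=2, gpus_new=1))  # 0.0025
```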
## Data Preparation

### DOTA Data Preparation
The DOTA dataset is a large-scale remote sensing image dataset with annotations for both rotated and horizontal bounding boxes. Download it from the [official DOTA website](https://captain-whu.github.io/DOTA/) and extract it; the extracted directory structure is as follows:
```
${DOTA_ROOT}
├── test
│ └── images
├── train
│ ├── images
│ └── labelTxt
└── val
├── images
└── labelTxt
```
DOTA images have relatively high resolutions, so they are usually sliced before training and testing. To slice the images with a single scale, use the following command:
```
python configs/rotate/tools/prepare_data.py \
--input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
--output_dir ${OUTPUT_DIR}/trainval1024/ \
--coco_json_file DOTA_trainval1024.json \
--subsize 1024 \
--gap 200 \
--rates 1.0
```
To slice the images with multiple scales, use the following command:
```
python configs/rotate/tools/prepare_data.py \
--input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
--output_dir ${OUTPUT_DIR}/trainval/ \
--coco_json_file DOTA_trainval1024.json \
--subsize 1024 \
--gap 500 \
--rates 0.5 1.0 1.5
```
For data without annotations, set `--image_only` when slicing, as shown below:
```
python configs/rotate/tools/prepare_data.py \
--input_dirs ${DOTA_ROOT}/test/ \
--output_dir ${OUTPUT_DIR}/test1024/ \
--coco_json_file DOTA_test1024.json \
--subsize 1024 \
--gap 200 \
--rates 1.0 \
--image_only
```
## Installation
Rotated object detection models depend on external operators for training and evaluation. In a Linux environment, you can compile and install them with the following commands:
```
cd ppdet/ext_op
python setup.py install
```
In a Windows environment, perform the following steps to install them:
(1) Prepare Visual Studio (version >= Visual Studio 2015 Update 3); VS2017 is used as the example here;
(2) Open Start --> Visual Studio 2017 --> "x64 Native Tools Command Prompt for VS 2017";
(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;
(4) Enter the `PaddleDetection/ppdet/ext_op` directory and install with `python setup.py install`.
After installation, you can run the unit tests under `ppdet/ext_op/unittest` to verify that the external operators are installed correctly.
English | [简体中文](README.md)
# Rotated Object Detection
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Data Preparation](#Data-Preparation)
- [Installation](#Installation)
## Introduction
Rotated object detection is used to detect rectangular bounding boxes with angle information, that is, the long and short sides of the rectangular bounding box are no longer parallel to the image coordinate axes. Oriented bounding boxes generally contain less background information than horizontal bounding boxes. Rotated object detection is often used in remote sensing scenarios.
## Model Zoo
| Model | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| [S2ANet](./s2anet/README.md) | 74.0 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) |
**Notes:**
- if **GPU number** or **mini-batch size** is changed, **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
## Data Preparation
### DOTA Dataset preparation
The DOTA dataset is a large-scale remote sensing image dataset containing annotations of oriented and horizontal bounding boxes. The dataset can be downloaded from the [official website of the DOTA dataset](https://captain-whu.github.io/DOTA/). After the dataset is decompressed, its directory structure is as follows.
```
${DOTA_ROOT}
├── test
│ └── images
├── train
│ ├── images
│ └── labelTxt
└── val
├── images
└── labelTxt
```
The image resolution of DOTA dataset is relatively high, so we usually slice the images before training and testing. To slice the images with a single scale, you can use the command below
```
python configs/rotate/tools/prepare_data.py \
--input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
--output_dir ${OUTPUT_DIR}/trainval1024/ \
--coco_json_file DOTA_trainval1024.json \
--subsize 1024 \
--gap 200 \
--rates 1.0
```
To slice the images with multiple scales, you can use the command below
```
python configs/rotate/tools/prepare_data.py \
--input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
--output_dir ${OUTPUT_DIR}/trainval/ \
--coco_json_file DOTA_trainval1024.json \
--subsize 1024 \
--gap 500 \
--rates 0.5 1.0 1.5
```
For data without annotations, you should set `--image_only` as follows
```
python configs/rotate/tools/prepare_data.py \
--input_dirs ${DOTA_ROOT}/test/ \
--output_dir ${OUTPUT_DIR}/test1024/ \
--coco_json_file DOTA_test1024.json \
--subsize 1024 \
--gap 200 \
--rates 1.0 \
--image_only
```
## Installation
Rotated object detection models depend on external operators for training and evaluation. In a Linux environment, you can compile and install them with the following commands.
```
cd ppdet/ext_op
python setup.py install
```
In a Windows environment, perform the following steps to install them:
(1) Prepare Visual Studio (version >= Visual Studio 2015 Update 3);
(2) Go to Start --> Visual Studio 2017 --> "x64 Native Tools Command Prompt for VS 2017";
(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;
(4) Enter the `ppdet/ext_op` directory and install with `python setup.py install`.
After the installation, you can run the unit tests under `ppdet/ext_op/unittest` to verify that the external operators are installed correctly.
# S2ANet Model

## Contents
- [Introduction](#introduction)
- [Start Training](#start-training)
- [Model Zoo](#model-zoo)
- [Deployment](#deployment)

## Introduction
[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting rotated bounding boxes; with single-scale training it reaches 74.0 mAP on the DOTA 1.0 dataset.
## Start Training

### 1. Train
Single-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```
Multi-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```
You can pass `--eval` to evaluate during training.

### 2. Evaluate
```bash
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams
# evaluate with the provided trained model
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```
### 3. Predict
Run the following command to save the image prediction results under the `output` folder.
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Predict with the provided trained model:
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
### 4. DOTA Evaluation
Run the following command to save each image's predictions as a txt file of the same name under the `output` folder.
```
python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
```
Referring to [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), evaluating on the DOTA dataset requires a zip file containing all detection results, with the detections of each class stored in a single txt file whose lines have the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. Submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can generate the evaluation files with the following commands:
```
python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10
zip -r submit.zip submit
```
## Model Zoo

### S2ANet Models
| Model | Conv type | mAP | Download | Config |
|:-----------:|:----------:|:--------:| :----------:| :---------: |
|   S2ANet    |   Conv     |  71.42   | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) |
|   S2ANet    |  AlignConv |  74.0    | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |

**Note:** `multiclass_nms` is used here, which differs slightly from the NMS used by the original authors.

## Deployment
The `multiclass_nms` operator in Paddle accepts quadrilateral inputs, so deployment does not depend on the rotated-box IoU operator.
For deployment, please refer to the [deployment tutorial](../../deploy/README.md).
## Citations
```
@article{han2021align,
author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Align Deep Features for Oriented Object Detection},
year={2021},
pages={1-11},
doi={10.1109/TGRS.2021.3062048}}
@inproceedings{xia2018dota,
title={DOTA: A large-scale dataset for object detection in aerial images},
author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={3974--3983},
year={2018}
}
```
# S2ANet Model
## Content
- [S2ANet Model](#s2anet-model)
- [Content](#content)
- [Introduction](#introduction)
- [Start Training](#start-training)
- [1. Train](#1-train)
- [2. Evaluation](#2-evaluation)
- [3. Prediction](#3-prediction)
- [4. DOTA Data evaluation](#4-dota-data-evaluation)
- [Model Library](#model-library)
- [S2ANet Model](#s2anet-model-1)
- [Predict Deployment](#predict-deployment)
- [Citations](#citations)
## Introduction
[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is used to detect rotated objects and achieves 74.0 mAP with single-scale training on the DOTA 1.0 dataset.
## Start Training
### 1. Train
Single GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```
Multiple GPUs Training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```
You can pass `--eval` to evaluate during training.

### 2. Evaluation
```bash
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams
# Use a trained model to evaluate
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```
### 3. Prediction
Executing the following command will save the image prediction results to the `output` folder.
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Predict with the provided trained model:
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
### 4. DOTA Data Evaluation
Run the following command to save each image's predictions as a txt file of the same name under the `output` folder.
```
python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
```
Referring to [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a single txt file, each line of which has the format
`image_name score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html). You can execute the following commands to generate the file:
```
python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10
zip -r submit.zip submit
```
## Model Library
### S2ANet Model
| Model | Conv Type | mAP | Model Download | Configuration File |
|:-----------:|:----------:|:--------:| :----------:| :---------: |
| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) |
| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
**Note:** `multiclass_nms` is used here, which differs slightly from the NMS used by the original authors.
## Predict Deployment
The `multiclass_nms` operator in Paddle accepts quadrilateral inputs, so deployment does not depend on the rotated-box IoU operator.
For deployment, please refer to the [deployment tutorial](../../deploy/README_en.md).
## Citations
```
@article{han2021align,
author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Align Deep Features for Oriented Object Detection},
year={2021},
pages={1-11},
doi={10.1109/TGRS.2021.3062048}}
@inproceedings{xia2018dota,
title={DOTA: A large-scale dataset for object detection in aerial images},
author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={3974--3983},
year={2018}
}
```
@@ -2,7 +2,7 @@ worker_num: 4
 TrainReader:
   sample_transforms:
     - Decode: {}
-    - Rbox2Poly: {}
+    - Poly2Array: {}
     - RandomRFlip: {}
     - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2}
     - Poly2RBox: {rbox_type: 'le135'}

@@ -19,6 +19,7 @@ TrainReader:
 EvalReader:
   sample_transforms:
     - Decode: {}
+    - Poly2Array: {}
     - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2}
     - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
     - Permute: {}
...
 _BASE_: [
-  '../datasets/spine_coco.yml',
-  '../runtime.yml',
+  '../../datasets/spine_coco.yml',
+  '../../runtime.yml',
   '_base_/s2anet_optimizer_1x.yml',
   '_base_/s2anet.yml',
   '_base_/s2anet_reader.yml',

@@ -9,7 +9,7 @@ _BASE_: [
 weights: output/s2anet_1x_spine/model_final
 pretrain_weights: https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams

-# for 8 card
+# for 4 card
 LearningRate:
   base_lr: 0.01
   schedulers:
...
 _BASE_: [
-  '../datasets/dota.yml',
-  '../runtime.yml',
+  '../../datasets/dota.yml',
+  '../../runtime.yml',
   '_base_/s2anet_optimizer_2x.yml',
   '_base_/s2anet.yml',
   '_base_/s2anet_reader.yml',
...
 _BASE_: [
-  '../datasets/dota.yml',
-  '../runtime.yml',
+  '../../datasets/dota.yml',
+  '../../runtime.yml',
   '_base_/s2anet_optimizer_2x.yml',
   '_base_/s2anet.yml',
   '_base_/s2anet_reader.yml',
...
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import json
import cv2
from tqdm import tqdm
from multiprocessing import Pool
def load_dota_info(image_dir, anno_dir, file_name, ext=None):
base_name, extension = os.path.splitext(file_name)
if ext and (extension != ext and extension not in ext):
return None
info = {'image_file': os.path.join(image_dir, file_name), 'annotation': []}
anno_file = os.path.join(anno_dir, base_name + '.txt')
if not os.path.exists(anno_file):
return info
with open(anno_file, 'r') as f:
for line in f:
items = line.strip().split()
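            # DOTA labelTxt line: x1 y1 x2 y2 x3 y3 x4 y4 class_name [difficult]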
if (len(items) < 9):
continue
anno = {
'poly': list(map(float, items[:8])),
'name': items[8],
'difficult': '0' if len(items) == 9 else items[9],
}
info['annotation'].append(anno)
return info
def load_dota_infos(root_dir, num_process=8, ext=None):
image_dir = os.path.join(root_dir, 'images')
anno_dir = os.path.join(root_dir, 'labelTxt')
data_infos = []
if num_process > 1:
pool = Pool(num_process)
results = []
for file_name in os.listdir(image_dir):
results.append(
pool.apply_async(load_dota_info, (image_dir, anno_dir,
file_name, ext)))
pool.close()
pool.join()
for result in results:
info = result.get()
if info:
data_infos.append(info)
else:
for file_name in os.listdir(image_dir):
info = load_dota_info(image_dir, anno_dir, file_name, ext)
if info:
data_infos.append(info)
return data_infos
def process_single_sample(info, image_id, class_names):
image_file = info['image_file']
single_image = dict()
single_image['file_name'] = os.path.split(image_file)[-1]
single_image['id'] = image_id
image = cv2.imread(image_file)
height, width, _ = image.shape
single_image['width'] = width
single_image['height'] = height
# process annotation field
single_objs = []
objects = info['annotation']
for obj in objects:
poly, name, difficult = obj['poly'], obj['name'], obj['difficult']
if difficult == '2':
continue
single_obj = dict()
single_obj['category_id'] = class_names.index(name) + 1
single_obj['segmentation'] = [poly]
single_obj['iscrowd'] = 0
xmin, ymin, xmax, ymax = min(poly[0::2]), min(poly[1::2]), max(poly[
0::2]), max(poly[1::2])
width, height = xmax - xmin, ymax - ymin
single_obj['bbox'] = [xmin, ymin, width, height]
single_obj['area'] = height * width
single_obj['image_id'] = image_id
single_objs.append(single_obj)
return (single_image, single_objs)
def data_to_coco(infos, output_path, class_names, num_process):
data_dict = dict()
data_dict['categories'] = []
for i, name in enumerate(class_names):
data_dict['categories'].append({
'id': i + 1,
'name': name,
'supercategory': name
})
pbar = tqdm(total=len(infos), desc='data to coco')
images, annotations = [], []
if num_process > 1:
pool = Pool(num_process)
results = []
for i, info in enumerate(infos):
image_id = i + 1
results.append(
pool.apply_async(
process_single_sample, (info, image_id, class_names),
callback=lambda x: pbar.update()))
pool.close()
pool.join()
for result in results:
single_image, single_anno = result.get()
images.append(single_image)
annotations += single_anno
else:
for i, info in enumerate(infos):
image_id = i + 1
single_image, single_anno = process_single_sample(info, image_id,
class_names)
images.append(single_image)
annotations += single_anno
pbar.update()
pbar.close()
for i, anno in enumerate(annotations):
anno['id'] = i + 1
data_dict['images'] = images
data_dict['annotations'] = annotations
with open(output_path, 'w') as f:
json.dump(data_dict, f)
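A hedged usage sketch of the two helpers above (the paths are placeholders, and the class list is truncated for illustration):

```python
# Hypothetical usage: build a COCO-style json for a sliced DOTA training set.
if __name__ == '__main__':
    class_names = ['plane', 'ship']  # illustration only; use the full DOTA list
    infos = load_dota_infos('dataset/dota/trainval1024', num_process=8)
    data_to_coco(infos, 'dataset/dota/trainval1024/DOTA_trainval1024.json',
                 class_names, num_process=8)
```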
@@ -22,21 +22,22 @@ from functools import partial
 from shapely.geometry import Polygon
 import argparse

-nms_thresh = 0.1
-
-class_name_15 = [
+wordname_15 = [
     'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
     'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
     'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
     'harbor', 'swimming-pool', 'helicopter'
 ]

-class_name_16 = [
-    'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
-    'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
-    'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
-    'harbor', 'swimming-pool', 'helicopter', 'container-crane'
-]
+wordname_16 = wordname_15 + ['container-crane']
+
+wordname_18 = wordname_16 + ['airport', 'helipad']
+
+DATA_CLASSES = {
+    'dota10': wordname_15,
+    'dota15': wordname_16,
+    'dota20': wordname_18
+}

 def rbox_iou(g, p):

@@ -99,14 +100,11 @@ def py_cpu_nms_poly_fast(dets, thresh):
         h = np.maximum(0.0, yy2 - yy1)
         hbb_inter = w * h
         hbb_ovr = hbb_inter / (areas[i] + areas[order[1:]] - hbb_inter)
-        # h_keep_inds = np.where(hbb_ovr == 0)[0]
         h_inds = np.where(hbb_ovr > 0)[0]
         tmp_order = order[h_inds + 1]
         for j in range(tmp_order.size):
             iou = rbox_iou(polys[i], polys[tmp_order[j]])
             hbb_ovr[h_inds[j]] = iou
-            # ovr.append(iou)
-            # ovr_index.append(tmp_order[j])

         try:
             if math.isnan(ovr[0]):

@@ -148,7 +146,7 @@ def nmsbynamedict(nameboxdict, nms, thresh):
     return nameboxnmsdict

-def merge_single(output_dir, nms, pred_class_lst):
+def merge_single(output_dir, nms, nms_thresh, pred_class_lst):
     """
     Args:
         output_dir: output_dir

@@ -198,20 +196,20 @@ def merge_single(output_dir, nms, pred_class_lst):
             f_out.write(outline + '\n')

-def dota_generate_test_result(pred_txt_dir,
-                              output_dir='output',
-                              dota_version='v1.0'):
+def generate_result(pred_txt_dir,
+                    output_dir='output',
+                    class_names=wordname_15,
+                    nms_thresh=0.1):
     """
     pred_txt_dir: dir of pred txt
     output_dir: dir of output
-    dota_version: dota_version v1.0 or v1.5 or v2.0
+    class_names: class names of data
     """
     pred_txt_list = glob.glob("{}/*.txt".format(pred_txt_dir))

     # step1: summary pred bbox
     pred_classes = {}
-    class_lst = class_name_15 if dota_version == 'v1.0' else class_name_16
-    for class_name in class_lst:
+    for class_name in class_names:
         pred_classes[class_name] = []

     for current_txt in pred_txt_list:

@@ -233,26 +231,36 @@ def dota_generate_test_result(pred_txt_dir,
         pred_classes_lst.append((class_name, pred_classes[class_name]))

     # step2: merge
-    pool = Pool(len(class_lst))
+    pool = Pool(len(class_names))
     nms = py_cpu_nms_poly_fast
-    mergesingle_fn = partial(merge_single, output_dir, nms)
+    mergesingle_fn = partial(merge_single, output_dir, nms, nms_thresh)
     pool.map(mergesingle_fn, pred_classes_lst)

-if __name__ == '__main__':
-    parser = argparse.ArgumentParser(description='dota anno to coco')
-    parser.add_argument('--pred_txt_dir', help='path of pred txt dir')
-    parser.add_argument(
-        '--output_dir', help='path of output dir', default='output')
-    parser.add_argument(
-        '--dota_version',
-        help='dota_version, v1.0 or v1.5 or v2.0',
-        type=str,
-        default='v1.0')
+def parse_args():
+    parser = argparse.ArgumentParser(description='generate test results')
+    parser.add_argument('--pred_txt_dir', type=str, help='path of pred txt dir')
+    parser.add_argument(
+        '--output_dir', type=str, default='output', help='path of output dir')
+    parser.add_argument(
+        '--data_type', type=str, default='dota10', help='data type')
+    parser.add_argument(
+        '--nms_thresh',
+        type=float,
+        default=0.1,
+        help='nms threshold whild merging results')
+    return parser.parse_args()

-    args = parser.parse_args()
-    # process
-    dota_generate_test_result(args.pred_txt_dir, args.output_dir,
-                              args.dota_version)
+if __name__ == '__main__':
+    args = parse_args()
+    output_dir = args.output_dir
+    if not os.path.exists(output_dir):
+        os.makedirs(output_dir)
+
+    class_names = DATA_CLASSES[args.data_type]
+    generate_result(args.pred_txt_dir, output_dir, class_names)
     print('done!')
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import argparse
from convert import load_dota_infos, data_to_coco
from slicebase import SliceBase
wordname_15 = [
'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
'harbor', 'swimming-pool', 'helicopter'
]
wordname_16 = wordname_15 + ['container-crane']
wordname_18 = wordname_16 + ['airport', 'helipad']
DATA_CLASSES = {
'dota10': wordname_15,
'dota15': wordname_16,
'dota20': wordname_18
}
def parse_args():
parser = argparse.ArgumentParser('prepare data for training')
parser.add_argument(
'--input_dirs',
nargs='+',
type=str,
default=None,
help='input dirs which contain image and labelTxt dir')
parser.add_argument(
'--output_dir',
type=str,
default=None,
help='output dirs which contain image and labelTxt dir and coco style json file'
)
parser.add_argument(
'--coco_json_file',
type=str,
default='',
help='coco json annotation files')
parser.add_argument('--subsize', type=int, default=1024, help='patch size')
parser.add_argument('--gap', type=int, default=200, help='step size')
parser.add_argument(
'--data_type', type=str, default='dota10', help='data type')
parser.add_argument(
'--rates',
nargs='+',
type=float,
default=[1.],
help='scales for multi-scale training')
parser.add_argument(
'--nproc', type=int, default=8, help='the processor number')
parser.add_argument(
'--iof_thr',
type=float,
default=0.5,
help='the minimal iof between an object and a window')
parser.add_argument(
'--image_only',
action='store_true',
default=False,
help='only processing image')
args = parser.parse_args()
return args
def load_dataset(input_dir, nproc, data_type):
if 'dota' in data_type.lower():
infos = load_dota_infos(input_dir, nproc)
else:
raise ValueError('only dota dataset is supported now')
return infos
def main():
args = parse_args()
infos = []
for input_dir in args.input_dirs:
infos += load_dataset(input_dir, args.nproc, args.data_type)
slicer = SliceBase(
args.gap,
args.subsize,
args.iof_thr,
num_process=args.nproc,
image_only=args.image_only)
slicer.slice_data(infos, args.rates, args.output_dir)
if args.coco_json_file:
infos = load_dota_infos(args.output_dir, args.nproc)
coco_json_file = os.path.join(args.output_dir, args.coco_json_file)
class_names = DATA_CLASSES[args.data_type]
data_to_coco(infos, coco_json_file, class_names, args.nproc)
if __name__ == '__main__':
main()
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import math
import copy
from numbers import Number
from multiprocessing import Pool
import cv2
import numpy as np
from tqdm import tqdm
import shapely.geometry as shgeo
def choose_best_pointorder_fit_another(poly1, poly2):
"""
To make the two polygons best fit with each point
"""
x1, y1, x2, y2, x3, y3, x4, y4 = poly1
combinate = [
np.array([x1, y1, x2, y2, x3, y3, x4, y4]),
np.array([x2, y2, x3, y3, x4, y4, x1, y1]),
np.array([x3, y3, x4, y4, x1, y1, x2, y2]),
np.array([x4, y4, x1, y1, x2, y2, x3, y3])
]
dst_coordinate = np.array(poly2)
distances = np.array(
[np.sum((coord - dst_coordinate)**2) for coord in combinate])
sorted = distances.argsort()
return combinate[sorted[0]]
def cal_line_length(point1, point2):
return math.sqrt(
math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2))
class SliceBase(object):
def __init__(self,
gap=512,
subsize=1024,
thresh=0.7,
choosebestpoint=True,
ext='.png',
padding=True,
num_process=8,
image_only=False):
self.gap = gap
self.subsize = subsize
self.slide = subsize - gap
self.thresh = thresh
self.choosebestpoint = choosebestpoint
self.ext = ext
self.padding = padding
self.num_process = num_process
self.image_only = image_only
def get_windows(self, height, width):
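        # Enumerate sliding windows (left, up, right, down) of size `subsize`,
        # stepping by `slide = subsize - gap` so neighbouring windows overlap
        # by `gap` pixels; border windows are shifted inward so a full-size
        # window always fits. Example: height = width = 1393 with
        # subsize = 1024 and gap = 200 gives offsets 0 and 369 on each axis,
        # i.e. 4 overlapping windows covering the whole image.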
windows = []
left, up = 0, 0
while (left < width):
if (left + self.subsize >= width):
left = max(width - self.subsize, 0)
up = 0
while (up < height):
if (up + self.subsize >= height):
up = max(height - self.subsize, 0)
right = min(left + self.subsize, width - 1)
down = min(up + self.subsize, height - 1)
windows.append((left, up, right, down))
if (up + self.subsize >= height):
break
else:
up = up + self.slide
if (left + self.subsize >= width):
break
else:
left = left + self.slide
return windows
def slice_image_single(self, image, windows, output_dir, output_name):
image_dir = os.path.join(output_dir, 'images')
for (left, up, right, down) in windows:
image_name = output_name + str(left) + '___' + str(up) + self.ext
subimg = copy.deepcopy(image[up:up + self.subsize, left:left +
self.subsize])
h, w, c = subimg.shape
if (self.padding):
outimg = np.zeros((self.subsize, self.subsize, 3))
outimg[0:h, 0:w, :] = subimg
cv2.imwrite(os.path.join(image_dir, image_name), outimg)
else:
cv2.imwrite(os.path.join(image_dir, image_name), subimg)
def iof(self, poly1, poly2):
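        # Intersection over foreground: intersection area divided by the area
        # of poly1 (the ground-truth polygon); used to decide whether and how
        # an annotation is kept inside a cropped window.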
inter_poly = poly1.intersection(poly2)
inter_area = inter_poly.area
poly1_area = poly1.area
half_iou = inter_area / poly1_area
return inter_poly, half_iou
def translate(self, poly, left, up):
n = len(poly)
out_poly = np.zeros(n)
for i in range(n // 2):
out_poly[i * 2] = int(poly[i * 2] - left)
out_poly[i * 2 + 1] = int(poly[i * 2 + 1] - up)
return out_poly
def get_poly4_from_poly5(self, poly):
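        # Clipping a quadrilateral against a window can yield a pentagon;
        # merge the two endpoints of its shortest edge into their midpoint to
        # recover a 4-point polygon.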
distances = [
cal_line_length((poly[i * 2], poly[i * 2 + 1]),
(poly[(i + 1) * 2], poly[(i + 1) * 2 + 1]))
for i in range(int(len(poly) / 2 - 1))
]
distances.append(
cal_line_length((poly[0], poly[1]), (poly[8], poly[9])))
pos = np.array(distances).argsort()[0]
count = 0
out_poly = []
while count < 5:
if (count == pos):
out_poly.append(
(poly[count * 2] + poly[(count * 2 + 2) % 10]) / 2)
out_poly.append(
(poly[(count * 2 + 1) % 10] + poly[(count * 2 + 3) % 10]) /
2)
count = count + 1
elif (count == (pos + 1) % 5):
count = count + 1
continue
else:
out_poly.append(poly[count * 2])
out_poly.append(poly[count * 2 + 1])
count = count + 1
return out_poly
def slice_anno_single(self, annos, windows, output_dir, output_name):
anno_dir = os.path.join(output_dir, 'labelTxt')
for (left, up, right, down) in windows:
image_poly = shgeo.Polygon(
[(left, up), (right, up), (right, down), (left, down)])
anno_file = output_name + str(left) + '___' + str(up) + '.txt'
with open(os.path.join(anno_dir, anno_file), 'w') as f:
for anno in annos:
gt_poly = shgeo.Polygon(
[(anno['poly'][0], anno['poly'][1]),
(anno['poly'][2], anno['poly'][3]),
(anno['poly'][4], anno['poly'][5]),
(anno['poly'][6], anno['poly'][7])])
if gt_poly.area <= 0:
continue
inter_poly, iof = self.iof(gt_poly, image_poly)
if iof == 1:
final_poly = self.translate(anno['poly'], left, up)
elif iof > 0:
inter_poly = shgeo.polygon.orient(inter_poly, sign=1)
out_poly = list(inter_poly.exterior.coords)[0:-1]
if len(out_poly) < 4 or len(out_poly) > 5:
continue
final_poly = []
for p in out_poly:
final_poly.append(p[0])
final_poly.append(p[1])
if len(out_poly) == 5:
final_poly = self.get_poly4_from_poly5(final_poly)
if self.choosebestpoint:
final_poly = choose_best_pointorder_fit_another(
final_poly, anno['poly'])
final_poly = self.translate(final_poly, left, up)
final_poly = np.clip(final_poly, 1, self.subsize)
else:
continue
outline = ' '.join(list(map(str, final_poly)))
if iof >= self.thresh:
outline = outline + ' ' + anno['name'] + ' ' + str(anno[
'difficult'])
else:
outline = outline + ' ' + anno['name'] + ' ' + '2'
f.write(outline + '\n')
def slice_data_single(self, info, rate, output_dir):
file_name = info['image_file']
base_name = os.path.splitext(os.path.split(file_name)[-1])[0]
base_name = base_name + '__' + str(rate) + '__'
img = cv2.imread(file_name)
        if img is None:  # skip unreadable or missing images
            return
if (rate != 1):
resize_img = cv2.resize(
img, None, fx=rate, fy=rate, interpolation=cv2.INTER_CUBIC)
else:
resize_img = img
height, width, _ = resize_img.shape
windows = self.get_windows(height, width)
self.slice_image_single(resize_img, windows, output_dir, base_name)
if not self.image_only:
self.slice_anno_single(info['annotation'], windows, output_dir,
base_name)
def check_or_mkdirs(self, path):
if not os.path.exists(path):
os.makedirs(path, exist_ok=True)
def slice_data(self, infos, rates, output_dir):
"""
Args:
infos (list[dict]): data_infos
rates (float, list): scale rates
output_dir (str): output directory
"""
if isinstance(rates, Number):
rates = [rates, ]
self.check_or_mkdirs(output_dir)
self.check_or_mkdirs(os.path.join(output_dir, 'images'))
if not self.image_only:
self.check_or_mkdirs(os.path.join(output_dir, 'labelTxt'))
pbar = tqdm(total=len(rates) * len(infos), desc='slicing data')
if self.num_process <= 1:
for rate in rates:
for info in infos:
self.slice_data_single(info, rate, output_dir)
pbar.update()
else:
pool = Pool(self.num_process)
for rate in rates:
for info in infos:
pool.apply_async(
self.slice_data_single, (info, rate, output_dir),
callback=lambda x: pbar.update())
pool.close()
pool.join()
pbar.close()
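A hedged usage sketch of `SliceBase` (paths are placeholders; `load_dota_infos` comes from `convert.py` in this commit, and the subsize/gap values mirror the README commands):

```python
# Hypothetical usage: single-scale slicing of the DOTA train split.
if __name__ == '__main__':
    from convert import load_dota_infos
    infos = load_dota_infos('dataset/dota/train', num_process=8)
    slicer = SliceBase(gap=200, subsize=1024, thresh=0.5, num_process=8)
    slicer.slice_data(infos, rates=[1.0], output_dir='dataset/dota/trainval1024')
```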
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import json
import glob
import cv2
import argparse
# add python path of PaddleDetection to sys.path
parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3)))
if parent_path not in sys.path:
sys.path.append(parent_path)
from ppdet.modeling.bbox_utils import poly2rbox
from ppdet.utils.logger import setup_logger
logger = setup_logger(__name__)
class_name_15 = [
'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
'harbor', 'swimming-pool', 'helicopter'
]
class_name_16 = [
'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
'harbor', 'swimming-pool', 'helicopter', 'container-crane'
]
def dota_2_coco(image_dir,
txt_dir,
json_path='dota_coco.json',
is_obb=True,
dota_version='v1.0'):
"""
image_dir: image dir
txt_dir: txt label dir
json_path: json save path
is_obb: is obb or not
dota_version: dota_version v1.0 or v1.5 or v2.0
"""
img_lists = glob.glob("{}/*.png".format(image_dir))
data_dict = {}
data_dict['images'] = []
data_dict['categories'] = []
data_dict['annotations'] = []
inst_count = 0
# categories
class_name2id = {}
if dota_version == 'v1.0':
for class_id, class_name in enumerate(class_name_15):
class_name2id[class_name] = class_id + 1
single_cat = {
'id': class_id + 1,
'name': class_name,
'supercategory': class_name
}
data_dict['categories'].append(single_cat)
for image_id, img_path in enumerate(img_lists):
single_image = {}
basename = osp.basename(img_path)
single_image['file_name'] = basename
single_image['id'] = image_id
img = cv2.imread(img_path)
height, width, _ = img.shape
single_image['width'] = width
single_image['height'] = height
# add image
data_dict['images'].append(single_image)
# annotations
anno_txt_path = osp.join(txt_dir, osp.splitext(basename)[0] + '.txt')
        if not osp.exists(anno_txt_path):
            logger.warning('path of {} not exists'.format(anno_txt_path))
            continue  # skip images without annotation files
for line in open(anno_txt_path):
line = line.strip()
# skip
if line.find('imagesource') >= 0 or line.find('gsd') >= 0:
continue
# x1,y1,x2,y2,x3,y3,x4,y4 class_name, is_different
single_obj_anno = line.split(' ')
assert len(single_obj_anno) == 10
single_obj_poly = [float(e) for e in single_obj_anno[0:8]]
single_obj_classname = single_obj_anno[8]
single_obj_different = int(single_obj_anno[9])
single_obj = {}
single_obj['category_id'] = class_name2id[single_obj_classname]
single_obj['segmentation'] = []
single_obj['segmentation'].append(single_obj_poly)
single_obj['iscrowd'] = 0
# rbox or bbox
if is_obb:
polys = [single_obj_poly]
rboxs = poly2rbox(polys)
rbox = rboxs[0].tolist()
single_obj['bbox'] = rbox
single_obj['area'] = rbox[2] * rbox[3]
else:
xmin, ymin, xmax, ymax = min(single_obj_poly[0::2]), min(single_obj_poly[1::2]), \
max(single_obj_poly[0::2]), max(single_obj_poly[1::2])
width, height = xmax - xmin, ymax - ymin
single_obj['bbox'] = xmin, ymin, width, height
single_obj['area'] = width * height
            single_obj['image_id'] = image_id
            single_obj['id'] = inst_count
            inst_count = inst_count + 1
            # add annotation (each instance is appended once)
            data_dict['annotations'].append(single_obj)
with open(json_path, 'w') as f:
json.dump(data_dict, f)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='dota anno to coco')
parser.add_argument('--images_dir', help='path_to_images')
parser.add_argument('--label_dir', help='path_to_labelTxt', type=str)
parser.add_argument(
'--json_path',
help='save json path',
type=str,
default='dota_coco.json')
parser.add_argument(
'--is_obb', help='is_obb or not', type=bool, default=True)
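    # caution: argparse's type=bool converts any non-empty string to True,
    # so passing "--is_obb False" still yields True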
parser.add_argument(
'--dota_version',
help='dota_version, v1.0 or v1.5 or v2.0',
type=str,
default='v1.0')
args = parser.parse_args()
# process
dota_2_coco(args.images_dir, args.label_dir, args.json_path, args.is_obb,
args.dota_version)
print('done!')
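For reference, a minimal usage sketch of the converter above, called as a function rather than from the CLI. The paths are hypothetical and only assume the usual DOTA layout of an images/ folder next to a labelTxt/ folder:

# Hypothetical paths; any directory with DOTA png images and txt labels works.
dota_2_coco(
    image_dir='path/to/trainval1024/images',
    txt_dir='path/to/trainval1024/labelTxt',
    json_path='DOTA_trainval1024.json',
    is_obb=True,
    dota_version='v1.0')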
...@@ -145,15 +145,6 @@ class COCODataSet(DetDataset): ...@@ -145,15 +145,6 @@ class COCODataSet(DetDataset):
if not any(np.array(inst['bbox'])): if not any(np.array(inst['bbox'])):
continue continue
# read rbox anno or not
is_rbox_anno = True if len(inst['bbox']) == 5 else False
if is_rbox_anno:
xc, yc, box_w, box_h, angle = inst['bbox']
x1 = xc - box_w / 2.0
y1 = yc - box_h / 2.0
x2 = x1 + box_w
y2 = y1 + box_h
else:
x1, y1, box_w, box_h = inst['bbox'] x1, y1, box_w, box_h = inst['bbox']
x2 = x1 + box_w x2 = x1 + box_w
y2 = y1 + box_h y2 = y1 + box_h
...@@ -162,8 +153,6 @@ class COCODataSet(DetDataset): ...@@ -162,8 +153,6 @@ class COCODataSet(DetDataset):
inst['clean_bbox'] = [ inst['clean_bbox'] = [
round(float(x), 3) for x in [x1, y1, x2, y2] round(float(x), 3) for x in [x1, y1, x2, y2]
] ]
if is_rbox_anno:
inst['clean_rbox'] = [xc, yc, box_w, box_h, angle]
bboxes.append(inst) bboxes.append(inst)
else: else:
logger.warning( logger.warning(
...@@ -178,8 +167,6 @@ class COCODataSet(DetDataset): ...@@ -178,8 +167,6 @@ class COCODataSet(DetDataset):
is_empty = True is_empty = True
gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32) gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
if is_rbox_anno:
gt_rbox = np.zeros((num_bbox, 5), dtype=np.float32)
gt_class = np.zeros((num_bbox, 1), dtype=np.int32) gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
is_crowd = np.zeros((num_bbox, 1), dtype=np.int32) is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
gt_poly = [None] * num_bbox gt_poly = [None] * num_bbox
...@@ -189,13 +176,10 @@ class COCODataSet(DetDataset): ...@@ -189,13 +176,10 @@ class COCODataSet(DetDataset):
catid = box['category_id'] catid = box['category_id']
gt_class[i][0] = self.catid2clsid[catid] gt_class[i][0] = self.catid2clsid[catid]
gt_bbox[i, :] = box['clean_bbox'] gt_bbox[i, :] = box['clean_bbox']
# xc, yc, w, h, theta
if is_rbox_anno:
gt_rbox[i, :] = box['clean_rbox']
is_crowd[i][0] = box['iscrowd'] is_crowd[i][0] = box['iscrowd']
# check RLE format # check RLE format
if 'segmentation' in box and box['iscrowd'] == 1: if 'segmentation' in box and box['iscrowd'] == 1:
gt_poly[i] = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] gt_poly[i] = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
elif 'segmentation' in box and box['segmentation']: elif 'segmentation' in box and box['segmentation']:
if not np.array(box['segmentation'] if not np.array(box['segmentation']
).size > 0 and not self.allow_empty: ).size > 0 and not self.allow_empty:
...@@ -212,15 +196,6 @@ class COCODataSet(DetDataset): ...@@ -212,15 +196,6 @@ class COCODataSet(DetDataset):
gt_poly) and not self.allow_empty: gt_poly) and not self.allow_empty:
continue continue
if is_rbox_anno:
gt_rec = {
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_rbox': gt_rbox,
'gt_poly': gt_poly,
}
else:
gt_rec = { gt_rec = {
'is_crowd': is_crowd, 'is_crowd': is_crowd,
'gt_class': gt_class, 'gt_class': gt_class,
......
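Note on the hunk above: COCODataSet now keeps rotated annotations only as 8-point polygons in gt_poly; the 5-tuple gt_rbox field is gone. A sketch of the resulting sample record for a single axis-aligned instance (all values made up):

import numpy as np

gt_rec = {
    'is_crowd': np.zeros((1, 1), dtype=np.int32),
    'gt_class': np.zeros((1, 1), dtype=np.int32),
    'gt_bbox': np.array([[1., 1., 5., 3.]], dtype=np.float32),  # (N, 4) x1y1x2y2
    'gt_poly': [[[1., 1., 5., 1., 5., 3., 1., 3.]]],  # one 8-point polygon per instance
}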
...@@ -492,72 +492,3 @@ def get_border(border, size): ...@@ -492,72 +492,3 @@ def get_border(border, size):
while size - border // i <= border // i: while size - border // i <= border // i:
i *= 2 i *= 2
return border // i return border // i
def norm_angle(angle, range=[-np.pi / 4, np.pi]):
return (angle - range[0]) % range[1] + range[0]
def poly2rbox_le135(poly):
"""convert poly to rbox [-pi / 4, 3 * pi / 4]
Args:
poly: [x1, y1, x2, y2, x3, y3, x4, y4]
Returns:
rbox: [cx, cy, w, h, angle]
"""
poly = np.array(poly[:8], dtype=np.float32)
pt1 = (poly[0], poly[1])
pt2 = (poly[2], poly[3])
pt3 = (poly[4], poly[5])
pt4 = (poly[6], poly[7])
edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[1]) *
(pt1[1] - pt2[1]))
edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[1]) *
(pt2[1] - pt3[1]))
width = max(edge1, edge2)
height = min(edge1, edge2)
rbox_angle = 0
if edge1 > edge2:
rbox_angle = np.arctan2(float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0]))
elif edge2 >= edge1:
rbox_angle = np.arctan2(float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0]))
rbox_angle = norm_angle(rbox_angle)
x_ctr = float(pt1[0] + pt3[0]) / 2
y_ctr = float(pt1[1] + pt3[1]) / 2
return x_ctr, y_ctr, width, height, rbox_angle
def poly2rbox_oc(poly):
"""convert poly to rbox (0, pi / 2]
Args:
poly: [x1, y1, x2, y2, x3, y3, x4, y4]
Returns:
rbox: [cx, cy, w, h, angle]
"""
points = np.array(poly, dtype=np.float32).reshape((-1, 2))
(cx, cy), (w, h), angle = cv2.minAreaRect(points)
# using the new OpenCV Rotated BBox definition since 4.5.1
# if angle < 0, opencv is older than 4.5.1, angle is in [-90, 0)
if angle < 0:
angle += 90
w, h = h, w
# convert angle to [0, 90)
if angle == -0.0:
angle = 0.0
if angle == 90.0:
angle = 0.0
w, h = h, w
angle = angle / 180 * np.pi
return cx, cy, w, h, angle
...@@ -29,8 +29,7 @@ import math ...@@ -29,8 +29,7 @@ import math
import copy import copy
from .operators import register_op, BaseOperator from .operators import register_op, BaseOperator
from .op_helper import poly2rbox_le135, poly2rbox_oc from ppdet.modeling.rbox_utils import poly2rbox_le135_np, poly2rbox_oc_np, rbox2poly_np
from ppdet.modeling import bbox_utils
from ppdet.utils.logger import setup_logger from ppdet.utils.logger import setup_logger
logger = setup_logger(__name__) logger = setup_logger(__name__)
...@@ -195,7 +194,7 @@ class Poly2RBox(BaseOperator): ...@@ -195,7 +194,7 @@ class Poly2RBox(BaseOperator):
def __init__(self, filter_threshold=4, filter_mode=None, rbox_type='le135'): def __init__(self, filter_threshold=4, filter_mode=None, rbox_type='le135'):
super(Poly2RBox, self).__init__() super(Poly2RBox, self).__init__()
self.filter_fn = lambda size: self.filter(size, filter_threshold, filter_mode) self.filter_fn = lambda size: self.filter(size, filter_threshold, filter_mode)
self.rbox_fn = poly2rbox_le135 if rbox_type == 'le135' else poly2rbox_oc self.rbox_fn = poly2rbox_le135_np if rbox_type == 'le135' else poly2rbox_oc_np
def filter(self, size, threshold, mode): def filter(self, size, threshold, mode):
if mode == 'area': if mode == 'area':
...@@ -248,7 +247,6 @@ class Poly2Array(BaseOperator): ...@@ -248,7 +247,6 @@ class Poly2Array(BaseOperator):
def apply(self, sample, context=None): def apply(self, sample, context=None):
if 'gt_poly' in sample: if 'gt_poly' in sample:
logger.info('gt_poly shape: {}'.format(sample['gt_poly']))
sample['gt_poly'] = np.array( sample['gt_poly'] = np.array(
sample['gt_poly'], dtype=np.float32).reshape((-1, 8)) sample['gt_poly'], dtype=np.float32).reshape((-1, 8))
...@@ -472,16 +470,10 @@ class Rbox2Poly(BaseOperator): ...@@ -472,16 +470,10 @@ class Rbox2Poly(BaseOperator):
def apply(self, sample, context=None): def apply(self, sample, context=None):
assert 'gt_rbox' in sample assert 'gt_rbox' in sample
assert sample['gt_rbox'].shape[1] == 5 assert sample['gt_rbox'].shape[1] == 5
rrects = sample['gt_rbox'] rboxes = sample['gt_rbox']
x_ctr = rrects[:, 0] polys = rbox2poly_np(rboxes)
y_ctr = rrects[:, 1]
width = rrects[:, 2]
height = rrects[:, 3]
x1 = x_ctr - width / 2.0
y1 = y_ctr - height / 2.0
x2 = x_ctr + width / 2.0
y2 = y_ctr + height / 2.0
sample['gt_bbox'] = np.stack([x1, y1, x2, y2], axis=1)
polys = bbox_utils.rbox2poly_np(rrects)
sample['gt_poly'] = polys sample['gt_poly'] = polys
xmin, ymin = polys[:, 0::2].min(1), polys[:, 1::2].min(1)
xmax, ymax = polys[:, 0::2].max(1), polys[:, 1::2].max(1)
sample['gt_bbox'] = np.stack([xmin, ymin, xmax, ymax], axis=1)
return sample return sample
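The corrected gt_bbox line above (xmin, ymin, xmax, ymax) can be checked in isolation; a standalone numpy sketch with a made-up polygon:

import numpy as np

polys = np.array([[1., 1., 5., 1., 5., 3., 1., 3.]], dtype=np.float32)
xmin, ymin = polys[:, 0::2].min(1), polys[:, 1::2].min(1)
xmax, ymax = polys[:, 0::2].max(1), polys[:, 1::2].max(1)
gt_bbox = np.stack([xmin, ymin, xmax, ymax], axis=1)  # -> [[1., 1., 5., 3.]]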
...@@ -268,11 +268,7 @@ class Trainer(object): ...@@ -268,11 +268,7 @@ class Trainer(object):
output_eval = self.cfg['output_eval'] \ output_eval = self.cfg['output_eval'] \
if 'output_eval' in self.cfg else None if 'output_eval' in self.cfg else None
save_prediction_only = self.cfg.get('save_prediction_only', False) save_prediction_only = self.cfg.get('save_prediction_only', False)
imid2path = self.cfg.get('imid2path', None)
# pass clsid2catid info to metric instance to avoid multiple loading
# annotation file
clsid2catid = {v: k for k, v in self.dataset.catid2clsid.items()} \
if self.mode == 'eval' else None
# when doing validation during training, the annotation file should be # when doing validation during training, the annotation file should be
# read from EvalReader instead of self.dataset (which is TrainReader) # read from EvalReader instead of self.dataset (which is TrainReader)
...@@ -285,11 +281,11 @@ class Trainer(object): ...@@ -285,11 +281,11 @@ class Trainer(object):
self._metrics = [ self._metrics = [
RBoxMetric( RBoxMetric(
anno_file=anno_file, anno_file=anno_file,
clsid2catid=clsid2catid,
classwise=classwise, classwise=classwise,
output_eval=output_eval, output_eval=output_eval,
bias=bias, bias=bias,
save_prediction_only=save_prediction_only) save_prediction_only=save_prediction_only,
imid2path=imid2path)
] ]
elif self.cfg.metric == 'VOC': elif self.cfg.metric == 'VOC':
output_eval = self.cfg['output_eval'] \ output_eval = self.cfg['output_eval'] \
...@@ -810,10 +806,16 @@ class Trainer(object): ...@@ -810,10 +806,16 @@ class Trainer(object):
images, images,
draw_threshold=0.5, draw_threshold=0.5,
output_dir='output', output_dir='output',
save_results=False): save_results=False,
visualize=True):
if not os.path.exists(output_dir):
os.makedirs(output_dir)
self.dataset.set_images(images) self.dataset.set_images(images)
loader = create('TestReader')(self.dataset, 0) loader = create('TestReader')(self.dataset, 0)
imid2path = self.dataset.get_imid2path()
def setup_metrics_for_loader(): def setup_metrics_for_loader():
# mem # mem
metrics = copy.deepcopy(self._metrics) metrics = copy.deepcopy(self._metrics)
...@@ -827,6 +829,7 @@ class Trainer(object): ...@@ -827,6 +829,7 @@ class Trainer(object):
self.mode = '_test' self.mode = '_test'
self.cfg['save_prediction_only'] = True self.cfg['save_prediction_only'] = True
self.cfg['output_eval'] = output_dir self.cfg['output_eval'] = output_dir
self.cfg['imid2path'] = imid2path
self._init_metrics() self._init_metrics()
# restore # restore
...@@ -839,6 +842,8 @@ class Trainer(object): ...@@ -839,6 +842,8 @@ class Trainer(object):
if output_eval is not None: if output_eval is not None:
self.cfg['output_eval'] = output_eval self.cfg['output_eval'] = output_eval
self.cfg.pop('imid2path')
_metrics = copy.deepcopy(self._metrics) _metrics = copy.deepcopy(self._metrics)
self._metrics = metrics self._metrics = metrics
...@@ -849,8 +854,6 @@ class Trainer(object): ...@@ -849,8 +854,6 @@ class Trainer(object):
else: else:
metrics = [] metrics = []
imid2path = self.dataset.get_imid2path()
anno_file = self.dataset.get_anno() anno_file = self.dataset.get_anno()
clsid2catid, catid2name = get_categories( clsid2catid, catid2name = get_categories(
self.cfg.metric, anno_file=anno_file) self.cfg.metric, anno_file=anno_file)
...@@ -889,6 +892,7 @@ class Trainer(object): ...@@ -889,6 +892,7 @@ class Trainer(object):
_m.accumulate() _m.accumulate()
_m.reset() _m.reset()
if visualize:
for outs in results: for outs in results:
batch_res = get_infer_results(outs, clsid2catid) batch_res = get_infer_results(outs, clsid2catid)
bbox_num = outs['bbox_num'] bbox_num = outs['bbox_num']
...@@ -916,7 +920,8 @@ class Trainer(object): ...@@ -916,7 +920,8 @@ class Trainer(object):
if self._compose_callback: if self._compose_callback:
self._compose_callback.on_step_end(self.status) self._compose_callback.on_step_end(self.status)
# save image with detection # save image with detection
save_name = self._get_save_image_name(output_dir, image_path) save_name = self._get_save_image_name(output_dir,
image_path)
logger.info("Detection bbox results save in {}".format( logger.info("Detection bbox results save in {}".format(
save_name)) save_name))
image.save(save_name, quality=95) image.save(save_name, quality=95)
...@@ -927,8 +932,6 @@ class Trainer(object): ...@@ -927,8 +932,6 @@ class Trainer(object):
""" """
Get save image name from source image path. Get save image name from source image path.
""" """
if not os.path.exists(output_dir):
os.makedirs(output_dir)
image_name = os.path.split(image_path)[-1] image_name = os.path.split(image_path)[-1]
name, ext = os.path.splitext(image_name) name, ext = os.path.splitext(image_name)
return os.path.join(output_dir, "{}".format(name)) + ext return os.path.join(output_dir, "{}".format(name)) + ext
......
...@@ -22,7 +22,7 @@ import sys ...@@ -22,7 +22,7 @@ import sys
import numpy as np import numpy as np
import itertools import itertools
import paddle import paddle
from ppdet.modeling.bbox_utils import poly2rbox, rbox2poly_np from ppdet.modeling.rbox_utils import poly2rbox_np
from ppdet.utils.logger import setup_logger from ppdet.utils.logger import setup_logger
logger = setup_logger(__name__) logger = setup_logger(__name__)
...@@ -91,15 +91,13 @@ def jaccard_overlap(pred, gt, is_bbox_normalized=False): ...@@ -91,15 +91,13 @@ def jaccard_overlap(pred, gt, is_bbox_normalized=False):
return overlap return overlap
def calc_rbox_iou(pred, gt_rbox): def calc_rbox_iou(pred, gt_poly):
""" """
calc iou between rotated bbox calc iou between rotated bbox
""" """
# calc iou of bounding box for speedup # calc iou of bounding box for speedup
pred = np.array(pred, np.float32).reshape(-1, 8) pred = np.array(pred, np.float32).reshape(-1, 2)
pred = pred.reshape(-1, 2) gt_poly = np.array(gt_poly, np.float32).reshape(-1, 2)
gt_poly = rbox2poly_np(np.array(gt_rbox).reshape(-1, 5))[0]
gt_poly = gt_poly.reshape(-1, 2)
pred_rect = [ pred_rect = [
np.min(pred[:, 0]), np.min(pred[:, 1]), np.max(pred[:, 0]), np.min(pred[:, 0]), np.min(pred[:, 1]), np.max(pred[:, 0]),
np.max(pred[:, 1]) np.max(pred[:, 1])
...@@ -114,12 +112,8 @@ def calc_rbox_iou(pred, gt_rbox): ...@@ -114,12 +112,8 @@ def calc_rbox_iou(pred, gt_rbox):
return iou return iou
# calc rbox iou # calc rbox iou
pred = pred.reshape(-1, 8) pred_rbox = poly2rbox_np(pred.reshape(-1, 8)).reshape(-1, 5)
gt_rbox = poly2rbox_np(gt_poly.reshape(-1, 8)).reshape(-1, 5)
pred = np.array(pred, np.float32).reshape(-1, 8)
pred_rbox = poly2rbox(pred)
pred_rbox = pred_rbox.reshape(-1, 5)
pred_rbox = pred_rbox.reshape(-1, 5)
try: try:
from ext_op import rbox_iou from ext_op import rbox_iou
except Exception as e: except Exception as e:
...@@ -127,7 +121,6 @@ def calc_rbox_iou(pred, gt_rbox): ...@@ -127,7 +121,6 @@ def calc_rbox_iou(pred, gt_rbox):
"following ppdet/ext_op/README.md", e) "following ppdet/ext_op/README.md", e)
sys.stdout.flush() sys.stdout.flush()
sys.exit(-1) sys.exit(-1)
gt_rbox = np.array(gt_rbox, np.float32).reshape(-1, 5)
pd_gt_rbox = paddle.to_tensor(gt_rbox, dtype='float32') pd_gt_rbox = paddle.to_tensor(gt_rbox, dtype='float32')
pd_pred_rbox = paddle.to_tensor(pred_rbox, dtype='float32') pd_pred_rbox = paddle.to_tensor(pred_rbox, dtype='float32')
iou = rbox_iou(pd_gt_rbox, pd_pred_rbox) iou = rbox_iou(pd_gt_rbox, pd_pred_rbox)
...@@ -211,7 +204,7 @@ class DetectionMAP(object): ...@@ -211,7 +204,7 @@ class DetectionMAP(object):
max_overlap = -1.0 max_overlap = -1.0
for i, gl in enumerate(gt_label): for i, gl in enumerate(gt_label):
if int(gl) == int(l): if int(gl) == int(l):
if len(gt_box[i]) == 5: if len(gt_box[i]) == 8:
overlap = calc_rbox_iou(pred, gt_box[i]) overlap = calc_rbox_iou(pred, gt_box[i])
else: else:
overlap = jaccard_overlap(pred, gt_box[i], overlap = jaccard_overlap(pred, gt_box[i],
......
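The reworked calc_rbox_iou keeps the cheap prefilter: if the axis-aligned hulls of the two polygons do not overlap, the rotated IoU is zero and the custom rbox_iou op is never invoked. A sketch of that early-exit path (coordinates made up):

import numpy as np

pred = np.array([0., 0., 2., 0., 2., 2., 0., 2.], np.float32).reshape(-1, 2)
gt_poly = np.array([5., 5., 7., 5., 7., 7., 5., 7.], np.float32).reshape(-1, 2)
pred_rect = [pred[:, 0].min(), pred[:, 1].min(), pred[:, 0].max(), pred[:, 1].max()]
gt_rect = [gt_poly[:, 0].min(), gt_poly[:, 1].min(), gt_poly[:, 0].max(), gt_poly[:, 1].max()]
# the hulls are disjoint here, so jaccard_overlap(pred_rect, gt_rect) == 0
# and calc_rbox_iou returns the near-zero overlap without touching ext_op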
...@@ -22,12 +22,14 @@ import json ...@@ -22,12 +22,14 @@ import json
import paddle import paddle
import numpy as np import numpy as np
import typing import typing
from collections import defaultdict
from pathlib import Path from pathlib import Path
from .map_utils import prune_zero_padding, DetectionMAP from .map_utils import prune_zero_padding, DetectionMAP
from .coco_utils import get_infer_results, cocoapi_eval from .coco_utils import get_infer_results, cocoapi_eval
from .widerface_utils import face_eval_run from .widerface_utils import face_eval_run
from ppdet.data.source.category import get_categories from ppdet.data.source.category import get_categories
from ppdet.modeling.rbox_utils import poly2rbox_np
from ppdet.utils.logger import setup_logger from ppdet.utils.logger import setup_logger
logger = setup_logger(__name__) logger = setup_logger(__name__)
...@@ -356,6 +358,7 @@ class RBoxMetric(Metric): ...@@ -356,6 +358,7 @@ class RBoxMetric(Metric):
self.overlap_thresh = kwargs.get('overlap_thresh', 0.5) self.overlap_thresh = kwargs.get('overlap_thresh', 0.5)
self.map_type = kwargs.get('map_type', '11point') self.map_type = kwargs.get('map_type', '11point')
self.evaluate_difficult = kwargs.get('evaluate_difficult', False) self.evaluate_difficult = kwargs.get('evaluate_difficult', False)
self.imid2path = kwargs.get('imid2path', None)
class_num = len(self.catid2name) class_num = len(self.catid2name)
self.detection_map = DetectionMAP( self.detection_map = DetectionMAP(
class_num=class_num, class_num=class_num,
...@@ -388,11 +391,21 @@ class RBoxMetric(Metric): ...@@ -388,11 +391,21 @@ class RBoxMetric(Metric):
if self.save_prediction_only: if self.save_prediction_only:
return return
gt_boxes = inputs['gt_rbox'] gt_boxes = inputs['gt_poly']
gt_labels = inputs['gt_class'] gt_labels = inputs['gt_class']
if 'scale_factor' in inputs:
scale_factor = inputs['scale_factor'].numpy() if isinstance(
inputs['scale_factor'],
paddle.Tensor) else inputs['scale_factor']
else:
scale_factor = np.ones((gt_boxes.shape[0], 2)).astype('float32')
for i in range(len(gt_boxes)): for i in range(len(gt_boxes)):
gt_box = gt_boxes[i].numpy() if isinstance( gt_box = gt_boxes[i].numpy() if isinstance(
gt_boxes[i], paddle.Tensor) else gt_boxes[i] gt_boxes[i], paddle.Tensor) else gt_boxes[i]
h, w = scale_factor[i]
gt_box = gt_box / np.array([w, h, w, h, w, h, w, h])
gt_label = gt_labels[i].numpy() if isinstance( gt_label = gt_labels[i].numpy() if isinstance(
gt_labels[i], paddle.Tensor) else gt_labels[i] gt_labels[i], paddle.Tensor) else gt_labels[i]
gt_box, gt_label, _ = prune_zero_padding(gt_box, gt_label) gt_box, gt_label, _ = prune_zero_padding(gt_box, gt_label)
...@@ -411,19 +424,39 @@ class RBoxMetric(Metric): ...@@ -411,19 +424,39 @@ class RBoxMetric(Metric):
] ]
self.detection_map.update(bbox, score, label, gt_box, gt_label) self.detection_map.update(bbox, score, label, gt_box, gt_label)
def save_results(self, results, output_dir, imid2path):
if imid2path:
data_dicts = defaultdict(list)
for result in results:
image_id = result['image_id']
data_dicts[image_id].append(result)
for image_id, image_path in imid2path.items():
basename = os.path.splitext(os.path.split(image_path)[-1])[0]
output = os.path.join(output_dir, "{}.txt".format(basename))
dets = data_dicts.get(image_id, [])
with open(output, 'w') as f:
for det in dets:
catid, bbox, score = det['category_id'], det[
'bbox'], det['score']
bbox_pred = '{} {} '.format(self.catid2name[catid],
score) + ' '.join(
[str(e) for e in bbox])
f.write(bbox_pred + '\n')
logger.info('The bbox result is saved to {}.'.format(output_dir))
else:
output = os.path.join(output_dir, "bbox.json")
with open(output, 'w') as f:
json.dump(results, f)
logger.info('The bbox result is saved to {}.'.format(output))
def accumulate(self): def accumulate(self):
if len(self.results) > 0:
output = "bbox.json"
if self.output_eval: if self.output_eval:
output = os.path.join(self.output_eval, output) self.save_results(self.results, self.output_eval, self.imid2path)
with open(output, 'w') as f:
json.dump(self.results, f)
logger.info('The bbox result is saved to bbox.json.')
if self.save_prediction_only: if not self.save_prediction_only:
logger.info('The bbox result is saved to {} and do not '
'evaluate the mAP.'.format(output))
else:
logger.info("Accumulating evaluatation results...") logger.info("Accumulating evaluatation results...")
self.detection_map.accumulate() self.detection_map.accumulate()
......
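With imid2path provided, save_results above writes one DOTA-style txt per image, one detection per line: class name, score, then the eight polygon coordinates. A sketch of parsing such a file back (file name and values made up):

# e.g. a line in P0001.txt: "plane 0.9871 210.0 15.0 360.0 15.0 360.0 120.0 210.0 120.0"
with open('P0001.txt') as f:
    for line in f:
        fields = line.split()
        name, score = fields[0], float(fields[1])
        poly = [float(v) for v in fields[2:10]]  # x1 y1 x2 y2 x3 y3 x4 y4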
...@@ -29,6 +29,7 @@ from . import reid ...@@ -29,6 +29,7 @@ from . import reid
from . import mot from . import mot
from . import transformers from . import transformers
from . import assigners from . import assigners
from . import rbox_utils
from .ops import * from .ops import *
from .backbones import * from .backbones import *
...@@ -43,3 +44,4 @@ from .reid import * ...@@ -43,3 +44,4 @@ from .reid import *
from .mot import * from .mot import *
from .transformers import * from .transformers import *
from .assigners import * from .assigners import *
from .rbox_utils import *
...@@ -359,295 +359,6 @@ def bbox_iou(box1, box2, giou=False, diou=False, ciou=False, eps=1e-9): ...@@ -359,295 +359,6 @@ def bbox_iou(box1, box2, giou=False, diou=False, ciou=False, eps=1e-9):
return iou return iou
def rect2rbox(bboxes):
"""
:param bboxes: shape (n, 4) (xmin, ymin, xmax, ymax)
:return: dbboxes: shape (n, 5) (x_ctr, y_ctr, w, h, angle)
"""
bboxes = bboxes.reshape(-1, 4)
num_boxes = bboxes.shape[0]
x_ctr = (bboxes[:, 2] + bboxes[:, 0]) / 2.0
y_ctr = (bboxes[:, 3] + bboxes[:, 1]) / 2.0
edges1 = np.abs(bboxes[:, 2] - bboxes[:, 0])
edges2 = np.abs(bboxes[:, 3] - bboxes[:, 1])
angles = np.zeros([num_boxes], dtype=bboxes.dtype)
inds = edges1 < edges2
rboxes = np.stack((x_ctr, y_ctr, edges1, edges2, angles), axis=1)
rboxes[inds, 2] = edges2[inds]
rboxes[inds, 3] = edges1[inds]
rboxes[inds, 4] = np.pi / 2.0
return rboxes
def delta2rbox(rrois,
deltas,
means=[0, 0, 0, 0, 0],
stds=[1, 1, 1, 1, 1],
wh_ratio_clip=1e-6):
"""
:param rrois: (cx, cy, w, h, theta)
:param deltas: (dx, dy, dw, dh, dtheta)
:param means:
:param stds:
:param wh_ratio_clip:
:return:
"""
means = paddle.to_tensor(means)
stds = paddle.to_tensor(stds)
deltas = paddle.reshape(deltas, [-1, deltas.shape[-1]])
denorm_deltas = deltas * stds + means
dx = denorm_deltas[:, 0]
dy = denorm_deltas[:, 1]
dw = denorm_deltas[:, 2]
dh = denorm_deltas[:, 3]
dangle = denorm_deltas[:, 4]
max_ratio = np.abs(np.log(wh_ratio_clip))
dw = paddle.clip(dw, min=-max_ratio, max=max_ratio)
dh = paddle.clip(dh, min=-max_ratio, max=max_ratio)
rroi_x = rrois[:, 0]
rroi_y = rrois[:, 1]
rroi_w = rrois[:, 2]
rroi_h = rrois[:, 3]
rroi_angle = rrois[:, 4]
gx = dx * rroi_w * paddle.cos(rroi_angle) - dy * rroi_h * paddle.sin(
rroi_angle) + rroi_x
gy = dx * rroi_w * paddle.sin(rroi_angle) + dy * rroi_h * paddle.cos(
rroi_angle) + rroi_y
gw = rroi_w * dw.exp()
gh = rroi_h * dh.exp()
ga = np.pi * dangle + rroi_angle
ga = (ga + np.pi / 4) % np.pi - np.pi / 4
ga = paddle.to_tensor(ga)
gw = paddle.to_tensor(gw, dtype='float32')
gh = paddle.to_tensor(gh, dtype='float32')
bboxes = paddle.stack([gx, gy, gw, gh, ga], axis=-1)
return bboxes
def rbox2delta(proposals, gt, means=[0, 0, 0, 0, 0], stds=[1, 1, 1, 1, 1]):
"""
Args:
proposals:
gt:
means: 1x5
stds: 1x5
Returns:
"""
proposals = proposals.astype(np.float64)
PI = np.pi
gt_widths = gt[..., 2]
gt_heights = gt[..., 3]
gt_angle = gt[..., 4]
proposals_widths = proposals[..., 2]
proposals_heights = proposals[..., 3]
proposals_angle = proposals[..., 4]
coord = gt[..., 0:2] - proposals[..., 0:2]
dx = (np.cos(proposals[..., 4]) * coord[..., 0] + np.sin(proposals[..., 4])
* coord[..., 1]) / proposals_widths
dy = (-np.sin(proposals[..., 4]) * coord[..., 0] + np.cos(proposals[..., 4])
* coord[..., 1]) / proposals_heights
dw = np.log(gt_widths / proposals_widths)
dh = np.log(gt_heights / proposals_heights)
da = (gt_angle - proposals_angle)
da = (da + PI / 4) % PI - PI / 4
da /= PI
deltas = np.stack([dx, dy, dw, dh, da], axis=-1)
means = np.array(means, dtype=deltas.dtype)
stds = np.array(stds, dtype=deltas.dtype)
deltas = (deltas - means) / stds
deltas = deltas.astype(np.float32)
return deltas
def bbox_decode(bbox_preds,
anchors,
means=[0, 0, 0, 0, 0],
stds=[1, 1, 1, 1, 1]):
"""decode bbox from deltas
Args:
bbox_preds: [N,H,W,5]
anchors: [H*W,5]
return:
bboxes: [N,H,W,5]
"""
means = paddle.to_tensor(means)
stds = paddle.to_tensor(stds)
num_imgs, H, W, _ = bbox_preds.shape
bboxes_list = []
for img_id in range(num_imgs):
bbox_pred = bbox_preds[img_id]
# bbox_pred.shape=[5,H,W]
bbox_delta = bbox_pred
anchors = paddle.to_tensor(anchors)
bboxes = delta2rbox(
anchors, bbox_delta, means, stds, wh_ratio_clip=1e-6)
bboxes = paddle.reshape(bboxes, [H, W, 5])
bboxes_list.append(bboxes)
return paddle.stack(bboxes_list, axis=0)
def poly2rbox(polys):
"""
poly:[x0,y0,x1,y1,x2,y2,x3,y3]
to
rotated_boxes:[x_ctr,y_ctr,w,h,angle]
"""
rotated_boxes = []
for poly in polys:
poly = np.array(poly[:8], dtype=np.float32)
pt1 = (poly[0], poly[1])
pt2 = (poly[2], poly[3])
pt3 = (poly[4], poly[5])
pt4 = (poly[6], poly[7])
edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[
1]) * (pt1[1] - pt2[1]))
edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[
1]) * (pt2[1] - pt3[1]))
width = max(edge1, edge2)
height = min(edge1, edge2)
rbox_angle = 0
if edge1 > edge2:
rbox_angle = np.arctan2(
float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0]))
elif edge2 >= edge1:
rbox_angle = np.arctan2(
float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0]))
def norm_angle(angle, range=[-np.pi / 4, np.pi]):
return (angle - range[0]) % range[1] + range[0]
rbox_angle = norm_angle(rbox_angle)
x_ctr = float(pt1[0] + pt3[0]) / 2
y_ctr = float(pt1[1] + pt3[1]) / 2
rotated_box = np.array([x_ctr, y_ctr, width, height, rbox_angle])
rotated_boxes.append(rotated_box)
ret_rotated_boxes = np.array(rotated_boxes)
assert ret_rotated_boxes.shape[1] == 5
return ret_rotated_boxes
def cal_line_length(point1, point2):
import math
return math.sqrt(
math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2))
def get_best_begin_point_single(coordinate):
x1, y1, x2, y2, x3, y3, x4, y4 = coordinate
xmin = min(x1, x2, x3, x4)
ymin = min(y1, y2, y3, y4)
xmax = max(x1, x2, x3, x4)
ymax = max(y1, y2, y3, y4)
combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
[[x4, y4], [x1, y1], [x2, y2], [x3, y3]],
[[x3, y3], [x4, y4], [x1, y1], [x2, y2]],
[[x2, y2], [x3, y3], [x4, y4], [x1, y1]]]
dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
force = 100000000.0
force_flag = 0
for i in range(4):
temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \
+ cal_line_length(combinate[i][1], dst_coordinate[1]) \
+ cal_line_length(combinate[i][2], dst_coordinate[2]) \
+ cal_line_length(combinate[i][3], dst_coordinate[3])
if temp_force < force:
force = temp_force
force_flag = i
if force_flag != 0:
pass
return np.array(combinate[force_flag]).reshape(8)
def rbox2poly_np(rrects):
"""
rrect:[x_ctr,y_ctr,w,h,angle]
to
poly:[x0,y0,x1,y1,x2,y2,x3,y3]
"""
polys = []
for i in range(rrects.shape[0]):
rrect = rrects[i]
# x_ctr, y_ctr, width, height, angle = rrect[:5]
x_ctr = rrect[0]
y_ctr = rrect[1]
width = rrect[2]
height = rrect[3]
angle = rrect[4]
tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
R = np.array([[np.cos(angle), -np.sin(angle)],
[np.sin(angle), np.cos(angle)]])
poly = R.dot(rect)
x0, x1, x2, x3 = poly[0, :4] + x_ctr
y0, y1, y2, y3 = poly[1, :4] + y_ctr
poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32)
poly = get_best_begin_point_single(poly)
polys.append(poly)
polys = np.array(polys)
return polys
def rbox2poly(rrects):
"""
rrect:[x_ctr,y_ctr,w,h,angle]
to
poly:[x0,y0,x1,y1,x2,y2,x3,y3]
"""
N = paddle.shape(rrects)[0]
x_ctr = rrects[:, 0]
y_ctr = rrects[:, 1]
width = rrects[:, 2]
height = rrects[:, 3]
angle = rrects[:, 4]
tl_x, tl_y, br_x, br_y = -width * 0.5, -height * 0.5, width * 0.5, height * 0.5
normal_rects = paddle.stack(
[tl_x, br_x, br_x, tl_x, tl_y, tl_y, br_y, br_y], axis=0)
normal_rects = paddle.reshape(normal_rects, [2, 4, N])
normal_rects = paddle.transpose(normal_rects, [2, 0, 1])
sin, cos = paddle.sin(angle), paddle.cos(angle)
# M.shape=[N,2,2]
M = paddle.stack([cos, -sin, sin, cos], axis=0)
M = paddle.reshape(M, [2, 2, N])
M = paddle.transpose(M, [2, 0, 1])
# polys:[N,8]
polys = paddle.matmul(M, normal_rects)
polys = paddle.transpose(polys, [2, 1, 0])
polys = paddle.reshape(polys, [-1, N])
polys = paddle.transpose(polys, [1, 0])
tmp = paddle.stack(
[x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr], axis=1)
polys = polys + tmp
return polys
def bbox_iou_np_expand(box1, box2, x1y1x2y2=True, eps=1e-16): def bbox_iou_np_expand(box1, box2, x1y1x2y2=True, eps=1e-16):
""" """
Calculate the iou of box1 and box2 with numpy. Calculate the iou of box1 and box2 with numpy.
......
...@@ -20,7 +20,6 @@ import paddle.nn as nn ...@@ -20,7 +20,6 @@ import paddle.nn as nn
import paddle.nn.functional as F import paddle.nn.functional as F
from paddle.nn.initializer import Normal, Constant from paddle.nn.initializer import Normal, Constant
from ppdet.core.workspace import register from ppdet.core.workspace import register
from ppdet.modeling.bbox_utils import rbox2poly
from ppdet.modeling.proposal_generator.target_layer import RBoxAssigner from ppdet.modeling.proposal_generator.target_layer import RBoxAssigner
from ppdet.modeling.proposal_generator.anchor_generator import S2ANetAnchorGenerator from ppdet.modeling.proposal_generator.anchor_generator import S2ANetAnchorGenerator
from ppdet.modeling.layers import AlignConv from ppdet.modeling.layers import AlignConv
...@@ -424,7 +423,7 @@ class S2ANetHead(nn.Layer): ...@@ -424,7 +423,7 @@ class S2ANetHead(nn.Layer):
mlvl_bboxes = paddle.concat(mlvl_bboxes) mlvl_bboxes = paddle.concat(mlvl_bboxes)
mlvl_scores = paddle.concat(mlvl_scores) mlvl_scores = paddle.concat(mlvl_scores)
mlvl_polys = rbox2poly(mlvl_bboxes).unsqueeze(0) mlvl_polys = self.rbox2poly(mlvl_bboxes).unsqueeze(0)
mlvl_scores = paddle.transpose(mlvl_scores, [1, 0]).unsqueeze(0) mlvl_scores = paddle.transpose(mlvl_scores, [1, 0]).unsqueeze(0)
bbox, bbox_num, _ = self.nms(mlvl_polys, mlvl_scores) bbox, bbox_num, _ = self.nms(mlvl_polys, mlvl_scores)
...@@ -706,3 +705,41 @@ class S2ANetHead(nn.Layer): ...@@ -706,3 +705,41 @@ class S2ANetHead(nn.Layer):
ga = (ga + np.pi / 4) % np.pi - np.pi / 4 ga = (ga + np.pi / 4) % np.pi - np.pi / 4
bboxes = paddle.concat([gx, gy, gw, gh, ga], axis=-1) bboxes = paddle.concat([gx, gy, gw, gh, ga], axis=-1)
return bboxes return bboxes
def rbox2poly(self, rboxes):
"""
rboxes: [x_ctr,y_ctr,w,h,angle]
to
polys: [x0,y0,x1,y1,x2,y2,x3,y3]
"""
N = paddle.shape(rboxes)[0]
x_ctr = rboxes[:, 0]
y_ctr = rboxes[:, 1]
width = rboxes[:, 2]
height = rboxes[:, 3]
angle = rboxes[:, 4]
tl_x, tl_y, br_x, br_y = -width * 0.5, -height * 0.5, width * 0.5, height * 0.5
normal_rects = paddle.stack(
[tl_x, br_x, br_x, tl_x, tl_y, tl_y, br_y, br_y], axis=0)
normal_rects = paddle.reshape(normal_rects, [2, 4, N])
normal_rects = paddle.transpose(normal_rects, [2, 0, 1])
sin, cos = paddle.sin(angle), paddle.cos(angle)
# M: [N,2,2]
M = paddle.stack([cos, -sin, sin, cos], axis=0)
M = paddle.reshape(M, [2, 2, N])
M = paddle.transpose(M, [2, 0, 1])
# polys: [N,8]
polys = paddle.matmul(M, normal_rects)
polys = paddle.transpose(polys, [2, 1, 0])
polys = paddle.reshape(polys, [-1, N])
polys = paddle.transpose(polys, [1, 0])
tmp = paddle.stack(
[x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr], axis=1)
polys = polys + tmp
return polys
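A standalone sanity sketch of the rotation applied in rbox2poly above (box values made up): rotating a 4x2 box centered at (3, 2) by pi/2 should swap its horizontal and vertical extents.

import numpy as np
import paddle

cx, cy, w, h, angle = 3., 2., 4., 2., np.pi / 2
corners = paddle.to_tensor([[-w / 2, -h / 2], [w / 2, -h / 2],
                            [w / 2, h / 2], [-w / 2, h / 2]])
# same rotation as M above, written for row vectors
R = paddle.to_tensor([[np.cos(angle), np.sin(angle)],
                      [-np.sin(angle), np.cos(angle)]], dtype='float32')
poly = corners @ R + paddle.to_tensor([cx, cy])  # [4, 2] corner array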
...@@ -17,7 +17,7 @@ import paddle ...@@ -17,7 +17,7 @@ import paddle
import paddle.nn as nn import paddle.nn as nn
import paddle.nn.functional as F import paddle.nn.functional as F
from ppdet.core.workspace import register from ppdet.core.workspace import register
from ppdet.modeling.bbox_utils import nonempty_bbox, rbox2poly from ppdet.modeling.bbox_utils import nonempty_bbox
from ppdet.modeling.layers import TTFBox from ppdet.modeling.layers import TTFBox
from .transformers import bbox_cxcywh_to_xyxy from .transformers import bbox_cxcywh_to_xyxy
try: try:
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import paddle
import numpy as np
import cv2
def norm_angle(angle, range=[-np.pi / 4, np.pi]):
    return (angle - range[0]) % range[1] + range[0]


# rbox functions implemented using numpy
def poly2rbox_le135_np(poly):
    """convert poly to rbox [-pi / 4, 3 * pi / 4]
    Args:
        poly: [x1, y1, x2, y2, x3, y3, x4, y4]
    Returns:
        rbox: [cx, cy, w, h, angle]
    """
    poly = np.array(poly[:8], dtype=np.float32)

    pt1 = (poly[0], poly[1])
    pt2 = (poly[2], poly[3])
    pt3 = (poly[4], poly[5])
    pt4 = (poly[6], poly[7])

    edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[1]) *
                    (pt1[1] - pt2[1]))
    edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[1]) *
                    (pt2[1] - pt3[1]))

    width = max(edge1, edge2)
    height = min(edge1, edge2)

    rbox_angle = 0
    if edge1 > edge2:
        rbox_angle = np.arctan2(float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0]))
    elif edge2 >= edge1:
        rbox_angle = np.arctan2(float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0]))

    rbox_angle = norm_angle(rbox_angle)

    x_ctr = float(pt1[0] + pt3[0]) / 2
    y_ctr = float(pt1[1] + pt3[1]) / 2
    return [x_ctr, y_ctr, width, height, rbox_angle]


def poly2rbox_oc_np(poly):
    """convert poly to rbox (0, pi / 2]
    Args:
        poly: [x1, y1, x2, y2, x3, y3, x4, y4]
    Returns:
        rbox: [cx, cy, w, h, angle]
    """
    points = np.array(poly, dtype=np.float32).reshape((-1, 2))
    (cx, cy), (w, h), angle = cv2.minAreaRect(points)
    # using the new OpenCV Rotated BBox definition since 4.5.1
    # if angle < 0, opencv is older than 4.5.1, angle is in [-90, 0)
    if angle < 0:
        angle += 90
        w, h = h, w

    # convert angle to [0, 90)
    if angle == -0.0:
        angle = 0.0
    if angle == 90.0:
        angle = 0.0
        w, h = h, w

    angle = angle / 180 * np.pi
    return [cx, cy, w, h, angle]


def poly2rbox_np(polys, rbox_type='oc'):
    """
    polys: [x0,y0,x1,y1,x2,y2,x3,y3]
    to
    rboxes: [x_ctr,y_ctr,w,h,angle]
    """
    assert rbox_type in ['oc', 'le135'], 'only oc or le135 is supported now'
    poly2rbox_fn = poly2rbox_oc_np if rbox_type == 'oc' else poly2rbox_le135_np
    rboxes = []
    for poly in polys:
        x, y, w, h, angle = poly2rbox_fn(poly)
        rbox = np.array([x, y, w, h, angle], dtype=np.float32)
        rboxes.append(rbox)
    return np.array(rboxes)


def cal_line_length(point1, point2):
    return math.sqrt(
        math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2))


def get_best_begin_point_single(coordinate):
    x1, y1, x2, y2, x3, y3, x4, y4 = coordinate
    xmin = min(x1, x2, x3, x4)
    ymin = min(y1, y2, y3, y4)
    xmax = max(x1, x2, x3, x4)
    ymax = max(y1, y2, y3, y4)
    combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
                 [[x4, y4], [x1, y1], [x2, y2], [x3, y3]],
                 [[x3, y3], [x4, y4], [x1, y1], [x2, y2]],
                 [[x2, y2], [x3, y3], [x4, y4], [x1, y1]]]
    dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
    # pick the cyclic corner order closest to the
    # (top-left, top-right, bottom-right, bottom-left) convention
    force = 100000000.0
    force_flag = 0
    for i in range(4):
        temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \
                     + cal_line_length(combinate[i][1], dst_coordinate[1]) \
                     + cal_line_length(combinate[i][2], dst_coordinate[2]) \
                     + cal_line_length(combinate[i][3], dst_coordinate[3])
        if temp_force < force:
            force = temp_force
            force_flag = i
    return np.array(combinate[force_flag]).reshape(8)


def rbox2poly_np(rboxes):
    """
    rboxes: [x_ctr,y_ctr,w,h,angle]
    to
    polys: [x0,y0,x1,y1,x2,y2,x3,y3]
    """
    polys = []
    for i in range(len(rboxes)):
        x_ctr, y_ctr, width, height, angle = rboxes[i][:5]
        tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
        rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
        R = np.array([[np.cos(angle), -np.sin(angle)],
                      [np.sin(angle), np.cos(angle)]])
        poly = R.dot(rect)
        x0, x1, x2, x3 = poly[0, :4] + x_ctr
        y0, y1, y2, y3 = poly[1, :4] + y_ctr
        poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32)
        poly = get_best_begin_point_single(poly)
        polys.append(poly)
    polys = np.array(polys)
    return polys
...@@ -96,8 +96,8 @@ DATASETS = { ...@@ -96,8 +96,8 @@ DATASETS = {
'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_coco.tar', 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_coco.tar',
'49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']), '49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']),
'spine_coco': ([( 'spine_coco': ([(
'https://paddledet.bj.bcebos.com/data/spine_coco.tar', 'https://paddledet.bj.bcebos.com/data/spine.tar',
'03030f42d9b6202a6e425d4becefda0d', ), ], ['annotations', 'images']), '8a3a353c2c54a2284ad7d2780b65f6a6', ), ], ['annotations', 'images']),
'mot': (), 'mot': (),
'objects365': (), 'objects365': (),
'coco_ce': ([( 'coco_ce': ([(
......
...@@ -27,6 +27,7 @@ sys.path.insert(0, parent_path) ...@@ -27,6 +27,7 @@ sys.path.insert(0, parent_path)
import warnings import warnings
warnings.filterwarnings('ignore') warnings.filterwarnings('ignore')
import glob import glob
import ast
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from ppdet.core.workspace import load_config, merge_config
...@@ -114,6 +115,11 @@ def parse_args(): ...@@ -114,6 +115,11 @@ def parse_args():
type=str, type=str,
default='iou', default='iou',
help="Combine method matching metric, choose in ['iou', 'ios'].") help="Combine method matching metric, choose in ['iou', 'ios'].")
parser.add_argument(
"--visualize",
type=ast.literal_eval,
default=True,
help="Whether to save visualize results to output_dir.")
args = parser.parse_args() args = parser.parse_args()
return args return args
...@@ -170,13 +176,15 @@ def run(FLAGS, cfg): ...@@ -170,13 +176,15 @@ def run(FLAGS, cfg):
match_metric=FLAGS.match_metric, match_metric=FLAGS.match_metric,
draw_threshold=FLAGS.draw_threshold, draw_threshold=FLAGS.draw_threshold,
output_dir=FLAGS.output_dir, output_dir=FLAGS.output_dir,
save_results=FLAGS.save_results) save_results=FLAGS.save_results,
visualize=FLAGS.visualize)
else: else:
trainer.predict( trainer.predict(
images, images,
draw_threshold=FLAGS.draw_threshold, draw_threshold=FLAGS.draw_threshold,
output_dir=FLAGS.output_dir, output_dir=FLAGS.output_dir,
save_results=FLAGS.save_results) save_results=FLAGS.save_results,
visualize=FLAGS.visualize)
def main(): def main():
......
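The new --visualize flag is forwarded to Trainer.predict, so predictions can be dumped without rendering images. A sketch of the call with drawing disabled (assumes a trainer and image list prepared as in tools/infer.py):

trainer.predict(
    images,
    draw_threshold=0.5,
    output_dir='output',
    save_results=True,
    visualize=False)  # skip drawing; keep only the saved prediction files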