diff --git a/configs/datasets/dota.yml b/configs/datasets/dota.yml index 2830b829c218f725ac045108d462dc833ea08a89..9dda08400aaac1d914b1858dda32ff0f82717b49 100644 --- a/configs/datasets/dota.yml +++ b/configs/datasets/dota.yml @@ -3,19 +3,19 @@ num_classes: 15 TrainDataset: !COCODataSet - image_dir: trainval_split/images - anno_path: trainval_split/s2anet_trainval_paddle_coco.json - dataset_dir: dataset/DOTA_1024_s2anet - data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox'] + image_dir: trainval1024/images + anno_path: trainval1024/DOTA_trainval1024.json + dataset_dir: dataset/dota/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] EvalDataset: !COCODataSet - image_dir: trainval_split/images - anno_path: trainval_split/s2anet_trainval_paddle_coco.json - dataset_dir: dataset/DOTA_1024_s2anet/ - data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox'] + image_dir: trainval1024/images + anno_path: trainval1024/DOTA_trainval1024.json + dataset_dir: dataset/dota/ + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] TestDataset: !ImageFolder - anno_path: trainval_split/s2anet_trainval_paddle_coco.json - dataset_dir: dataset/DOTA_1024_s2anet/ + anno_path: test1024/DOTA_test1024.json + dataset_dir: dataset/dota/ diff --git a/configs/datasets/spine_coco.yml b/configs/datasets/spine_coco.yml index 41cf51e0ec1d98decc7eb930753d3bd1800912ca..2339c26db1fcd55a52c8cc7b7dc2623964b7c97a 100644 --- a/configs/datasets/spine_coco.yml +++ b/configs/datasets/spine_coco.yml @@ -6,14 +6,14 @@ TrainDataset: image_dir: images anno_path: annotations/train.json dataset_dir: dataset/spine_coco - data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] EvalDataset: !COCODataSet image_dir: images anno_path: annotations/valid.json dataset_dir: dataset/spine_coco - data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_rbox'] + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly'] TestDataset: !ImageFolder diff --git a/configs/dota/README.md b/configs/dota/README.md deleted file mode 100644 index adde27691ef78606d2528b95dee8e30842bfff64..0000000000000000000000000000000000000000 --- a/configs/dota/README.md +++ /dev/null @@ -1,175 +0,0 @@ -# S2ANet模型 - -## 内容 -- [简介](#简介) -- [准备数据](#准备数据) -- [开始训练](#开始训练) -- [模型库](#模型库) -- [预测部署](#预测部署) - -## 简介 - -[S2ANet](https://arxiv.org/pdf/2008.09397.pdf)是用于检测旋转框的模型,要求使用PaddlePaddle 2.1.1(可使用pip安装) 或适当的[develop版本](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release)。 - - -## 准备数据 - -### DOTA数据 -[DOTA Dataset]是航空影像中物体检测的数据集,包含2806张图像,每张图像4000*4000分辨率。 - -| 数据版本 | 类别数 | 图像数 | 图像尺寸 | 实例数 | 标注方式 | -|:--------:|:-------:|:---------:|:---------:| :---------:| :------------: | -| v1.0 | 15 | 2806 | 800~4000 | 118282 | OBB + HBB | -| v1.5 | 16 | 2806 | 800~4000 | 400000 | OBB + HBB | - -注:OBB标注方式是指标注任意四边形;顶点按顺时针顺序排列。HBB标注方式是指标注示例的外接矩形。 - -DOTA数据集中总共有2806张图像,其中1411张图像作为训练集,458张图像作为评估集,剩余937张图像作为测试集。 - -如果需要切割图像数据,请参考[DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit) 。 - -设置`crop_size=1024, stride=824, gap=200`参数切割数据后,训练集15749张图像,评估集5297张图像,测试集10833张图像。 - -### 自定义数据 - -数据标注有两种方式: - -- 第一种是标注旋转矩形,可以通过旋转矩形标注工具[roLabelImg](https://github.com/cgvict/roLabelImg) 来标注旋转矩形框。 - -- 第二种是标注四边形,通过脚本转成外接旋转矩形,这样得到的标注可能跟真实的物体框有一定误差。 - -然后将标注结果转换成coco标注格式,其中每个`bbox`的格式为 `[x_center, y_center, width, height, angle]`,这里角度以弧度表示。 - 
-参考[脊椎间盘数据集](https://aistudio.baidu.com/aistudio/datasetdetail/85885) ,我们将数据集划分为训练集(230)、测试集(57),数据地址为:[spine_coco](https://paddledet.bj.bcebos.com/data/spine_coco.tar) 。该数据集图像数量比较少,使用这个数据集可以快速训练S2ANet模型。 - - -## 开始训练 - -### 1. 安装旋转框IOU计算OP - -旋转框IOU计算OP[ext_op](../../ppdet/ext_op)是参考Paddle[自定义外部算子](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html) 的方式开发。 - -若使用旋转框IOU计算OP,需要环境满足: -- PaddlePaddle >= 2.1.1 -- GCC == 8.2 - -推荐使用docker镜像 paddle:2.1.1-gpu-cuda10.1-cudnn7。 - -执行如下命令下载镜像并启动容器: -``` -sudo nvidia-docker run -it --name paddle_s2anet -v $PWD:/paddle --network=host registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7 /bin/bash -``` - -镜像中paddle已安装好,进入python3.7,执行如下代码检查paddle安装是否正常: -``` -import paddle -print(paddle.__version__) -paddle.utils.run_check() -``` - -进入到`ppdet/ext_op`文件夹,安装: -``` -python3.7 setup.py install -``` - -Windows环境请按照如下步骤安装: - -(1)准备Visual Studio (版本需要>=Visual Studio 2015 update3),这里以VS2017为例; - -(2)点击开始-->Visual Studio 2017-->适用于 VS 2017 的x64本机工具命令提示; - -(3)设置环境变量:`set DISTUTILS_USE_SDK=1` - -(4)进入`PaddleDetection/ppdet/ext_op`目录,通过`python3.7 setup.py install`命令进行安装。 - -安装完成后,测试自定义op是否可以正常编译以及计算结果: -``` -cd PaddleDetecetion/ppdet/ext_op -python3.7 test.py -``` - -### 2. 训练 -**注意:** -配置文件中学习率是按照8卡GPU训练设置的,如果使用单卡GPU训练,请将学习率设置为原来的1/8。 - -GPU单卡训练 -```bash -export CUDA_VISIBLE_DEVICES=0 -python3.7 tools/train.py -c configs/dota/s2anet_1x_spine.yml -``` - -GPU多卡训练 -```bash -export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -python3.7 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/dota/s2anet_1x_spine.yml -``` - -可以通过`--eval`开启边训练边测试。 - -### 3. 评估 -```bash -python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams - -# 使用提供训练好的模型评估 -python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams -``` -** 注意:** -(1) dota数据集中是train和val数据作为训练集一起训练的,对dota数据集进行评估时需要自定义设置评估数据集配置。 - -(2) 骨骼数据集是由分割数据转换而来,由于椎间盘不同类别对于检测任务而言区别很小,且s2anet算法最后得出的分数较低,评估时默认阈值为0.5,mAP较低是正常的。建议通过可视化查看检测结果。 - -### 4. 预测 -执行如下命令,会将图像预测结果保存到`output`文件夹下。 -```bash -python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3 -``` -使用提供训练好的模型预测: -```bash -python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3 -``` - -### 5. 
DOTA数据评估 -执行如下命令,会在`output`文件夹下将每个图像预测结果保存到同文件夹名的txt文本中。 -``` -python3.7 tools/infer.py -c configs/dota/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=dota_test_images --draw_threshold=0.05 --save_txt=True --output_dir=output -``` - -请参考[DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit) 生成评估文件,评估文件格式请参考[DOTA Test](http://captain.whu.edu.cn/DOTAweb/tasks.html) ,生成zip文件,每个类一个txt文件,txt文件中每行格式为:`image_id score x1 y1 x2 y2 x3 y3 x4 y4`,提交服务器进行评估。您也可以参考`dataset/dota_coco/dota_generate_test_result.py`脚本生成评估文件,提交到服务器。 - -## 模型库 - -### S2ANet模型 - -| 模型 | Conv类型 | mAP | 模型下载 | 配置文件 | -|:-----------:|:----------:|:--------:| :----------:| :---------: | -| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_conv_2x_dota.yml) | -| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) | - -**注意:** 这里使用`multiclass_nms`,与原作者使用nms略有不同。 - - -## 预测部署 - -Paddle中`multiclass_nms`算子的输入支持四边形输入,因此部署时可以不需要依赖旋转框IOU计算算子。 - -部署教程请参考[预测部署](../../deploy/README.md) - - -## Citations -``` -@article{han2021align, - author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}}, - journal={IEEE Transactions on Geoscience and Remote Sensing}, - title={Align Deep Features for Oriented Object Detection}, - year={2021}, - pages={1-11}, - doi={10.1109/TGRS.2021.3062048}} - -@inproceedings{xia2018dota, - title={DOTA: A large-scale dataset for object detection in aerial images}, - author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei}, - booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, - pages={3974--3983}, - year={2018} -} -``` diff --git a/configs/dota/README_en.md b/configs/dota/README_en.md deleted file mode 100644 index 61eeee7f5c53b7ec4e01c2a68c75f98f9a09bd14..0000000000000000000000000000000000000000 --- a/configs/dota/README_en.md +++ /dev/null @@ -1,185 +0,0 @@ -# S2ANet Model - -## Content -- [S2ANet Model](#s2anet-model) - - [Content](#content) - - [Introduction](#introduction) - - [Prepare Data](#prepare-data) - - [DOTA data](#dota-data) - - [Customize Data](#customize-data) - - [Start Training](#start-training) - - [1. Install the rotating frame IOU and calculate the OP](#1-install-the-rotating-frame-iou-and-calculate-the-op) - - [2. Train](#2-train) - - [3. Evaluation](#3-evaluation) - - [4. Prediction](#4-prediction) - - [5. DOTA Data evaluation](#5-dota-data-evaluation) - - [Model Library](#model-library) - - [S2ANet Model](#s2anet-model-1) - - [Predict Deployment](#predict-deployment) - - [Citations](#citations) - -## Introduction - -[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is used to detect rotating frame's model, required use of PaddlePaddle 2.1.1(can be installed using PIP) or proper [develop version](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release). - - -## Prepare Data - -### DOTA data -[DOTA Dataset] is a dataset of object detection in aerial images, which contains 2806 images with a resolution of 4000x4000 per image. 
- -| Data version | categories | images | size | instances | annotation method | -|:--------:|:-------:|:---------:|:---------:| :---------:| :------------: | -| v1.0 | 15 | 2806 | 800~4000 | 118282 | OBB + HBB | -| v1.5 | 16 | 2806 | 800~4000 | 400000 | OBB + HBB | - -Note: OBB annotation is an arbitrary quadrilateral; The vertices are arranged in clockwise order. The HBB annotation mode is the outer rectangle of the indicator note example. - -There were 2,806 images in the DOTA dataset, including 1,411 images as a training set, 458 images as an evaluation set, and the remaining 937 images as a test set. - -If you need to cut the image data, please refer to the [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit). - -After setting `crop_size=1024, stride=824, gap=200` parameters to cut data, there are 15,749 images in the training set, 5,297 images in the evaluation set, and 10,833 images in the test set. - -### Customize Data - -There are two ways to annotate data: - -- The first is a tagging rotating rectangular, can pass rotating rectangular annotation tool [roLabelImg](https://github.com/cgvict/roLabelImg) to describe rotating rectangular box. - -- The second is to mark the quadrilateral, through the script into an external rotating rectangle, so that the obtained mark may have a certain error with the real object frame. - -Then convert the annotation result into coco annotation format, where each `bbox` is in the format of `[x_center, y_center, width, height, angle]`, where the angle is expressed in radians. - -Reference [spinal disk dataset](https://aistudio.baidu.com/aistudio/datasetdetail/85885), we divide dataset into training set (230), the test set (57), data address is: [spine_coco](https://paddledet.bj.bcebos.com/data/spine_coco.tar). The dataset has a small number of images, which can be used to train the S2ANet model quickly. - - -## Start Training - -### 1. Install the rotating frame IOU and calculate the OP - -Rotate box IoU calculate [ext_op](../../ppdet/ext_op) is a reference PaddlePaddle [custom external operator](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html). - -To use the rotating frame IOU to calculate the OP, the following conditions must be met: -- PaddlePaddle >= 2.1.1 -- GCC == 8.2 - -Docker images are recommended paddle:2.1.1-gpu-cuda10.1-cudnn7。 - -Run the following command to download the image and start the container: -``` -sudo nvidia-docker run -it --name paddle_s2anet -v $PWD:/paddle --network=host registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7 /bin/bash -``` - -If the PaddlePaddle are installed in the mirror, go to python3.7 and run the following code to check whether the PaddlePaddle are installed properly: -``` -import paddle -print(paddle.__version__) -paddle.utils.run_check() -``` - -enter `ppdet/ext_op` directory, install: -``` -python3.7 setup.py install -``` - -In Windows, perform the following steps to install it: - -(1)Visual Studio (version required >= Visual Studio 2015 Update3); - -(2)Go to Start --> Visual Studio 2017 --> X64 native Tools command prompt for VS 2017; - -(3)Setting Environment Variables:`set DISTUTILS_USE_SDK=1` - -(4)Enter `PaddleDetection/ppdet/ext_op` directory,use `python3.7 setup.py install` to install。 - -After the installation, test whether the custom OP can compile normally and calculate the results: -``` -cd PaddleDetecetion/ppdet/ext_op -python3.7 test.py -``` - -### 2. 
Train -**Attention:** -In the configuration file, the learning rate is set based on the eight-card GPU training. If the single-card GPU training is used, set the learning rate to 1/8 of the original value. - -Single GPU Training -```bash -export CUDA_VISIBLE_DEVICES=0 -python3.7 tools/train.py -c configs/dota/s2anet_1x_spine.yml -``` - -Multiple GPUs Training -```bash -export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -python3.7 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/dota/s2anet_1x_spine.yml -``` - -You can use `--eval`to enable train-by-test. - -### 3. Evaluation -```bash -python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams - -# Use a trained model to evaluate -python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams -``` -**Attention:** -(1) The DOTA dataset is trained together with train and val data as a training set, and the evaluation dataset configuration needs to be customized when evaluating the DOTA dataset. - -(2) Bone dataset is transformed from segmented data. As there is little difference between different types of discs for detection tasks, and the score obtained by S2ANET algorithm is low, the default threshold for evaluation is 0.5, a low mAP is normal. You are advised to view the detection result visually. - -### 4. Prediction -Executing the following command will save the image prediction results to the `output` folder. -```bash -python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3 -``` -Prediction using models that provide training: -```bash -python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3 -``` - -### 5. DOTA Data evaluation -Execute the following command, will save each image prediction result in `output` folder txt text with the same folder name. -``` -python3.7 tools/infer.py -c configs/dota/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=dota_test_images --draw_threshold=0.05 --save_txt=True --output_dir=output -``` -Please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit) generate assessment files, Assessment file format, please refer to [DOTA Test](http://captain.whu.edu.cn/DOTAweb/tasks.html), and generate the zip file, each class a txt file, every row in the txt file format for: `image_id score x1 y1 x2 y2 x3 y3 x4 y4` You can also reference the `dataset/dota_coco/dota_generate_test_result.py` script to generate an evaluation file and submit it to the server. - -## Model Library - -### S2ANet Model - -| Model | Conv Type | mAP | Model Download | Configuration File | -|:-----------:|:----------:|:--------:| :----------:| :---------: | -| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_conv_2x_dota.yml) | -| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) | - -**Attention:** `multiclass_nms` is used here, which is slightly different from the original author's use of NMS. 
- - -## Predict Deployment - -The inputs of the `multiclass_nms` operator in Paddle support quadrilateral inputs, so deployment can be done without relying on the rotating frame IOU operator. - -Please refer to the deployment tutorial[Predict deployment](../../deploy/README_en.md) - - -## Citations -``` -@article{han2021align, - author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}}, - journal={IEEE Transactions on Geoscience and Remote Sensing}, - title={Align Deep Features for Oriented Object Detection}, - year={2021}, - pages={1-11}, - doi={10.1109/TGRS.2021.3062048}} - -@inproceedings{xia2018dota, - title={DOTA: A large-scale dataset for object detection in aerial images}, - author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei}, - booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, - pages={3974--3983}, - year={2018} -} -``` diff --git a/configs/rotate/README.md b/configs/rotate/README.md new file mode 100644 index 0000000000000000000000000000000000000000..72d52014f066fd7456906b7345d22f87a3b882f4 --- /dev/null +++ b/configs/rotate/README.md @@ -0,0 +1,87 @@ +简体中文 | [English](README_en.md) + +# 旋转框检测 + +## 内容 +- [简介](#简介) +- [模型库](#模型库) +- [数据准备](#数据准备) +- [安装依赖](#安装依赖) + +## 简介 +旋转框常用于检测带有角度信息的矩形框,即矩形框的宽和高不再与图像坐标轴平行。相较于水平矩形框,旋转矩形框一般包括更少的背景信息。旋转框检测常用于遥感等场景中。 + +## 模型库 + +| 模型 | mAP | 学习率策略 | 角度表示 | 数据增广 | GPU数目 | 每GPU图片数目 | 模型下载 | 配置文件 | +|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:| +| [S2ANet](./s2anet/README.md) | 74.0 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) | + +**注意:** + +- 如果**GPU卡数**或者**batch size**发生了改变,你需要按照公式 **lrnew = lrdefault * (batch_sizenew * GPU_numbernew) / (batch_sizedefault * GPU_numberdefault)** 调整学习率。 + +## 数据准备 +### DOTA数据准备 +DOTA数据集是一个大规模的遥感图像数据集,包含旋转框和水平框的标注。可以从[DOTA数据集官网](https://captain-whu.github.io/DOTA/)下载数据集并解压,解压后的数据集目录结构如下所示: +``` +${DOTA_ROOT} +├── test +│ └── images +├── train +│ ├── images +│ └── labelTxt +└── val + ├── images + └── labelTxt +``` + +DOTA数据集分辨率较高,因此一般在训练和测试之前对图像进行切图,使用单尺度进行切图可以使用以下命令: +``` +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \ + --output_dir ${OUTPUT_DIR}/trainval1024/ \ + --coco_json_file DOTA_trainval1024.json \ + --subsize 1024 \ + --gap 200 \ + --rates 1.0 +``` +使用多尺度进行切图可以使用以下命令: +``` +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \ + --output_dir ${OUTPUT_DIR}/trainval/ \ + --coco_json_file DOTA_trainval1024.json \ + --subsize 1024 \ + --gap 500 \ + --rates 0.5 1.0 1.5 \ +``` +对于无标注的数据可以设置`--image_only`进行切图,如下所示: +``` +python configs/rotate/tools/prepare_data.py \ + --input_dirs ${DOTA_ROOT}/test/ \ + --output_dir ${OUTPUT_DIR}/test1024/ \ + --coco_json_file DOTA_test1024.json \ + --subsize 1024 \ + --gap 200 \ + --rates 1.0 \ + --image_only +``` + +## 安装依赖 +旋转框检测模型需要依赖外部算子进行训练,评估等。Linux环境下,你可以执行以下命令进行编译安装 +``` +cd ppdet/ext_op +python setup.py install +``` +Windows环境请按照如下步骤安装: + +(1)准备Visual Studio (版本需要>=Visual Studio 2015 update3),这里以VS2017为例; + +(2)点击开始-->Visual Studio 2017-->适用于 VS 2017 的x64本机工具命令提示; + +(3)设置环境变量:`set DISTUTILS_USE_SDK=1` + +(4)进入`PaddleDetection/ppdet/ext_op`目录,通过`python setup.py install`命令进行安装。 + 
+安装完成后,可以执行`ppdet/ext_op/unittest`下的单测,验证外部算子是否正确安装。
diff --git a/configs/rotate/README_en.md b/configs/rotate/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..a91f66a9d61c576070b954b312676aea54a2d4ec
--- /dev/null
+++ b/configs/rotate/README_en.md
@@ -0,0 +1,86 @@
+English | [简体中文](README.md)
+
+# Rotated Object Detection
+
+## Table of Contents
+- [Introduction](#Introduction)
+- [Model Zoo](#Model-Zoo)
+- [Data Preparation](#Data-Preparation)
+- [Installation](#Installation)
+
+## Introduction
+Rotated object detection is used to detect rectangular bounding boxes with angle information, i.e. the width and height of a box are no longer parallel to the image coordinate axes. Oriented bounding boxes generally contain less background information than horizontal bounding boxes, so rotated object detection is often used in remote sensing and similar scenarios.
+
+## Model Zoo
+| Model | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
+|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
+| [S2ANet](./s2anet/README.md) | 74.0 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
+
+**Notes:**
+
+- If the **GPU number** or the **mini-batch size** changes, the **learning rate** should be adjusted according to the formula **lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default)**.
+
+## Data Preparation
+### DOTA Dataset preparation
+The DOTA dataset is a large-scale remote sensing image dataset containing annotations of oriented and horizontal bounding boxes. It can be downloaded from the [official website of the DOTA dataset](https://captain-whu.github.io/DOTA/). After decompression, the directory structure is as follows.
+```
+${DOTA_ROOT}
+├── test
+│   └── images
+├── train
+│   ├── images
+│   └── labelTxt
+└── val
+    ├── images
+    └── labelTxt
+```
+
+The image resolution of the DOTA dataset is relatively high, so the images are usually sliced into patches before training and testing. To slice the images at a single scale, you can use the command below
+```
+python configs/rotate/tools/prepare_data.py \
+    --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
+    --output_dir ${OUTPUT_DIR}/trainval1024/ \
+    --coco_json_file DOTA_trainval1024.json \
+    --subsize 1024 \
+    --gap 200 \
+    --rates 1.0
+```
+To slice the images at multiple scales, you can use the command below
+```
+python configs/rotate/tools/prepare_data.py \
+    --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
+    --output_dir ${OUTPUT_DIR}/trainval/ \
+    --coco_json_file DOTA_trainval1024.json \
+    --subsize 1024 \
+    --gap 500 \
+    --rates 0.5 1.0 1.5
+```
+For data without annotations, you should set `--image_only`, as follows
+```
+python configs/rotate/tools/prepare_data.py \
+    --input_dirs ${DOTA_ROOT}/test/ \
+    --output_dir ${OUTPUT_DIR}/test1024/ \
+    --coco_json_file DOTA_test1024.json \
+    --subsize 1024 \
+    --gap 200 \
+    --rates 1.0 \
+    --image_only
+```
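How `--subsize` and `--gap` interact: each patch is `subsize` pixels square and adjacent patches overlap by `gap` pixels, so the slicer steps by `subsize - gap`. The sketch below mirrors the `SliceBase.get_windows` logic added in `configs/rotate/tools/slicebase.py` further down in this patch; the standalone function and the sample sizes are illustrative only.

```python
def get_windows(height, width, subsize=1024, gap=200):
    """Enumerate patch windows: slide a subsize x subsize window with
    stride (subsize - gap), clamping the last row/column so that every
    patch stays inside the image."""
    slide = subsize - gap
    windows = []
    left = 0
    while left < width:
        if left + subsize >= width:
            left = max(width - subsize, 0)  # clamp the last column
        up = 0
        while up < height:
            if up + subsize >= height:
                up = max(height - subsize, 0)  # clamp the last row
            right = min(left + subsize, width - 1)
            down = min(up + subsize, height - 1)
            windows.append((left, up, right, down))
            if up + subsize >= height:
                break
            up += slide
        if left + subsize >= width:
            break
        left += slide
    return windows

# A 2048x2048 image with the single-scale settings above (subsize=1024,
# gap=200, i.e. stride 824) is covered by a 3x3 grid of patches.
assert len(get_windows(2048, 2048)) == 9
```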
+
+## Installation
+Rotated object detection models depend on external operators for training and evaluation. In a Linux environment, you can run the following commands to compile and install them.
+```
+cd ppdet/ext_op
+python setup.py install
+```
+In a Windows environment, perform the following steps to install them:
+
+(1) Install Visual Studio (version >= Visual Studio 2015 Update3);
+
+(2) Go to Start --> Visual Studio 2017 --> x64 Native Tools Command Prompt for VS 2017;
+
+(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;
+
+(4) Enter the `ppdet/ext_op` directory and run `python setup.py install`.
+
+After the installation, you can run the unit tests under `ppdet/ext_op/unittest` to verify that the external operators are installed correctly.
diff --git a/configs/rotate/s2anet/README.md b/configs/rotate/s2anet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..c76364d8e2b8158638bdca393f4f4a8864759c2b
--- /dev/null
+++ b/configs/rotate/s2anet/README.md
@@ -0,0 +1,96 @@
+# S2ANet模型
+
+## 内容
+- [简介](#简介)
+- [开始训练](#开始训练)
+- [模型库](#模型库)
+- [预测部署](#预测部署)
+
+## 简介
+
+[S2ANet](https://arxiv.org/pdf/2008.09397.pdf)是用于检测旋转框的模型,在DOTA 1.0数据集上单尺度训练可达到74.0 mAP。
+
+## 开始训练
+
+### 1. 训练
+
+GPU单卡训练
+```bash
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
+```
+
+GPU多卡训练
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
+```
+
+可以通过`--eval`开启边训练边测试。
+
+### 2. 评估
+```bash
+python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams
+
+# 使用提供训练好的模型评估
+python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
+```
+
+### 3. 预测
+执行如下命令,会将图像预测结果保存到`output`文件夹下。
+```bash
+python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
+```
+使用提供训练好的模型预测:
+```bash
+python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
+```
+
+### 4. DOTA数据评估
+执行如下命令,会在`output`文件夹下将每个图像的预测结果保存到同名的txt文件中。
+```
+python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
+```
+参考[DOTA Task](https://captain-whu.github.io/DOTA/tasks.html),评估DOTA数据集需要生成一个包含所有检测结果的zip文件,每一类的检测结果储存在一个txt文件中,txt文件中每行格式为:`image_name score x1 y1 x2 y2 x3 y3 x4 y4`。将生成的zip文件提交到[DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html)的Task1进行评估。你可以执行以下命令生成评估文件
+```
+python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10
+zip -r submit.zip submit
+```
+
+## 模型库
+
+### S2ANet模型
+
+| 模型 | Conv类型 | mAP | 模型下载 | 配置文件 |
+|:-----------:|:----------:|:--------:| :----------:| :---------: |
+| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) |
+| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
+
+**注意:** 这里使用`multiclass_nms`,与原作者使用的NMS略有不同。
+
+
+## 预测部署
+
+Paddle中`multiclass_nms`算子支持四边形输入,因此部署时不需要依赖旋转框IOU计算算子。
+
+部署教程请参考[预测部署](../../../deploy/README.md)
+
+
+## Citations
+```
+@article{han2021align,
+  author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
+  journal={IEEE Transactions on Geoscience and Remote Sensing},
+  title={Align Deep Features for Oriented Object Detection},
+  year={2021},
+  pages={1-11},
+  doi={10.1109/TGRS.2021.3062048}}
+
+@inproceedings{xia2018dota,
+  title={DOTA: A large-scale dataset for object detection in aerial images},
+  author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={3974--3983},
+  year={2018}
+}
+```
diff --git a/configs/rotate/s2anet/README_en.md b/configs/rotate/s2anet/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..70da7660b8b4aca16cdce5f9f8acc1ab4bc1f17b
--- /dev/null
+++ b/configs/rotate/s2anet/README_en.md
@@ -0,0 +1,105 @@
+# S2ANet Model
+
+## Content
+- [S2ANet Model](#s2anet-model)
+  - [Content](#content)
+  - [Introduction](#introduction)
+  - [Start Training](#start-training)
+    - [1. Train](#1-train)
+    - [2. Evaluation](#2-evaluation)
+    - [3. Prediction](#3-prediction)
+    - [4. DOTA Data evaluation](#4-dota-data-evaluation)
+  - [Model Library](#model-library)
+    - [S2ANet Model](#s2anet-model-1)
+  - [Predict Deployment](#predict-deployment)
+  - [Citations](#citations)
+
+## Introduction
+
+[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting rotated objects; it achieves 74.0 mAP on the DOTA 1.0 dataset with single-scale training.
+
+## Start Training
+
+### 1. Train
+
+Single GPU Training
+```bash
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
+```
+
+Multiple GPUs Training
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
+```
+
+You can use `--eval` to enable evaluation during training.
+
+### 2. Evaluation
+```bash
+python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams
+
+# Evaluate with the provided trained model
+python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
+```
+
+### 3. Prediction
+Running the following command will save the image prediction results to the `output` folder.
+```bash
+python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
+```
+Prediction with the provided trained model:
+```bash
+python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
+```
+
+### 4. DOTA Data evaluation
+Running the following command will save the prediction results of each image to a txt file of the same name under the `output` folder.
+```
+python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
+```
+Referring to [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images. The detection results of each category are stored in one txt file, each line of which is in the following format: `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can run the following commands to generate that file
+```
+python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10
+zip -r submit.zip submit
+```
+
+## Model Library
+
+### S2ANet Model
+
+| Model | Conv Type | mAP | Model Download | Configuration File |
+|:-----------:|:----------:|:--------:| :----------:| :---------: |
+| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) |
+| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
+
+**Attention:** `multiclass_nms` is used here, which is slightly different from the NMS used by the original author.
+
+
+## Predict Deployment
+
+The `multiclass_nms` operator in Paddle supports quadrilateral inputs, so deployment does not need to depend on the rotated box IoU operator.
+
+Please refer to the deployment tutorial: [Predict deployment](../../../deploy/README_en.md)
+
+
+## Citations
+```
+@article{han2021align,
+  author={J. {Han} and J. {Ding} and J. {Li} and G. -S.
{Xia}}, + journal={IEEE Transactions on Geoscience and Remote Sensing}, + title={Align Deep Features for Oriented Object Detection}, + year={2021}, + pages={1-11}, + doi={10.1109/TGRS.2021.3062048}} + +@inproceedings{xia2018dota, + title={DOTA: A large-scale dataset for object detection in aerial images}, + author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={3974--3983}, + year={2018} +} +``` diff --git a/configs/dota/_base_/s2anet.yml b/configs/rotate/s2anet/_base_/s2anet.yml similarity index 100% rename from configs/dota/_base_/s2anet.yml rename to configs/rotate/s2anet/_base_/s2anet.yml diff --git a/configs/dota/_base_/s2anet_optimizer_1x.yml b/configs/rotate/s2anet/_base_/s2anet_optimizer_1x.yml similarity index 100% rename from configs/dota/_base_/s2anet_optimizer_1x.yml rename to configs/rotate/s2anet/_base_/s2anet_optimizer_1x.yml diff --git a/configs/dota/_base_/s2anet_optimizer_2x.yml b/configs/rotate/s2anet/_base_/s2anet_optimizer_2x.yml similarity index 100% rename from configs/dota/_base_/s2anet_optimizer_2x.yml rename to configs/rotate/s2anet/_base_/s2anet_optimizer_2x.yml diff --git a/configs/dota/_base_/s2anet_reader.yml b/configs/rotate/s2anet/_base_/s2anet_reader.yml similarity index 96% rename from configs/dota/_base_/s2anet_reader.yml rename to configs/rotate/s2anet/_base_/s2anet_reader.yml index 36ac1fd687b53686a06450444f50246c159818b2..7d0fc15e002f8fe0772a7feea241418f9a2ada42 100644 --- a/configs/dota/_base_/s2anet_reader.yml +++ b/configs/rotate/s2anet/_base_/s2anet_reader.yml @@ -2,7 +2,7 @@ worker_num: 4 TrainReader: sample_transforms: - Decode: {} - - Rbox2Poly: {} + - Poly2Array: {} - RandomRFlip: {} - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2} - Poly2RBox: {rbox_type: 'le135'} @@ -19,6 +19,7 @@ TrainReader: EvalReader: sample_transforms: - Decode: {} + - Poly2Array: {} - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2} - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} - Permute: {} diff --git a/configs/dota/s2anet_1x_spine.yml b/configs/rotate/s2anet/s2anet_1x_spine.yml similarity index 87% rename from configs/dota/s2anet_1x_spine.yml rename to configs/rotate/s2anet/s2anet_1x_spine.yml index 965db4d183467b7c351b9e70c90339f309c689f9..550586f45ce293b2edd082d6fe700b97c53c35f3 100644 --- a/configs/dota/s2anet_1x_spine.yml +++ b/configs/rotate/s2anet/s2anet_1x_spine.yml @@ -1,6 +1,6 @@ _BASE_: [ - '../datasets/spine_coco.yml', - '../runtime.yml', + '../../datasets/spine_coco.yml', + '../../runtime.yml', '_base_/s2anet_optimizer_1x.yml', '_base_/s2anet.yml', '_base_/s2anet_reader.yml', @@ -9,7 +9,7 @@ _BASE_: [ weights: output/s2anet_1x_spine/model_final pretrain_weights: https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams -# for 8 card +# for 4 card LearningRate: base_lr: 0.01 schedulers: diff --git a/configs/dota/s2anet_alignconv_2x_dota.yml b/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml similarity index 83% rename from configs/dota/s2anet_alignconv_2x_dota.yml rename to configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml index f2ecac202e5e617589bb291d16aa214b1ee6fe06..1b3e9eb4636dc56e2cb97142e2a9b3f4c16bb84d 100644 --- a/configs/dota/s2anet_alignconv_2x_dota.yml +++ b/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml @@ -1,6 +1,6 @@ _BASE_: [ - 
'../datasets/dota.yml', - '../runtime.yml', + '../../datasets/dota.yml', + '../../runtime.yml', '_base_/s2anet_optimizer_2x.yml', '_base_/s2anet.yml', '_base_/s2anet_reader.yml', diff --git a/configs/dota/s2anet_conv_2x_dota.yml b/configs/rotate/s2anet/s2anet_conv_2x_dota.yml similarity index 87% rename from configs/dota/s2anet_conv_2x_dota.yml rename to configs/rotate/s2anet/s2anet_conv_2x_dota.yml index c8e0a1b845de56e901cfc4a9d0ce60b8f6953e6a..34d136d865b5c4692f69356a6a22835248efe970 100644 --- a/configs/dota/s2anet_conv_2x_dota.yml +++ b/configs/rotate/s2anet/s2anet_conv_2x_dota.yml @@ -1,6 +1,6 @@ _BASE_: [ - '../datasets/dota.yml', - '../runtime.yml', + '../../datasets/dota.yml', + '../../runtime.yml', '_base_/s2anet_optimizer_2x.yml', '_base_/s2anet.yml', '_base_/s2anet_reader.yml', diff --git a/configs/rotate/tools/convert.py b/configs/rotate/tools/convert.py new file mode 100644 index 0000000000000000000000000000000000000000..cf5bdd01f9ed024f64df10658ff3e5b91efd82ad --- /dev/null +++ b/configs/rotate/tools/convert.py @@ -0,0 +1,163 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import json +import cv2 +from tqdm import tqdm +from multiprocessing import Pool + + +def load_dota_info(image_dir, anno_dir, file_name, ext=None): + base_name, extension = os.path.splitext(file_name) + if ext and (extension != ext and extension not in ext): + return None + info = {'image_file': os.path.join(image_dir, file_name), 'annotation': []} + anno_file = os.path.join(anno_dir, base_name + '.txt') + if not os.path.exists(anno_file): + return info + with open(anno_file, 'r') as f: + for line in f: + items = line.strip().split() + if (len(items) < 9): + continue + + anno = { + 'poly': list(map(float, items[:8])), + 'name': items[8], + 'difficult': '0' if len(items) == 9 else items[9], + } + info['annotation'].append(anno) + + return info + + +def load_dota_infos(root_dir, num_process=8, ext=None): + image_dir = os.path.join(root_dir, 'images') + anno_dir = os.path.join(root_dir, 'labelTxt') + data_infos = [] + if num_process > 1: + pool = Pool(num_process) + results = [] + for file_name in os.listdir(image_dir): + results.append( + pool.apply_async(load_dota_info, (image_dir, anno_dir, + file_name, ext))) + + pool.close() + pool.join() + + for result in results: + info = result.get() + if info: + data_infos.append(info) + + else: + for file_name in os.listdir(image_dir): + info = load_dota_info(image_dir, anno_dir, file_name, ext) + if info: + data_infos.append(info) + + return data_infos + + +def process_single_sample(info, image_id, class_names): + image_file = info['image_file'] + single_image = dict() + single_image['file_name'] = os.path.split(image_file)[-1] + single_image['id'] = image_id + image = cv2.imread(image_file) + height, width, _ = 
image.shape + single_image['width'] = width + single_image['height'] = height + + # process annotation field + single_objs = [] + objects = info['annotation'] + for obj in objects: + poly, name, difficult = obj['poly'], obj['name'], obj['difficult'] + if difficult == '2': + continue + + single_obj = dict() + single_obj['category_id'] = class_names.index(name) + 1 + single_obj['segmentation'] = [poly] + single_obj['iscrowd'] = 0 + xmin, ymin, xmax, ymax = min(poly[0::2]), min(poly[1::2]), max(poly[ + 0::2]), max(poly[1::2]) + width, height = xmax - xmin, ymax - ymin + single_obj['bbox'] = [xmin, ymin, width, height] + single_obj['area'] = height * width + single_obj['image_id'] = image_id + single_objs.append(single_obj) + + return (single_image, single_objs) + + +def data_to_coco(infos, output_path, class_names, num_process): + data_dict = dict() + data_dict['categories'] = [] + + for i, name in enumerate(class_names): + data_dict['categories'].append({ + 'id': i + 1, + 'name': name, + 'supercategory': name + }) + + pbar = tqdm(total=len(infos), desc='data to coco') + images, annotations = [], [] + if num_process > 1: + pool = Pool(num_process) + results = [] + for i, info in enumerate(infos): + image_id = i + 1 + results.append( + pool.apply_async( + process_single_sample, (info, image_id, class_names), + callback=lambda x: pbar.update())) + + pool.close() + pool.join() + + for result in results: + single_image, single_anno = result.get() + images.append(single_image) + annotations += single_anno + + else: + for i, info in enumerate(infos): + image_id = i + 1 + single_image, single_anno = process_single_sample(info, image_id, + class_names) + images.append(single_image) + annotations += single_anno + pbar.update() + + pbar.close() + + for i, anno in enumerate(annotations): + anno['id'] = i + 1 + + data_dict['images'] = images + data_dict['annotations'] = annotations + + with open(output_path, 'w') as f: + json.dump(data_dict, f) diff --git a/dataset/dota_coco/dota_generate_test_result.py b/configs/rotate/tools/generate_result.py similarity index 82% rename from dataset/dota_coco/dota_generate_test_result.py rename to configs/rotate/tools/generate_result.py index 44c8f1804bcff723e1dbb5df2e172340c4b1e41e..a103b9d63bf43dc134189dcb56ed358a15ef39ee 100644 --- a/dataset/dota_coco/dota_generate_test_result.py +++ b/configs/rotate/tools/generate_result.py @@ -22,21 +22,22 @@ from functools import partial from shapely.geometry import Polygon import argparse -nms_thresh = 0.1 - -class_name_15 = [ +wordname_15 = [ 'plane', 'baseball-diamond', 'bridge', 'ground-track-field', 'small-vehicle', 'large-vehicle', 'ship', 'tennis-court', 'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout', 'harbor', 'swimming-pool', 'helicopter' ] -class_name_16 = [ - 'plane', 'baseball-diamond', 'bridge', 'ground-track-field', - 'small-vehicle', 'large-vehicle', 'ship', 'tennis-court', - 'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout', - 'harbor', 'swimming-pool', 'helicopter', 'container-crane' -] +wordname_16 = wordname_15 + ['container-crane'] + +wordname_18 = wordname_16 + ['airport', 'helipad'] + +DATA_CLASSES = { + 'dota10': wordname_15, + 'dota15': wordname_16, + 'dota20': wordname_18 +} def rbox_iou(g, p): @@ -99,14 +100,11 @@ def py_cpu_nms_poly_fast(dets, thresh): h = np.maximum(0.0, yy2 - yy1) hbb_inter = w * h hbb_ovr = hbb_inter / (areas[i] + areas[order[1:]] - hbb_inter) - # h_keep_inds = np.where(hbb_ovr == 0)[0] h_inds = np.where(hbb_ovr > 0)[0] tmp_order = 
order[h_inds + 1]
         for j in range(tmp_order.size):
             iou = rbox_iou(polys[i], polys[tmp_order[j]])
             hbb_ovr[h_inds[j]] = iou
-            # ovr.append(iou)
-            # ovr_index.append(tmp_order[j])
 
         try:
             if math.isnan(ovr[0]):
@@ -148,7 +146,7 @@
     return nameboxnmsdict
 
 
-def merge_single(output_dir, nms, pred_class_lst):
+def merge_single(output_dir, nms, nms_thresh, pred_class_lst):
     """
     Args:
         output_dir: output_dir
@@ -198,20 +196,20 @@
             f_out.write(outline + '\n')
 
 
-def dota_generate_test_result(pred_txt_dir,
-                              output_dir='output',
-                              dota_version='v1.0'):
+def generate_result(pred_txt_dir,
+                    output_dir='output',
+                    class_names=wordname_15,
+                    nms_thresh=0.1):
     """
     pred_txt_dir: dir of pred txt
     output_dir: dir of output
-    dota_version: dota_version v1.0 or v1.5 or v2.0
+    class_names: class names of data
     """
     pred_txt_list = glob.glob("{}/*.txt".format(pred_txt_dir))
 
     # step1: summary pred bbox
     pred_classes = {}
-    class_lst = class_name_15 if dota_version == 'v1.0' else class_name_16
-    for class_name in class_lst:
+    for class_name in class_names:
         pred_classes[class_name] = []
 
     for current_txt in pred_txt_list:
@@ -233,26 +231,36 @@
         pred_classes_lst.append((class_name, pred_classes[class_name]))
 
     # step2: merge
-    pool = Pool(len(class_lst))
+    pool = Pool(len(class_names))
     nms = py_cpu_nms_poly_fast
-    mergesingle_fn = partial(merge_single, output_dir, nms)
+    mergesingle_fn = partial(merge_single, output_dir, nms, nms_thresh)
     pool.map(mergesingle_fn, pred_classes_lst)
 
 
-if __name__ == '__main__':
-    parser = argparse.ArgumentParser(description='dota anno to coco')
-    parser.add_argument('--pred_txt_dir', help='path of pred txt dir')
+def parse_args():
+    parser = argparse.ArgumentParser(description='generate test results')
+    parser.add_argument('--pred_txt_dir', type=str, help='path of pred txt dir')
+    parser.add_argument(
+        '--output_dir', type=str, default='output', help='path of output dir')
     parser.add_argument(
-        '--output_dir', help='path of output dir', default='output')
+        '--data_type', type=str, default='dota10', help='data type')
     parser.add_argument(
-        '--dota_version',
-        help='dota_version, v1.0 or v1.5 or v2.0',
-        type=str,
-        default='v1.0')
+        '--nms_thresh',
+        type=float,
+        default=0.1,
+        help='NMS threshold while merging results')
+
+    return parser.parse_args()
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    output_dir = args.output_dir
+    if not os.path.exists(output_dir):
+        os.makedirs(output_dir)
 
-    args = parser.parse_args()
+    class_names = DATA_CLASSES[args.data_type]
 
-    # process
-    dota_generate_test_result(args.pred_txt_dir, args.output_dir,
-                              args.dota_version)
+    generate_result(args.pred_txt_dir, output_dir, class_names,
+                    args.nms_thresh)
     print('done!')
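The merging step above deduplicates detections across overlapping patches with a polygon-based NMS (`py_cpu_nms_poly_fast`), whose core is a quadrilateral IoU computed with shapely. Below is a minimal sketch of that computation, assuming `shapely` is installed; the helper name `poly_iou` and the sample boxes are illustrative, not part of the script.

```python
from shapely.geometry import Polygon


def poly_iou(g, p):
    """IoU of two quadrilaterals given as flat lists [x1, y1, ..., x4, y4]."""
    g = Polygon(list(zip(g[0::2], g[1::2])))
    p = Polygon(list(zip(p[0::2], p[1::2])))
    if not g.is_valid or not p.is_valid:
        return 0.0
    inter = g.intersection(p).area
    union = g.area + p.area - inter
    return inter / union if union > 0 else 0.0


# Two unit squares shifted by half a side overlap with IoU = 1/3.
a = [0, 0, 1, 0, 1, 1, 0, 1]
b = [0.5, 0, 1.5, 0, 1.5, 1, 0.5, 1]
print(round(poly_iou(a, b), 3))  # 0.333
```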
diff --git a/configs/rotate/tools/prepare_data.py b/configs/rotate/tools/prepare_data.py
new file mode 100644
index 0000000000000000000000000000000000000000..7652edae27dc4bacdc30caa56b314b4b2c92188d
--- /dev/null
+++ b/configs/rotate/tools/prepare_data.py
@@ -0,0 +1,128 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import argparse
+from convert import load_dota_infos, data_to_coco
+from slicebase import SliceBase
+
+wordname_15 = [
+    'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
+    'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
+    'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
+    'harbor', 'swimming-pool', 'helicopter'
+]
+
+wordname_16 = wordname_15 + ['container-crane']
+
+wordname_18 = wordname_16 + ['airport', 'helipad']
+
+DATA_CLASSES = {
+    'dota10': wordname_15,
+    'dota15': wordname_16,
+    'dota20': wordname_18
+}
+
+
+def parse_args():
+    parser = argparse.ArgumentParser('prepare data for training')
+
+    parser.add_argument(
+        '--input_dirs',
+        nargs='+',
+        type=str,
+        default=None,
+        help='input dirs which contain image and labelTxt dir')
+
+    parser.add_argument(
+        '--output_dir',
+        type=str,
+        default=None,
+        help='output dir which contains the sliced images, the labelTxt dir and the coco-style json file'
+    )
+
+    parser.add_argument(
+        '--coco_json_file',
+        type=str,
+        default='',
+        help='name of the coco-style json annotation file')
+
+    parser.add_argument('--subsize', type=int, default=1024, help='patch size')
+
+    parser.add_argument(
+        '--gap', type=int, default=200, help='overlap between adjacent patches')
+
+    parser.add_argument(
+        '--data_type', type=str, default='dota10', help='data type')
+
+    parser.add_argument(
+        '--rates',
+        nargs='+',
+        type=float,
+        default=[1.],
+        help='scales for multi-scale training')
+
+    parser.add_argument(
+        '--nproc', type=int, default=8, help='number of worker processes')
+
+    parser.add_argument(
+        '--iof_thr',
+        type=float,
+        default=0.5,
+        help='the minimal IoF between an object and a window')
+
+    parser.add_argument(
+        '--image_only',
+        action='store_true',
+        default=False,
+        help='only process images')
+
+    args = parser.parse_args()
+    return args
+
+
+def load_dataset(input_dir, nproc, data_type):
+    if 'dota' in data_type.lower():
+        infos = load_dota_infos(input_dir, nproc)
+    else:
+        raise ValueError('only the dota dataset is supported now')
+
+    return infos
+
+
+def main():
+    args = parse_args()
+    infos = []
+    for input_dir in args.input_dirs:
+        infos += load_dataset(input_dir, args.nproc, args.data_type)
+
+    slicer = SliceBase(
+        args.gap,
+        args.subsize,
+        args.iof_thr,
+        num_process=args.nproc,
+        image_only=args.image_only)
+    slicer.slice_data(infos, args.rates, args.output_dir)
+    if args.coco_json_file:
+        infos = load_dota_infos(args.output_dir, args.nproc)
+        coco_json_file = os.path.join(args.output_dir, args.coco_json_file)
+        class_names = DATA_CLASSES[args.data_type]
+        data_to_coco(infos, coco_json_file, class_names, args.nproc)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/configs/rotate/tools/slicebase.py b/configs/rotate/tools/slicebase.py
new file mode 100644
index 0000000000000000000000000000000000000000..515dd5f8c36d9be3769bcd2050620b976a302840
--- /dev/null
+++ b/configs/rotate/tools/slicebase.py
@@ -0,0 +1,265 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import math +import copy +from numbers import Number +from multiprocessing import Pool + +import cv2 +import numpy as np +from tqdm import tqdm +import shapely.geometry as shgeo + + +def choose_best_pointorder_fit_another(poly1, poly2): + """ + To make the two polygons best fit with each point + """ + x1, y1, x2, y2, x3, y3, x4, y4 = poly1 + combinate = [ + np.array([x1, y1, x2, y2, x3, y3, x4, y4]), + np.array([x2, y2, x3, y3, x4, y4, x1, y1]), + np.array([x3, y3, x4, y4, x1, y1, x2, y2]), + np.array([x4, y4, x1, y1, x2, y2, x3, y3]) + ] + dst_coordinate = np.array(poly2) + distances = np.array( + [np.sum((coord - dst_coordinate)**2) for coord in combinate]) + sorted = distances.argsort() + return combinate[sorted[0]] + + +def cal_line_length(point1, point2): + return math.sqrt( + math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2)) + + +class SliceBase(object): + def __init__(self, + gap=512, + subsize=1024, + thresh=0.7, + choosebestpoint=True, + ext='.png', + padding=True, + num_process=8, + image_only=False): + self.gap = gap + self.subsize = subsize + self.slide = subsize - gap + self.thresh = thresh + self.choosebestpoint = choosebestpoint + self.ext = ext + self.padding = padding + self.num_process = num_process + self.image_only = image_only + + def get_windows(self, height, width): + windows = [] + left, up = 0, 0 + while (left < width): + if (left + self.subsize >= width): + left = max(width - self.subsize, 0) + up = 0 + while (up < height): + if (up + self.subsize >= height): + up = max(height - self.subsize, 0) + right = min(left + self.subsize, width - 1) + down = min(up + self.subsize, height - 1) + windows.append((left, up, right, down)) + if (up + self.subsize >= height): + break + else: + up = up + self.slide + if (left + self.subsize >= width): + break + else: + left = left + self.slide + + return windows + + def slice_image_single(self, image, windows, output_dir, output_name): + image_dir = os.path.join(output_dir, 'images') + for (left, up, right, down) in windows: + image_name = output_name + str(left) + '___' + str(up) + self.ext + subimg = copy.deepcopy(image[up:up + self.subsize, left:left + + self.subsize]) + h, w, c = subimg.shape + if (self.padding): + outimg = np.zeros((self.subsize, self.subsize, 3)) + outimg[0:h, 0:w, :] = subimg + cv2.imwrite(os.path.join(image_dir, image_name), outimg) + else: + cv2.imwrite(os.path.join(image_dir, image_name), subimg) + + def iof(self, poly1, poly2): + inter_poly = poly1.intersection(poly2) + inter_area = inter_poly.area + poly1_area = poly1.area + half_iou = inter_area / poly1_area + return inter_poly, half_iou + + def translate(self, poly, left, up): + n = len(poly) + out_poly = np.zeros(n) + for i in range(n // 2): + out_poly[i * 2] = int(poly[i * 2] - left) + out_poly[i * 
2 + 1] = int(poly[i * 2 + 1] - up) + return out_poly + + def get_poly4_from_poly5(self, poly): + distances = [ + cal_line_length((poly[i * 2], poly[i * 2 + 1]), + (poly[(i + 1) * 2], poly[(i + 1) * 2 + 1])) + for i in range(int(len(poly) / 2 - 1)) + ] + distances.append( + cal_line_length((poly[0], poly[1]), (poly[8], poly[9]))) + pos = np.array(distances).argsort()[0] + count = 0 + out_poly = [] + while count < 5: + if (count == pos): + out_poly.append( + (poly[count * 2] + poly[(count * 2 + 2) % 10]) / 2) + out_poly.append( + (poly[(count * 2 + 1) % 10] + poly[(count * 2 + 3) % 10]) / + 2) + count = count + 1 + elif (count == (pos + 1) % 5): + count = count + 1 + continue + + else: + out_poly.append(poly[count * 2]) + out_poly.append(poly[count * 2 + 1]) + count = count + 1 + return out_poly + + def slice_anno_single(self, annos, windows, output_dir, output_name): + anno_dir = os.path.join(output_dir, 'labelTxt') + for (left, up, right, down) in windows: + image_poly = shgeo.Polygon( + [(left, up), (right, up), (right, down), (left, down)]) + anno_file = output_name + str(left) + '___' + str(up) + '.txt' + with open(os.path.join(anno_dir, anno_file), 'w') as f: + for anno in annos: + gt_poly = shgeo.Polygon( + [(anno['poly'][0], anno['poly'][1]), + (anno['poly'][2], anno['poly'][3]), + (anno['poly'][4], anno['poly'][5]), + (anno['poly'][6], anno['poly'][7])]) + if gt_poly.area <= 0: + continue + inter_poly, iof = self.iof(gt_poly, image_poly) + if iof == 1: + final_poly = self.translate(anno['poly'], left, up) + elif iof > 0: + inter_poly = shgeo.polygon.orient(inter_poly, sign=1) + out_poly = list(inter_poly.exterior.coords)[0:-1] + if len(out_poly) < 4 or len(out_poly) > 5: + continue + + final_poly = [] + for p in out_poly: + final_poly.append(p[0]) + final_poly.append(p[1]) + + if len(out_poly) == 5: + final_poly = self.get_poly4_from_poly5(final_poly) + + if self.choosebestpoint: + final_poly = choose_best_pointorder_fit_another( + final_poly, anno['poly']) + + final_poly = self.translate(final_poly, left, up) + final_poly = np.clip(final_poly, 1, self.subsize) + else: + continue + outline = ' '.join(list(map(str, final_poly))) + if iof >= self.thresh: + outline = outline + ' ' + anno['name'] + ' ' + str(anno[ + 'difficult']) + else: + outline = outline + ' ' + anno['name'] + ' ' + '2' + + f.write(outline + '\n') + + def slice_data_single(self, info, rate, output_dir): + file_name = info['image_file'] + base_name = os.path.splitext(os.path.split(file_name)[-1])[0] + base_name = base_name + '__' + str(rate) + '__' + img = cv2.imread(file_name) + if img.shape == (): + return + + if (rate != 1): + resize_img = cv2.resize( + img, None, fx=rate, fy=rate, interpolation=cv2.INTER_CUBIC) + else: + resize_img = img + + height, width, _ = resize_img.shape + windows = self.get_windows(height, width) + self.slice_image_single(resize_img, windows, output_dir, base_name) + if not self.image_only: + self.slice_anno_single(info['annotation'], windows, output_dir, + base_name) + + def check_or_mkdirs(self, path): + if not os.path.exists(path): + os.makedirs(path, exist_ok=True) + + def slice_data(self, infos, rates, output_dir): + """ + Args: + infos (list[dict]): data_infos + rates (float, list): scale rates + output_dir (str): output directory + """ + if isinstance(rates, Number): + rates = [rates, ] + + self.check_or_mkdirs(output_dir) + self.check_or_mkdirs(os.path.join(output_dir, 'images')) + if not self.image_only: + self.check_or_mkdirs(os.path.join(output_dir, 'labelTxt')) + + pbar = 
tqdm(total=len(rates) * len(infos), desc='slicing data') + + if self.num_process <= 1: + for rate in rates: + for info in infos: + self.slice_data_single(info, rate, output_dir) + pbar.update() + else: + pool = Pool(self.num_process) + for rate in rates: + for info in infos: + pool.apply_async( + self.slice_data_single, (info, rate, output_dir), + callback=lambda x: pbar.update()) + + pool.close() + pool.join() + + pbar.close() diff --git a/dataset/dota/.gitignore b/dataset/dota/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/dota_coco/dota_to_coco.py b/dataset/dota_coco/dota_to_coco.py deleted file mode 100644 index 3aa557b8b3f9bce0a8637ca1a9266ab298ea23fc..0000000000000000000000000000000000000000 --- a/dataset/dota_coco/dota_to_coco.py +++ /dev/null @@ -1,163 +0,0 @@ -# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import sys -import os.path as osp -import json -import glob -import cv2 -import argparse - -# add python path of PadleDetection to sys.path -parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3))) -if parent_path not in sys.path: - sys.path.append(parent_path) - -from ppdet.modeling.bbox_utils import poly2rbox -from ppdet.utils.logger import setup_logger -logger = setup_logger(__name__) - -class_name_15 = [ - 'plane', 'baseball-diamond', 'bridge', 'ground-track-field', - 'small-vehicle', 'large-vehicle', 'ship', 'tennis-court', - 'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout', - 'harbor', 'swimming-pool', 'helicopter' -] - -class_name_16 = [ - 'plane', 'baseball-diamond', 'bridge', 'ground-track-field', - 'small-vehicle', 'large-vehicle', 'ship', 'tennis-court', - 'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout', - 'harbor', 'swimming-pool', 'helicopter', 'container-crane' -] - - -def dota_2_coco(image_dir, - txt_dir, - json_path='dota_coco.json', - is_obb=True, - dota_version='v1.0'): - """ - image_dir: image dir - txt_dir: txt label dir - json_path: json save path - is_obb: is obb or not - dota_version: dota_version v1.0 or v1.5 or v2.0 - """ - - img_lists = glob.glob("{}/*.png".format(image_dir)) - data_dict = {} - data_dict['images'] = [] - data_dict['categories'] = [] - data_dict['annotations'] = [] - inst_count = 0 - - # categories - class_name2id = {} - if dota_version == 'v1.0': - for class_id, class_name in enumerate(class_name_15): - class_name2id[class_name] = class_id + 1 - single_cat = { - 'id': class_id + 1, - 'name': class_name, - 'supercategory': class_name - } - data_dict['categories'].append(single_cat) - - for image_id, img_path in enumerate(img_lists): - single_image = {} - basename = osp.basename(img_path) - single_image['file_name'] = basename - single_image['id'] = image_id - img = cv2.imread(img_path) - height, width, _ = img.shape - single_image['width'] = width - single_image['height'] = height - # add image - 
data_dict['images'].append(single_image) - - # annotations - anno_txt_path = osp.join(txt_dir, osp.splitext(basename)[0] + '.txt') - if not osp.exists(anno_txt_path): - logger.warning('path of {} not exists'.format(anno_txt_path)) - - for line in open(anno_txt_path): - line = line.strip() - # skip - if line.find('imagesource') >= 0 or line.find('gsd') >= 0: - continue - - # x1,y1,x2,y2,x3,y3,x4,y4 class_name, is_different - single_obj_anno = line.split(' ') - assert len(single_obj_anno) == 10 - single_obj_poly = [float(e) for e in single_obj_anno[0:8]] - single_obj_classname = single_obj_anno[8] - single_obj_different = int(single_obj_anno[9]) - - single_obj = {} - - single_obj['category_id'] = class_name2id[single_obj_classname] - single_obj['segmentation'] = [] - single_obj['segmentation'].append(single_obj_poly) - single_obj['iscrowd'] = 0 - - # rbox or bbox - if is_obb: - polys = [single_obj_poly] - rboxs = poly2rbox(polys) - rbox = rboxs[0].tolist() - single_obj['bbox'] = rbox - single_obj['area'] = rbox[2] * rbox[3] - else: - xmin, ymin, xmax, ymax = min(single_obj_poly[0::2]), min(single_obj_poly[1::2]), \ - max(single_obj_poly[0::2]), max(single_obj_poly[1::2]) - - width, height = xmax - xmin, ymax - ymin - single_obj['bbox'] = xmin, ymin, width, height - single_obj['area'] = width * height - - single_obj['image_id'] = image_id - data_dict['annotations'].append(single_obj) - single_obj['id'] = inst_count - inst_count = inst_count + 1 - # add annotation - data_dict['annotations'].append(single_obj) - - with open(json_path, 'w') as f: - json.dump(data_dict, f) - - -if __name__ == '__main__': - parser = argparse.ArgumentParser(description='dota anno to coco') - parser.add_argument('--images_dir', help='path_to_images') - parser.add_argument('--label_dir', help='path_to_labelTxt', type=str) - parser.add_argument( - '--json_path', - help='save json path', - type=str, - default='dota_coco.json') - parser.add_argument( - '--is_obb', help='is_obb or not', type=bool, default=True) - parser.add_argument( - '--dota_version', - help='dota_version, v1.0 or v1.5 or v2.0', - type=str, - default='v1.0') - - args = parser.parse_args() - - # process - dota_2_coco(args.images_dir, args.label_dir, args.json_path, args.is_obb, - args.dota_version) - print('done!') diff --git a/ppdet/data/source/coco.py b/ppdet/data/source/coco.py index 95a51deeb82a4679d6d658518354afd91abefa52..80bd48a4140b73ba20210f24022ec6a507e1e86e 100644 --- a/ppdet/data/source/coco.py +++ b/ppdet/data/source/coco.py @@ -145,25 +145,14 @@ class COCODataSet(DetDataset): if not any(np.array(inst['bbox'])): continue - # read rbox anno or not - is_rbox_anno = True if len(inst['bbox']) == 5 else False - if is_rbox_anno: - xc, yc, box_w, box_h, angle = inst['bbox'] - x1 = xc - box_w / 2.0 - y1 = yc - box_h / 2.0 - x2 = x1 + box_w - y2 = y1 + box_h - else: - x1, y1, box_w, box_h = inst['bbox'] - x2 = x1 + box_w - y2 = y1 + box_h + x1, y1, box_w, box_h = inst['bbox'] + x2 = x1 + box_w + y2 = y1 + box_h eps = 1e-5 if inst['area'] > 0 and x2 - x1 > eps and y2 - y1 > eps: inst['clean_bbox'] = [ round(float(x), 3) for x in [x1, y1, x2, y2] ] - if is_rbox_anno: - inst['clean_rbox'] = [xc, yc, box_w, box_h, angle] bboxes.append(inst) else: logger.warning( @@ -178,8 +167,6 @@ class COCODataSet(DetDataset): is_empty = True gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32) - if is_rbox_anno: - gt_rbox = np.zeros((num_bbox, 5), dtype=np.float32) gt_class = np.zeros((num_bbox, 1), dtype=np.int32) is_crowd = np.zeros((num_bbox, 1), dtype=np.int32) 
gt_poly = [None] * num_bbox @@ -189,13 +176,10 @@ class COCODataSet(DetDataset): catid = box['category_id'] gt_class[i][0] = self.catid2clsid[catid] gt_bbox[i, :] = box['clean_bbox'] - # xc, yc, w, h, theta - if is_rbox_anno: - gt_rbox[i, :] = box['clean_rbox'] is_crowd[i][0] = box['iscrowd'] # check RLE format if 'segmentation' in box and box['iscrowd'] == 1: - gt_poly[i] = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] + gt_poly[i] = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] elif 'segmentation' in box and box['segmentation']: if not np.array(box['segmentation'] ).size > 0 and not self.allow_empty: @@ -212,21 +196,12 @@ class COCODataSet(DetDataset): gt_poly) and not self.allow_empty: continue - if is_rbox_anno: - gt_rec = { - 'is_crowd': is_crowd, - 'gt_class': gt_class, - 'gt_bbox': gt_bbox, - 'gt_rbox': gt_rbox, - 'gt_poly': gt_poly, - } - else: - gt_rec = { - 'is_crowd': is_crowd, - 'gt_class': gt_class, - 'gt_bbox': gt_bbox, - 'gt_poly': gt_poly, - } + gt_rec = { + 'is_crowd': is_crowd, + 'gt_class': gt_class, + 'gt_bbox': gt_bbox, + 'gt_poly': gt_poly, + } for k, v in gt_rec.items(): if k in self.data_fields: diff --git a/ppdet/data/transform/op_helper.py b/ppdet/data/transform/op_helper.py index eeb1525410b053223dd7c7c8c0a888a65cf4eb95..6c400306da8ec3ff605c0efac3e725ffd2e267a3 100644 --- a/ppdet/data/transform/op_helper.py +++ b/ppdet/data/transform/op_helper.py @@ -492,72 +492,3 @@ def get_border(border, size): while size - border // i <= border // i: i *= 2 return border // i - - -def norm_angle(angle, range=[-np.pi / 4, np.pi]): - return (angle - range[0]) % range[1] + range[0] - - -def poly2rbox_le135(poly): - """convert poly to rbox [-pi / 4, 3 * pi / 4] - - Args: - poly: [x1, y1, x2, y2, x3, y3, x4, y4] - - Returns: - rbox: [cx, cy, w, h, angle] - """ - poly = np.array(poly[:8], dtype=np.float32) - - pt1 = (poly[0], poly[1]) - pt2 = (poly[2], poly[3]) - pt3 = (poly[4], poly[5]) - pt4 = (poly[6], poly[7]) - - edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[1]) * - (pt1[1] - pt2[1])) - edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[1]) * - (pt2[1] - pt3[1])) - - width = max(edge1, edge2) - height = min(edge1, edge2) - - rbox_angle = 0 - if edge1 > edge2: - rbox_angle = np.arctan2(float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0])) - elif edge2 >= edge1: - rbox_angle = np.arctan2(float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0])) - - rbox_angle = norm_angle(rbox_angle) - - x_ctr = float(pt1[0] + pt3[0]) / 2 - y_ctr = float(pt1[1] + pt3[1]) / 2 - return x_ctr, y_ctr, width, height, rbox_angle - - -def poly2rbox_oc(poly): - """convert poly to rbox (0, pi / 2] - - Args: - poly: [x1, y1, x2, y2, x3, y3, x4, y4] - - Returns: - rbox: [cx, cy, w, h, angle] - """ - points = np.array(poly, dtype=np.float32).reshape((-1, 2)) - (cx, cy), (w, h), angle = cv2.minAreaRect(points) - # using the new OpenCV Rotated BBox definition since 4.5.1 - # if angle < 0, opencv is older than 4.5.1, angle is in [-90, 0) - if angle < 0: - angle += 90 - w, h = h, w - - # convert angle to [0, 90) - if angle == -0.0: - angle = 0.0 - if angle == 90.0: - angle = 0.0 - w, h = h, w - - angle = angle / 180 * np.pi - return cx, cy, w, h, angle diff --git a/ppdet/data/transform/rotated_operators.py b/ppdet/data/transform/rotated_operators.py index ede34d6399cb0291db82fc3a1d62dd516f11b306..99dfb572e6c730d651b5169c74cdecaffce8fe7b 100644 --- a/ppdet/data/transform/rotated_operators.py +++ b/ppdet/data/transform/rotated_operators.py @@ -29,8 +29,7 @@ import math import copy from .operators import 
register_op, BaseOperator
-from .op_helper import poly2rbox_le135, poly2rbox_oc
-from ppdet.modeling import bbox_utils
+from ppdet.modeling.rbox_utils import poly2rbox_le135_np, poly2rbox_oc_np, rbox2poly_np
 from ppdet.utils.logger import setup_logger
 logger = setup_logger(__name__)
 
@@ -195,7 +194,7 @@ class Poly2RBox(BaseOperator):
     def __init__(self, filter_threshold=4, filter_mode=None, rbox_type='le135'):
         super(Poly2RBox, self).__init__()
         self.filter_fn = lambda size: self.filter(size, filter_threshold, filter_mode)
-        self.rbox_fn = poly2rbox_le135 if rbox_type == 'le135' else poly2rbox_oc
+        self.rbox_fn = poly2rbox_le135_np if rbox_type == 'le135' else poly2rbox_oc_np
 
     def filter(self, size, threshold, mode):
         if mode == 'area':
@@ -248,7 +247,6 @@ class Poly2Array(BaseOperator):
 
     def apply(self, sample, context=None):
         if 'gt_poly' in sample:
-            logger.info('gt_poly shape: {}'.format(sample['gt_poly']))
             sample['gt_poly'] = np.array(
                 sample['gt_poly'], dtype=np.float32).reshape((-1, 8))
 
@@ -472,16 +470,10 @@ class Rbox2Poly(BaseOperator):
     def apply(self, sample, context=None):
         assert 'gt_rbox' in sample
         assert sample['gt_rbox'].shape[1] == 5
-        rrects = sample['gt_rbox']
-        x_ctr = rrects[:, 0]
-        y_ctr = rrects[:, 1]
-        width = rrects[:, 2]
-        height = rrects[:, 3]
-        x1 = x_ctr - width / 2.0
-        y1 = y_ctr - height / 2.0
-        x2 = x_ctr + width / 2.0
-        y2 = y_ctr + height / 2.0
-        sample['gt_bbox'] = np.stack([x1, y1, x2, y2], axis=1)
-        polys = bbox_utils.rbox2poly_np(rrects)
+        rboxes = sample['gt_rbox']
+        polys = rbox2poly_np(rboxes)
         sample['gt_poly'] = polys
+        xmin, ymin = polys[:, 0::2].min(1), polys[:, 1::2].min(1)
+        xmax, ymax = polys[:, 0::2].max(1), polys[:, 1::2].max(1)
+        sample['gt_bbox'] = np.stack([xmin, ymin, xmax, ymax], axis=1)
         return sample
diff --git a/ppdet/engine/trainer.py b/ppdet/engine/trainer.py
index 803306e284cd03fd171566cbfa45bac48a7a034a..a685613edaa66ff8f88c82ff3ce54bb048dd134d 100644
--- a/ppdet/engine/trainer.py
+++ b/ppdet/engine/trainer.py
@@ -268,11 +268,7 @@ class Trainer(object):
             output_eval = self.cfg['output_eval'] \
                 if 'output_eval' in self.cfg else None
             save_prediction_only = self.cfg.get('save_prediction_only', False)
-
-            # pass clsid2catid info to metric instance to avoid multiple loading
-            # annotation file
-            clsid2catid = {v: k for k, v in self.dataset.catid2clsid.items()} \
-                if self.mode == 'eval' else None
+            imid2path = self.cfg.get('imid2path', None)
 
             # when do validation in train, annotation file should be get from
             # EvalReader instead of self.dataset(which is TrainReader)
@@ -285,11 +281,11 @@
             self._metrics = [
                 RBoxMetric(
                     anno_file=anno_file,
-                    clsid2catid=clsid2catid,
                     classwise=classwise,
                     output_eval=output_eval,
                     bias=bias,
-                    save_prediction_only=save_prediction_only)
+                    save_prediction_only=save_prediction_only,
+                    imid2path=imid2path)
             ]
         elif self.cfg.metric == 'VOC':
             output_eval = self.cfg['output_eval'] \
@@ -810,10 +806,16 @@ class Trainer(object):
                 images,
                 draw_threshold=0.5,
                 output_dir='output',
-                save_results=False):
+                save_results=False,
+                visualize=True):
+        if not os.path.exists(output_dir):
+            os.makedirs(output_dir)
+
         self.dataset.set_images(images)
         loader = create('TestReader')(self.dataset, 0)
 
+        imid2path = self.dataset.get_imid2path()
+
         def setup_metrics_for_loader():
             # mem
             metrics = copy.deepcopy(self._metrics)
@@ -827,6 +829,7 @@
             self.mode = '_test'
             self.cfg['save_prediction_only'] = True
             self.cfg['output_eval'] = output_dir
+            self.cfg['imid2path'] = imid2path
             self._init_metrics()
 
             # restore
@@ 
-839,6 +842,8 @@ class Trainer(object): if output_eval is not None: self.cfg['output_eval'] = output_eval + self.cfg.pop('imid2path') + _metrics = copy.deepcopy(self._metrics) self._metrics = metrics @@ -849,8 +854,6 @@ class Trainer(object): else: metrics = [] - imid2path = self.dataset.get_imid2path() - anno_file = self.dataset.get_anno() clsid2catid, catid2name = get_categories( self.cfg.metric, anno_file=anno_file) @@ -889,46 +892,46 @@ class Trainer(object): _m.accumulate() _m.reset() - for outs in results: - batch_res = get_infer_results(outs, clsid2catid) - bbox_num = outs['bbox_num'] - - start = 0 - for i, im_id in enumerate(outs['im_id']): - image_path = imid2path[int(im_id)] - image = Image.open(image_path).convert('RGB') - image = ImageOps.exif_transpose(image) - self.status['original_image'] = np.array(image.copy()) - - end = start + bbox_num[i] - bbox_res = batch_res['bbox'][start:end] \ - if 'bbox' in batch_res else None - mask_res = batch_res['mask'][start:end] \ - if 'mask' in batch_res else None - segm_res = batch_res['segm'][start:end] \ - if 'segm' in batch_res else None - keypoint_res = batch_res['keypoint'][start:end] \ - if 'keypoint' in batch_res else None - image = visualize_results( - image, bbox_res, mask_res, segm_res, keypoint_res, - int(im_id), catid2name, draw_threshold) - self.status['result_image'] = np.array(image.copy()) - if self._compose_callback: - self._compose_callback.on_step_end(self.status) - # save image with detection - save_name = self._get_save_image_name(output_dir, image_path) - logger.info("Detection bbox results save in {}".format( - save_name)) - image.save(save_name, quality=95) - - start = end + if visualize: + for outs in results: + batch_res = get_infer_results(outs, clsid2catid) + bbox_num = outs['bbox_num'] + + start = 0 + for i, im_id in enumerate(outs['im_id']): + image_path = imid2path[int(im_id)] + image = Image.open(image_path).convert('RGB') + image = ImageOps.exif_transpose(image) + self.status['original_image'] = np.array(image.copy()) + + end = start + bbox_num[i] + bbox_res = batch_res['bbox'][start:end] \ + if 'bbox' in batch_res else None + mask_res = batch_res['mask'][start:end] \ + if 'mask' in batch_res else None + segm_res = batch_res['segm'][start:end] \ + if 'segm' in batch_res else None + keypoint_res = batch_res['keypoint'][start:end] \ + if 'keypoint' in batch_res else None + image = visualize_results( + image, bbox_res, mask_res, segm_res, keypoint_res, + int(im_id), catid2name, draw_threshold) + self.status['result_image'] = np.array(image.copy()) + if self._compose_callback: + self._compose_callback.on_step_end(self.status) + # save image with detection + save_name = self._get_save_image_name(output_dir, + image_path) + logger.info("Detection bbox results save in {}".format( + save_name)) + image.save(save_name, quality=95) + + start = end def _get_save_image_name(self, output_dir, image_path): """ Get save image name from source image path. 
""" - if not os.path.exists(output_dir): - os.makedirs(output_dir) image_name = os.path.split(image_path)[-1] name, ext = os.path.splitext(image_name) return os.path.join(output_dir, "{}".format(name)) + ext diff --git a/ppdet/metrics/map_utils.py b/ppdet/metrics/map_utils.py index 534c3b4905ddf0764c2f3e1a03da4b4b73bd257c..57f12d9e2d2c2f4001de5eae3477fdabb2a94744 100644 --- a/ppdet/metrics/map_utils.py +++ b/ppdet/metrics/map_utils.py @@ -22,7 +22,7 @@ import sys import numpy as np import itertools import paddle -from ppdet.modeling.bbox_utils import poly2rbox, rbox2poly_np +from ppdet.modeling.rbox_utils import poly2rbox_np from ppdet.utils.logger import setup_logger logger = setup_logger(__name__) @@ -91,15 +91,13 @@ def jaccard_overlap(pred, gt, is_bbox_normalized=False): return overlap -def calc_rbox_iou(pred, gt_rbox): +def calc_rbox_iou(pred, gt_poly): """ calc iou between rotated bbox """ # calc iou of bounding box for speedup - pred = np.array(pred, np.float32).reshape(-1, 8) - pred = pred.reshape(-1, 2) - gt_poly = rbox2poly_np(np.array(gt_rbox).reshape(-1, 5))[0] - gt_poly = gt_poly.reshape(-1, 2) + pred = np.array(pred, np.float32).reshape(-1, 2) + gt_poly = np.array(gt_poly, np.float32).reshape(-1, 2) pred_rect = [ np.min(pred[:, 0]), np.min(pred[:, 1]), np.max(pred[:, 0]), np.max(pred[:, 1]) @@ -114,12 +112,8 @@ def calc_rbox_iou(pred, gt_rbox): return iou # calc rbox iou - pred = pred.reshape(-1, 8) - - pred = np.array(pred, np.float32).reshape(-1, 8) - pred_rbox = poly2rbox(pred) - pred_rbox = pred_rbox.reshape(-1, 5) - pred_rbox = pred_rbox.reshape(-1, 5) + pred_rbox = poly2rbox_np(pred.reshape(-1, 8)).reshape(-1, 5) + gt_rbox = poly2rbox_np(gt_poly.reshape(-1, 8)).reshape(-1, 5) try: from ext_op import rbox_iou except Exception as e: @@ -127,7 +121,6 @@ def calc_rbox_iou(pred, gt_rbox): "following ppdet/ext_op/README.md", e) sys.stdout.flush() sys.exit(-1) - gt_rbox = np.array(gt_rbox, np.float32).reshape(-1, 5) pd_gt_rbox = paddle.to_tensor(gt_rbox, dtype='float32') pd_pred_rbox = paddle.to_tensor(pred_rbox, dtype='float32') iou = rbox_iou(pd_gt_rbox, pd_pred_rbox) @@ -211,7 +204,7 @@ class DetectionMAP(object): max_overlap = -1.0 for i, gl in enumerate(gt_label): if int(gl) == int(l): - if len(gt_box[i]) == 5: + if len(gt_box[i]) == 8: overlap = calc_rbox_iou(pred, gt_box[i]) else: overlap = jaccard_overlap(pred, gt_box[i], diff --git a/ppdet/metrics/metrics.py b/ppdet/metrics/metrics.py index ace0944e3cf98a6098b07af2ef4b309ad7e57755..7f4f5f1fbdffab1d434488bcff13646da95695a8 100644 --- a/ppdet/metrics/metrics.py +++ b/ppdet/metrics/metrics.py @@ -22,12 +22,14 @@ import json import paddle import numpy as np import typing +from collections import defaultdict from pathlib import Path from .map_utils import prune_zero_padding, DetectionMAP from .coco_utils import get_infer_results, cocoapi_eval from .widerface_utils import face_eval_run from ppdet.data.source.category import get_categories +from ppdet.modeling.rbox_utils import poly2rbox_np from ppdet.utils.logger import setup_logger logger = setup_logger(__name__) @@ -356,6 +358,7 @@ class RBoxMetric(Metric): self.overlap_thresh = kwargs.get('overlap_thresh', 0.5) self.map_type = kwargs.get('map_type', '11point') self.evaluate_difficult = kwargs.get('evaluate_difficult', False) + self.imid2path = kwargs.get('imid2path', None) class_num = len(self.catid2name) self.detection_map = DetectionMAP( class_num=class_num, @@ -388,11 +391,21 @@ class RBoxMetric(Metric): if self.save_prediction_only: return - gt_boxes = 
inputs['gt_rbox']
+        gt_boxes = inputs['gt_poly']
         gt_labels = inputs['gt_class']
+
+        if 'scale_factor' in inputs:
+            scale_factor = inputs['scale_factor'].numpy() if isinstance(
+                inputs['scale_factor'],
+                paddle.Tensor) else inputs['scale_factor']
+        else:
+            scale_factor = np.ones((gt_boxes.shape[0], 2)).astype('float32')
+
         for i in range(len(gt_boxes)):
             gt_box = gt_boxes[i].numpy() if isinstance(
                 gt_boxes[i], paddle.Tensor) else gt_boxes[i]
+            h, w = scale_factor[i]
+            gt_box = gt_box / np.array([w, h, w, h, w, h, w, h])
             gt_label = gt_labels[i].numpy() if isinstance(
                 gt_labels[i], paddle.Tensor) else gt_labels[i]
             gt_box, gt_label, _ = prune_zero_padding(gt_box, gt_label)
@@ -411,21 +424,41 @@
             ]
             self.detection_map.update(bbox, score, label, gt_box, gt_label)
 
-    def accumulate(self):
-        if len(self.results) > 0:
-            output = "bbox.json"
-            if self.output_eval:
-                output = os.path.join(self.output_eval, output)
+    def save_results(self, results, output_dir, imid2path):
+        if imid2path:
+            data_dicts = defaultdict(list)
+            for result in results:
+                image_id = result['image_id']
+                data_dicts[image_id].append(result)
+
+            for image_id, image_path in imid2path.items():
+                basename = os.path.splitext(os.path.split(image_path)[-1])[0]
+                output = os.path.join(output_dir, "{}.txt".format(basename))
+                dets = data_dicts.get(image_id, [])
+                with open(output, 'w') as f:
+                    for det in dets:
+                        catid, bbox, score = det['category_id'], det[
+                            'bbox'], det['score']
+                        bbox_pred = '{} {} '.format(self.catid2name[catid],
+                                                    score) + ' '.join(
+                                                        [str(e) for e in bbox])
+                        f.write(bbox_pred + '\n')
+
+            logger.info('The bbox result is saved to {}.'.format(output_dir))
+        else:
+            output = os.path.join(output_dir, "bbox.json")
             with open(output, 'w') as f:
-                json.dump(self.results, f)
-            logger.info('The bbox result is saved to bbox.json.')
+                json.dump(results, f)
 
-            if self.save_prediction_only:
-                logger.info('The bbox result is saved to {} and do not '
-                            'evaluate the mAP.'.format(output))
-            else:
-                logger.info("Accumulating evaluatation results...")
-                self.detection_map.accumulate()
+            logger.info('The bbox result is saved to {}.'.format(output))
+
+    def accumulate(self):
+        if self.output_eval:
+            self.save_results(self.results, self.output_eval, self.imid2path)
+
+        if not self.save_prediction_only:
+            logger.info("Accumulating evaluation results...")
+            self.detection_map.accumulate()
 
     def log(self):
         map_stat = 100. * self.detection_map.get_map()
diff --git a/ppdet/modeling/__init__.py b/ppdet/modeling/__init__.py
index cdcb5d1bf08d813257dc577366de2efa9da9add7..ded7c8fb8c77aa462b59b051427226120cd80dd1 100644
--- a/ppdet/modeling/__init__.py
+++ b/ppdet/modeling/__init__.py
@@ -29,6 +29,7 @@ from . import reid
 from . import mot
 from . import transformers
 from . import assigners
+from . 
import rbox_utils from .ops import * from .backbones import * @@ -43,3 +44,4 @@ from .reid import * from .mot import * from .transformers import * from .assigners import * +from .rbox_utils import * diff --git a/ppdet/modeling/bbox_utils.py b/ppdet/modeling/bbox_utils.py index f895340c7e8da8606bfd0f55b1e9b84d36bfd549..abf760abaadcd36a5bbca4df37cf49b6caded30a 100644 --- a/ppdet/modeling/bbox_utils.py +++ b/ppdet/modeling/bbox_utils.py @@ -359,295 +359,6 @@ def bbox_iou(box1, box2, giou=False, diou=False, ciou=False, eps=1e-9): return iou -def rect2rbox(bboxes): - """ - :param bboxes: shape (n, 4) (xmin, ymin, xmax, ymax) - :return: dbboxes: shape (n, 5) (x_ctr, y_ctr, w, h, angle) - """ - bboxes = bboxes.reshape(-1, 4) - num_boxes = bboxes.shape[0] - - x_ctr = (bboxes[:, 2] + bboxes[:, 0]) / 2.0 - y_ctr = (bboxes[:, 3] + bboxes[:, 1]) / 2.0 - edges1 = np.abs(bboxes[:, 2] - bboxes[:, 0]) - edges2 = np.abs(bboxes[:, 3] - bboxes[:, 1]) - angles = np.zeros([num_boxes], dtype=bboxes.dtype) - - inds = edges1 < edges2 - - rboxes = np.stack((x_ctr, y_ctr, edges1, edges2, angles), axis=1) - rboxes[inds, 2] = edges2[inds] - rboxes[inds, 3] = edges1[inds] - rboxes[inds, 4] = np.pi / 2.0 - return rboxes - - -def delta2rbox(rrois, - deltas, - means=[0, 0, 0, 0, 0], - stds=[1, 1, 1, 1, 1], - wh_ratio_clip=1e-6): - """ - :param rrois: (cx, cy, w, h, theta) - :param deltas: (dx, dy, dw, dh, dtheta) - :param means: - :param stds: - :param wh_ratio_clip: - :return: - """ - means = paddle.to_tensor(means) - stds = paddle.to_tensor(stds) - deltas = paddle.reshape(deltas, [-1, deltas.shape[-1]]) - denorm_deltas = deltas * stds + means - - dx = denorm_deltas[:, 0] - dy = denorm_deltas[:, 1] - dw = denorm_deltas[:, 2] - dh = denorm_deltas[:, 3] - dangle = denorm_deltas[:, 4] - - max_ratio = np.abs(np.log(wh_ratio_clip)) - dw = paddle.clip(dw, min=-max_ratio, max=max_ratio) - dh = paddle.clip(dh, min=-max_ratio, max=max_ratio) - - rroi_x = rrois[:, 0] - rroi_y = rrois[:, 1] - rroi_w = rrois[:, 2] - rroi_h = rrois[:, 3] - rroi_angle = rrois[:, 4] - - gx = dx * rroi_w * paddle.cos(rroi_angle) - dy * rroi_h * paddle.sin( - rroi_angle) + rroi_x - gy = dx * rroi_w * paddle.sin(rroi_angle) + dy * rroi_h * paddle.cos( - rroi_angle) + rroi_y - gw = rroi_w * dw.exp() - gh = rroi_h * dh.exp() - ga = np.pi * dangle + rroi_angle - ga = (ga + np.pi / 4) % np.pi - np.pi / 4 - ga = paddle.to_tensor(ga) - - gw = paddle.to_tensor(gw, dtype='float32') - gh = paddle.to_tensor(gh, dtype='float32') - bboxes = paddle.stack([gx, gy, gw, gh, ga], axis=-1) - return bboxes - - -def rbox2delta(proposals, gt, means=[0, 0, 0, 0, 0], stds=[1, 1, 1, 1, 1]): - """ - - Args: - proposals: - gt: - means: 1x5 - stds: 1x5 - - Returns: - - """ - proposals = proposals.astype(np.float64) - - PI = np.pi - - gt_widths = gt[..., 2] - gt_heights = gt[..., 3] - gt_angle = gt[..., 4] - - proposals_widths = proposals[..., 2] - proposals_heights = proposals[..., 3] - proposals_angle = proposals[..., 4] - - coord = gt[..., 0:2] - proposals[..., 0:2] - dx = (np.cos(proposals[..., 4]) * coord[..., 0] + np.sin(proposals[..., 4]) - * coord[..., 1]) / proposals_widths - dy = (-np.sin(proposals[..., 4]) * coord[..., 0] + np.cos(proposals[..., 4]) - * coord[..., 1]) / proposals_heights - dw = np.log(gt_widths / proposals_widths) - dh = np.log(gt_heights / proposals_heights) - da = (gt_angle - proposals_angle) - - da = (da + PI / 4) % PI - PI / 4 - da /= PI - - deltas = np.stack([dx, dy, dw, dh, da], axis=-1) - means = np.array(means, dtype=deltas.dtype) - stds = 
np.array(stds, dtype=deltas.dtype) - deltas = (deltas - means) / stds - deltas = deltas.astype(np.float32) - return deltas - - -def bbox_decode(bbox_preds, - anchors, - means=[0, 0, 0, 0, 0], - stds=[1, 1, 1, 1, 1]): - """decode bbox from deltas - Args: - bbox_preds: [N,H,W,5] - anchors: [H*W,5] - return: - bboxes: [N,H,W,5] - """ - means = paddle.to_tensor(means) - stds = paddle.to_tensor(stds) - num_imgs, H, W, _ = bbox_preds.shape - bboxes_list = [] - for img_id in range(num_imgs): - bbox_pred = bbox_preds[img_id] - # bbox_pred.shape=[5,H,W] - bbox_delta = bbox_pred - anchors = paddle.to_tensor(anchors) - bboxes = delta2rbox( - anchors, bbox_delta, means, stds, wh_ratio_clip=1e-6) - bboxes = paddle.reshape(bboxes, [H, W, 5]) - bboxes_list.append(bboxes) - return paddle.stack(bboxes_list, axis=0) - - -def poly2rbox(polys): - """ - poly:[x0,y0,x1,y1,x2,y2,x3,y3] - to - rotated_boxes:[x_ctr,y_ctr,w,h,angle] - """ - rotated_boxes = [] - for poly in polys: - poly = np.array(poly[:8], dtype=np.float32) - - pt1 = (poly[0], poly[1]) - pt2 = (poly[2], poly[3]) - pt3 = (poly[4], poly[5]) - pt4 = (poly[6], poly[7]) - - edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[ - 1]) * (pt1[1] - pt2[1])) - edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[ - 1]) * (pt2[1] - pt3[1])) - - width = max(edge1, edge2) - height = min(edge1, edge2) - - rbox_angle = 0 - if edge1 > edge2: - rbox_angle = np.arctan2( - float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0])) - elif edge2 >= edge1: - rbox_angle = np.arctan2( - float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0])) - - def norm_angle(angle, range=[-np.pi / 4, np.pi]): - return (angle - range[0]) % range[1] + range[0] - - rbox_angle = norm_angle(rbox_angle) - - x_ctr = float(pt1[0] + pt3[0]) / 2 - y_ctr = float(pt1[1] + pt3[1]) / 2 - rotated_box = np.array([x_ctr, y_ctr, width, height, rbox_angle]) - rotated_boxes.append(rotated_box) - ret_rotated_boxes = np.array(rotated_boxes) - assert ret_rotated_boxes.shape[1] == 5 - return ret_rotated_boxes - - -def cal_line_length(point1, point2): - import math - return math.sqrt( - math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2)) - - -def get_best_begin_point_single(coordinate): - x1, y1, x2, y2, x3, y3, x4, y4 = coordinate - xmin = min(x1, x2, x3, x4) - ymin = min(y1, y2, y3, y4) - xmax = max(x1, x2, x3, x4) - ymax = max(y1, y2, y3, y4) - combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], - [[x4, y4], [x1, y1], [x2, y2], [x3, y3]], - [[x3, y3], [x4, y4], [x1, y1], [x2, y2]], - [[x2, y2], [x3, y3], [x4, y4], [x1, y1]]] - dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]] - force = 100000000.0 - force_flag = 0 - for i in range(4): - temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \ - + cal_line_length(combinate[i][1], dst_coordinate[1]) \ - + cal_line_length(combinate[i][2], dst_coordinate[2]) \ - + cal_line_length(combinate[i][3], dst_coordinate[3]) - if temp_force < force: - force = temp_force - force_flag = i - if force_flag != 0: - pass - return np.array(combinate[force_flag]).reshape(8) - - -def rbox2poly_np(rrects): - """ - rrect:[x_ctr,y_ctr,w,h,angle] - to - poly:[x0,y0,x1,y1,x2,y2,x3,y3] - """ - polys = [] - for i in range(rrects.shape[0]): - rrect = rrects[i] - # x_ctr, y_ctr, width, height, angle = rrect[:5] - x_ctr = rrect[0] - y_ctr = rrect[1] - width = rrect[2] - height = rrect[3] - angle = rrect[4] - tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2 - rect = np.array([[tl_x, br_x, br_x, 
tl_x], [tl_y, tl_y, br_y, br_y]]) - R = np.array([[np.cos(angle), -np.sin(angle)], - [np.sin(angle), np.cos(angle)]]) - poly = R.dot(rect) - x0, x1, x2, x3 = poly[0, :4] + x_ctr - y0, y1, y2, y3 = poly[1, :4] + y_ctr - poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32) - poly = get_best_begin_point_single(poly) - polys.append(poly) - polys = np.array(polys) - return polys - - -def rbox2poly(rrects): - """ - rrect:[x_ctr,y_ctr,w,h,angle] - to - poly:[x0,y0,x1,y1,x2,y2,x3,y3] - """ - N = paddle.shape(rrects)[0] - - x_ctr = rrects[:, 0] - y_ctr = rrects[:, 1] - width = rrects[:, 2] - height = rrects[:, 3] - angle = rrects[:, 4] - - tl_x, tl_y, br_x, br_y = -width * 0.5, -height * 0.5, width * 0.5, height * 0.5 - - normal_rects = paddle.stack( - [tl_x, br_x, br_x, tl_x, tl_y, tl_y, br_y, br_y], axis=0) - normal_rects = paddle.reshape(normal_rects, [2, 4, N]) - normal_rects = paddle.transpose(normal_rects, [2, 0, 1]) - - sin, cos = paddle.sin(angle), paddle.cos(angle) - # M.shape=[N,2,2] - M = paddle.stack([cos, -sin, sin, cos], axis=0) - M = paddle.reshape(M, [2, 2, N]) - M = paddle.transpose(M, [2, 0, 1]) - - # polys:[N,8] - polys = paddle.matmul(M, normal_rects) - polys = paddle.transpose(polys, [2, 1, 0]) - polys = paddle.reshape(polys, [-1, N]) - polys = paddle.transpose(polys, [1, 0]) - - tmp = paddle.stack( - [x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr], axis=1) - polys = polys + tmp - return polys - - def bbox_iou_np_expand(box1, box2, x1y1x2y2=True, eps=1e-16): """ Calculate the iou of box1 and box2 with numpy. diff --git a/ppdet/modeling/heads/s2anet_head.py b/ppdet/modeling/heads/s2anet_head.py index 53b16f5af05845964e6bd84bf9db04ba6dae4ef1..8abddcff13540fa06a2a7374ed247ebed0915817 100644 --- a/ppdet/modeling/heads/s2anet_head.py +++ b/ppdet/modeling/heads/s2anet_head.py @@ -20,7 +20,6 @@ import paddle.nn as nn import paddle.nn.functional as F from paddle.nn.initializer import Normal, Constant from ppdet.core.workspace import register -from ppdet.modeling.bbox_utils import rbox2poly from ppdet.modeling.proposal_generator.target_layer import RBoxAssigner from ppdet.modeling.proposal_generator.anchor_generator import S2ANetAnchorGenerator from ppdet.modeling.layers import AlignConv @@ -424,7 +423,7 @@ class S2ANetHead(nn.Layer): mlvl_bboxes = paddle.concat(mlvl_bboxes) mlvl_scores = paddle.concat(mlvl_scores) - mlvl_polys = rbox2poly(mlvl_bboxes).unsqueeze(0) + mlvl_polys = self.rbox2poly(mlvl_bboxes).unsqueeze(0) mlvl_scores = paddle.transpose(mlvl_scores, [1, 0]).unsqueeze(0) bbox, bbox_num, _ = self.nms(mlvl_polys, mlvl_scores) @@ -706,3 +705,41 @@ class S2ANetHead(nn.Layer): ga = (ga + np.pi / 4) % np.pi - np.pi / 4 bboxes = paddle.concat([gx, gy, gw, gh, ga], axis=-1) return bboxes + + def rbox2poly(self, rboxes): + """ + rboxes: [x_ctr,y_ctr,w,h,angle] + to + polys: [x0,y0,x1,y1,x2,y2,x3,y3] + """ + N = paddle.shape(rboxes)[0] + + x_ctr = rboxes[:, 0] + y_ctr = rboxes[:, 1] + width = rboxes[:, 2] + height = rboxes[:, 3] + angle = rboxes[:, 4] + + tl_x, tl_y, br_x, br_y = -width * 0.5, -height * 0.5, width * 0.5, height * 0.5 + + normal_rects = paddle.stack( + [tl_x, br_x, br_x, tl_x, tl_y, tl_y, br_y, br_y], axis=0) + normal_rects = paddle.reshape(normal_rects, [2, 4, N]) + normal_rects = paddle.transpose(normal_rects, [2, 0, 1]) + + sin, cos = paddle.sin(angle), paddle.cos(angle) + # M: [N,2,2] + M = paddle.stack([cos, -sin, sin, cos], axis=0) + M = paddle.reshape(M, [2, 2, N]) + M = paddle.transpose(M, [2, 0, 1]) + + # polys: [N,8] + polys = 
paddle.matmul(M, normal_rects) + polys = paddle.transpose(polys, [2, 1, 0]) + polys = paddle.reshape(polys, [-1, N]) + polys = paddle.transpose(polys, [1, 0]) + + tmp = paddle.stack( + [x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr, x_ctr, y_ctr], axis=1) + polys = polys + tmp + return polys diff --git a/ppdet/modeling/post_process.py b/ppdet/modeling/post_process.py index ceee73a79582d88ccaaf6fe92b2e0045669cd03f..9096d124f04f99a598d433e793d3e6d258e3c86d 100644 --- a/ppdet/modeling/post_process.py +++ b/ppdet/modeling/post_process.py @@ -17,7 +17,7 @@ import paddle import paddle.nn as nn import paddle.nn.functional as F from ppdet.core.workspace import register -from ppdet.modeling.bbox_utils import nonempty_bbox, rbox2poly +from ppdet.modeling.bbox_utils import nonempty_bbox from ppdet.modeling.layers import TTFBox from .transformers import bbox_cxcywh_to_xyxy try: diff --git a/ppdet/modeling/rbox_utils.py b/ppdet/modeling/rbox_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..19bca8d8a124b66fde39959ff36db4ff24680a58 --- /dev/null +++ b/ppdet/modeling/rbox_utils.py @@ -0,0 +1,159 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import paddle +import numpy as np +import cv2 + + +def norm_angle(angle, range=[-np.pi / 4, np.pi]): + return (angle - range[0]) % range[1] + range[0] + + +# rbox function implemented using numpy +def poly2rbox_le135_np(poly): + """convert poly to rbox [-pi / 4, 3 * pi / 4] + + Args: + poly: [x1, y1, x2, y2, x3, y3, x4, y4] + + Returns: + rbox: [cx, cy, w, h, angle] + """ + poly = np.array(poly[:8], dtype=np.float32) + + pt1 = (poly[0], poly[1]) + pt2 = (poly[2], poly[3]) + pt3 = (poly[4], poly[5]) + pt4 = (poly[6], poly[7]) + + edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[1]) * + (pt1[1] - pt2[1])) + edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[1]) * + (pt2[1] - pt3[1])) + + width = max(edge1, edge2) + height = min(edge1, edge2) + + rbox_angle = 0 + if edge1 > edge2: + rbox_angle = np.arctan2(float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0])) + elif edge2 >= edge1: + rbox_angle = np.arctan2(float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0])) + + rbox_angle = norm_angle(rbox_angle) + + x_ctr = float(pt1[0] + pt3[0]) / 2 + y_ctr = float(pt1[1] + pt3[1]) / 2 + return [x_ctr, y_ctr, width, height, rbox_angle] + + +def poly2rbox_oc_np(poly): + """convert poly to rbox (0, pi / 2] + + Args: + poly: [x1, y1, x2, y2, x3, y3, x4, y4] + + Returns: + rbox: [cx, cy, w, h, angle] + """ + points = np.array(poly, dtype=np.float32).reshape((-1, 2)) + (cx, cy), (w, h), angle = cv2.minAreaRect(points) + # using the new OpenCV Rotated BBox definition since 4.5.1 + # if angle < 0, opencv is older than 4.5.1, angle is in [-90, 0) + if angle < 0: + angle += 90 + w, h = h, w + + # convert angle to [0, 90) + if angle == -0.0: + angle = 0.0 + if angle == 90.0: + angle = 0.0 + w, h = h, w + + angle = angle / 180 * np.pi + return [cx, cy, w, h, 
angle] + + +def poly2rbox_np(polys, rbox_type='oc'): + """ + polys: [x0,y0,x1,y1,x2,y2,x3,y3] + to + rboxes: [x_ctr,y_ctr,w,h,angle] + """ + assert rbox_type in ['oc', 'le135'], 'only oc or le135 is supported now' + poly2rbox_fn = poly2rbox_oc_np if rbox_type == 'oc' else poly2rbox_le135_np + rboxes = [] + for poly in polys: + x, y, w, h, angle = poly2rbox_fn(poly) + rbox = np.array([x, y, w, h, angle], dtype=np.float32) + rboxes.append(rbox) + + return np.array(rboxes) + + +def cal_line_length(point1, point2): + return math.sqrt( + math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2)) + + +def get_best_begin_point_single(coordinate): + x1, y1, x2, y2, x3, y3, x4, y4 = coordinate + xmin = min(x1, x2, x3, x4) + ymin = min(y1, y2, y3, y4) + xmax = max(x1, x2, x3, x4) + ymax = max(y1, y2, y3, y4) + combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], + [[x4, y4], [x1, y1], [x2, y2], [x3, y3]], + [[x3, y3], [x4, y4], [x1, y1], [x2, y2]], + [[x2, y2], [x3, y3], [x4, y4], [x1, y1]]] + dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]] + force = 100000000.0 + force_flag = 0 + for i in range(4): + temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \ + + cal_line_length(combinate[i][1], dst_coordinate[1]) \ + + cal_line_length(combinate[i][2], dst_coordinate[2]) \ + + cal_line_length(combinate[i][3], dst_coordinate[3]) + if temp_force < force: + force = temp_force + force_flag = i + if force_flag != 0: + pass + return np.array(combinate[force_flag]).reshape(8) + + +def rbox2poly_np(rboxes): + """ + rboxes:[x_ctr,y_ctr,w,h,angle] + to + poly:[x0,y0,x1,y1,x2,y2,x3,y3] + """ + polys = [] + for i in range(len(rboxes)): + x_ctr, y_ctr, width, height, angle = rboxes[i][:5] + tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2 + rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]]) + R = np.array([[np.cos(angle), -np.sin(angle)], + [np.sin(angle), np.cos(angle)]]) + poly = R.dot(rect) + x0, x1, x2, x3 = poly[0, :4] + x_ctr + y0, y1, y2, y3 = poly[1, :4] + y_ctr + poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32) + poly = get_best_begin_point_single(poly) + polys.append(poly) + polys = np.array(polys) + return polys diff --git a/ppdet/utils/download.py b/ppdet/utils/download.py index 71720f5e058df4335c8cde85d63eb615ff20cfca..006733aa79f896e2ba4ebaa8069248c6ac43c1a0 100644 --- a/ppdet/utils/download.py +++ b/ppdet/utils/download.py @@ -96,8 +96,8 @@ DATASETS = { 'https://paddlemodels.bj.bcebos.com/object_detection/roadsign_coco.tar', '49ce5a9b5ad0d6266163cd01de4b018e', ), ], ['annotations', 'images']), 'spine_coco': ([( - 'https://paddledet.bj.bcebos.com/data/spine_coco.tar', - '03030f42d9b6202a6e425d4becefda0d', ), ], ['annotations', 'images']), + 'https://paddledet.bj.bcebos.com/data/spine.tar', + '8a3a353c2c54a2284ad7d2780b65f6a6', ), ], ['annotations', 'images']), 'mot': (), 'objects365': (), 'coco_ce': ([( diff --git a/tools/infer.py b/tools/infer.py index 311cf8cf098a05f910731a7676750f13b4c292ae..d22f206383cdf2ff39c1a53b60373a963c66496a 100755 --- a/tools/infer.py +++ b/tools/infer.py @@ -27,6 +27,7 @@ sys.path.insert(0, parent_path) import warnings warnings.filterwarnings('ignore') import glob +import ast import paddle from ppdet.core.workspace import load_config, merge_config @@ -114,6 +115,11 @@ def parse_args(): type=str, default='iou', help="Combine method matching metric, choose in ['iou', 'ios'].") + parser.add_argument( + "--visualize", + type=ast.literal_eval, + default=True, + 
help="Whether to save visualize results to output_dir.") args = parser.parse_args() return args @@ -170,13 +176,15 @@ def run(FLAGS, cfg): match_metric=FLAGS.match_metric, draw_threshold=FLAGS.draw_threshold, output_dir=FLAGS.output_dir, - save_results=FLAGS.save_results) + save_results=FLAGS.save_results, + visualize=FLAGS.visualize) else: trainer.predict( images, draw_threshold=FLAGS.draw_threshold, output_dir=FLAGS.output_dir, - save_results=FLAGS.save_results) + save_results=FLAGS.save_results, + visualize=FLAGS.visualize) def main():