Unverified commit 16b4888b authored by qq_30618961, committed by GitHub

solve readme link useless (#4276)

* slove_readme_link_useless
Parent bd68a7ef
@@ -198,40 +198,40 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models
- `PP-YOLO v2` is optimized version of `PP-YOLO` which has mAP of 49.5% and 68.9FPS on Tesla V100
- All these models can be found in [Model Zoo](#ModelZoo)
- All these models can be found in [Model Zoo](#Model-Zoo)
## Tutorials
### Get Started
- [Installation guide](docs/tutorials/INSTALL_en.md)
- [Prepare dataset](docs/tutorials/PrepareDataSet.md)
- [Quick start on PaddleDetection](docs/tutorials/GETTING_STARTED_cn.md)
- [Installation guide](docs/tutorials/INSTALL.md)
- [Prepare dataset](docs/tutorials/PrepareDataSet_en.md)
- [Quick start on PaddleDetection](docs/tutorials/GETTING_STARTED.md)
### Advanced Tutorials
- Parameter configuration
- [Parameter configuration for RCNN model](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md)
- [Parameter configuration for PP-YOLO model](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md)
- [Parameter configuration for RCNN model](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md)
- [Parameter configuration for PP-YOLO model](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md)
- Model Compression(Based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim))
- [Prune/Quant/Distill](configs/slim)
- Inference and deployment
- [Export model for inference](deploy/EXPORT_MODEL.md)
- [Paddle Inference](deploy/README.md)
- [Export model for inference](deploy/EXPORT_MODEL_en.md)
- [Paddle Inference](deploy/README_en.md)
- [Python inference](deploy/python)
- [C++ inference](deploy/cpp)
- [Paddle-Lite](deploy/lite)
- [Paddle Serving](deploy/serving)
- [Export ONNX model](deploy/EXPORT_ONNX_MODEL.md)
- [Inference benchmark](deploy/BENCHMARK_INFER.md)
- [Export ONNX model](deploy/EXPORT_ONNX_MODEL_en.md)
- [Inference benchmark](deploy/BENCHMARK_INFER_en.md)
- Advanced development
- [New data augmentations](docs/advanced_tutorials/READER.md)
- [New detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [New data augmentations](docs/advanced_tutorials/READER_en.md)
- [New detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL_en.md)
## Model Zoo
@@ -239,15 +239,15 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models
- Universal object detection
- [Model library and baselines](docs/MODEL_ZOO_cn.md)
- [PP-YOLO](configs/ppyolo/README.md)
- [Enhanced Anchor Free model--TTFNet](configs/ttfnet/README.md)
- [Mobile models](static/configs/mobile/README.md)
- [676 classes of object detection](static/docs/featured_model/LARGE_SCALE_DET_MODEL.md)
- [Two-stage practical PSS-Det](configs/rcnn_enhance/README.md)
- [Enhanced Anchor Free model--TTFNet](configs/ttfnet/README_en.md)
- [Mobile models](static/configs/mobile/README_en.md)
- [676 classes of object detection](static/docs/featured_model/LARGE_SCALE_DET_MODEL_en.md)
- [Two-stage practical PSS-Det](configs/rcnn_enhance/README_en.md)
- [SSLD pretrained models](docs/feature_models/SSLD_PRETRAINED_MODEL_en.md)
- Universal instance segmentation
- [SOLOv2](configs/solov2/README.md)
- Rotation object detection
- [S2ANet](configs/dota/README.md)
- [S2ANet](configs/dota/README_en.md)
- [Keypoint detection](configs/keypoint)
- HigherHRNet
- HRNet
@@ -257,12 +257,12 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models
- [JDE](configs/mot/jde/README.md)
- [FairMOT](configs/mot/fairmot/README.md)
- Vertical field
- [Face detection](configs/face_detection/README.md)
- [Face detection](configs/face_detection/README_en.md)
- [Pedestrian detection](configs/pedestrian/README.md)
- [Vehicle detection](configs/vehicle/README.md)
- Competition Plan
- [Objects365 2019 Challenge champion model](static/docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of Open Images 2019-Object Detection](static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [Objects365 2019 Challenge champion model](static/docs/featured_model/champion_model/CACascadeRCNN_en.md)
- [Best single model of Open Images 2019-Object Detection](static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL_en.md)
## Applications
@@ -270,11 +270,11 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models
## Updates
v2.2 was released at `08/2021`, release Transformer detection models, release Dark HRNet keypoint detection model, release tracking models of head and vehicle, release optimized S2ANet model, inference with batch size > 1 supported for main architectures. Please refer to [change log](docs/CHANGELOG.md) for details.
v2.2 was released at `08/2021`, release Transformer detection models, release Dark HRNet keypoint detection model, release tracking models of head and vehicle, release optimized S2ANet model, inference with batch size > 1 supported for main architectures. Please refer to [change log](docs/CHANGELOG_en.md) for details.
v2.1 was released at `05/2021`, Release Keypoint Detection and Multi-Object Tracking. Release model compression for PPYOLO series. Update documents such as export ONNX model. Please refer to [change log](docs/CHANGELOG.md) for details.
v2.1 was released at `05/2021`, Release Keypoint Detection and Multi-Object Tracking. Release model compression for PPYOLO series. Update documents such as export ONNX model. Please refer to [change log](docs/CHANGELOG_en.md) for details.
v2.0 was released at `04/2021`, fully support dygraph version, which add BlazeFace, PSS-Det and plenty backbones, release `PP-YOLOv2`, `PP-YOLO tiny` and `S2ANet`, support model distillation and VisualDL, add inference benchmark, etc. Please refer to [change log](docs/CHANGELOG.md) for details.
v2.0 was released at `04/2021`, fully support dygraph version, which add BlazeFace, PSS-Det and plenty backbones, release `PP-YOLOv2`, `PP-YOLO tiny` and `S2ANet`, support model distillation and VisualDL, add inference benchmark, etc. Please refer to [change log](docs/CHANGELOG_en.md) for details.
## License
# S2ANet Model
## Content
- [S2ANet Model](#s2anet-model)
- [Content](#content)
- [Introduction](#introduction)
- [Prepare Data](#prepare-data)
- [DOTA data](#dota-data)
- [Customize Data](#customize-data)
- [Start Training](#start-training)
  - [1. Install the rotated-box IoU computation OP](#1-install-the-rotated-box-iou-computation-op)
- [2. Train](#2-train)
- [3. Evaluation](#3-evaluation)
- [4. Prediction](#4-prediction)
- [5. DOTA Data evaluation](#5-dota-data-evaluation)
- [Model Library](#model-library)
- [S2ANet Model](#s2anet-model-1)
- [Predict Deployment](#predict-deployment)
- [Citations](#citations)
## Introduction
[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting rotated (oriented) boxes. It requires PaddlePaddle 2.1.1 (installable via pip) or an appropriate [develop version](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release).
## Prepare Data
### DOTA data
The DOTA dataset is an object detection dataset of aerial images containing 2,806 images, with image sizes ranging from 800x800 to 4000x4000.
| Data version | categories | images | size | instances | annotation method |
|:--------:|:-------:|:---------:|:---------:| :---------:| :------------: |
| v1.0 | 15 | 2806 | 800~4000 | 118282 | OBB + HBB |
| v1.5 | 16 | 2806 | 800~4000 | 400000 | OBB + HBB |
Note: an OBB annotation is an arbitrary quadrilateral whose vertices are arranged in clockwise order; an HBB annotation is the axis-aligned bounding rectangle of the corresponding instance.
The DOTA dataset contains 2,806 images in total: 1,411 images for training, 458 images for evaluation, and the remaining 937 images for testing.
If you need to crop the image data, please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit).
After cropping with `crop_size=1024, stride=824, gap=200`, the training set contains 15,749 images, the evaluation set 5,297 images, and the test set 10,833 images.
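The cropping itself is handled by DOTA_devkit; for intuition only, a minimal sketch of the sliding-window coordinates implied by the `crop_size=1024, gap=200` setting (stride = crop_size - gap = 824) might look like the following. The helper below is illustrative and is not the official tool:

```python
# Illustrative sketch of sliding-window crop coordinates for large aerial images.
# It mirrors the crop_size/gap idea used by DOTA_devkit, but is not the official splitter.
def crop_windows(img_w, img_h, crop_size=1024, gap=200):
    stride = crop_size - gap  # 824 for the setting above
    xs = list(range(0, max(img_w - crop_size, 0) + 1, stride)) or [0]
    ys = list(range(0, max(img_h - crop_size, 0) + 1, stride)) or [0]
    # Make sure the right/bottom borders are covered by a final window.
    if xs[-1] + crop_size < img_w:
        xs.append(img_w - crop_size)
    if ys[-1] + crop_size < img_h:
        ys.append(img_h - crop_size)
    return [(x, y, min(x + crop_size, img_w), min(y + crop_size, img_h))
            for y in ys for x in xs]

# Example: a 4000x4000 DOTA image yields a 5x5 grid of overlapping 1024x1024 crops.
print(len(crop_windows(4000, 4000)))  # 25
```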
### Customize Data
There are two ways to annotate data:
- The first is to annotate rotated rectangles directly, which can be done with the rotated-rectangle annotation tool [roLabelImg](https://github.com/cgvict/roLabelImg).
- The second is to annotate quadrilaterals and convert them into circumscribed rotated rectangles with a script; the resulting boxes may deviate slightly from the true object boxes.
Then convert the annotations into COCO format, where each `bbox` is given as `[x_center, y_center, width, height, angle]` and the angle is expressed in radians.
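For reference, a minimal sketch of converting an annotated quadrilateral into the `[x_center, y_center, width, height, angle]` form (angle in radians) could look like the following; it assumes OpenCV is available, and the surrounding COCO fields are only illustrative:

```python
import math
import numpy as np
import cv2

def quad_to_rbox(quad):
    """Convert an 8-value quadrilateral [x1, y1, ..., x4, y4] into
    [x_center, y_center, width, height, angle(radians)] via the minimum-area rectangle."""
    pts = np.asarray(quad, dtype=np.float32).reshape(4, 2)
    (cx, cy), (w, h), angle_deg = cv2.minAreaRect(pts)
    # NOTE: the exact angle convention (range/sign) should be matched to your training pipeline.
    return [float(cx), float(cy), float(w), float(h), math.radians(angle_deg)]

# Illustrative COCO-style annotation entry using the converted rotated box.
annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "bbox": quad_to_rbox([10, 10, 110, 20, 100, 70, 0, 60]),
    "iscrowd": 0,
}
```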
Taking the [spinal disk dataset](https://aistudio.baidu.com/aistudio/datasetdetail/85885) as a reference, we split it into a training set (230 images) and a test set (57 images); the data is available at [spine_coco](https://paddledet.bj.bcebos.com/data/spine_coco.tar). The dataset contains only a small number of images, so it can be used to train the S2ANet model quickly.
## Start Training
### 1. Install the rotated-box IoU computation OP
The rotated-box IoU computation OP [ext_op](../../ppdet/ext_op) is implemented following PaddlePaddle's [custom external operator](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html) mechanism.
To use the rotated-box IoU computation OP, the following requirements must be met:
- PaddlePaddle >= 2.1.1
- GCC == 8.2
The Docker image [paddle:2.1.1-gpu-cuda10.1-cudnn7](registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7) is recommended.
Run the following command to download the image and start the container:
```
sudo nvidia-docker run -it --name paddle_s2anet -v $PWD:/paddle --network=host registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7 /bin/bash
```
PaddlePaddle is already installed in the image. Start python3.7 and run the following code to check that the installation works properly:
```
import paddle
print(paddle.__version__)
paddle.utils.run_check()
```
Enter the `ppdet/ext_op` directory and install the OP:
```
python3.7 setup.py install
```
On Windows, perform the following steps to install it:
(1) Install Visual Studio (Visual Studio 2015 Update 3 or later is required);
(2) Go to Start --> Visual Studio 2017 --> X64 Native Tools Command Prompt for VS 2017;
(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;
(4) Enter the `PaddleDetection/ppdet/ext_op` directory and run `python3.7 setup.py install`.
After the installation, test whether the custom OP can compile normally and calculate the results:
```
cd PaddleDetection/ppdet/ext_op
python3.7 test.py
```
### 2. Train
**Attention:**
The learning rate in the configuration file is set for 8-GPU training. For single-GPU training, set the learning rate to 1/8 of the original value.
Single GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py -c configs/dota/s2anet_1x_spine.yml
```
Multiple GPUs Training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/dota/s2anet_1x_spine.yml
```
You can add `--eval` to enable evaluation during training.
### 3. Evaluation
```bash
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams
# Use a trained model to evaluate
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```
**Attention:**
(1) For the DOTA dataset, the train and val splits are merged as the training set, so the evaluation dataset configuration needs to be customized when evaluating on DOTA.
(2) The spine dataset is converted from segmentation data. Since the different disc categories are hard to distinguish for a detection task and the scores produced by the S2ANet algorithm are low, a low mAP under the default evaluation threshold of 0.5 is normal. It is recommended to inspect the visualized detection results instead.
### 4. Prediction
Executing the following command will save the image prediction results to the `output` folder.
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Predict using the provided trained model:
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
### 5. DOTA Data evaluation
Executing the following command saves the prediction result of each image as a txt file with the same name in the `output` folder:
```
python3.7 tools/infer.py -c configs/dota/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=dota_test_images --draw_threshold=0.05 --save_txt=True --output_dir=output
```
Please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit) to generate the evaluation files. For the evaluation file format, please refer to [DOTA Test](http://captain.whu.edu.cn/DOTAweb/tasks.html): generate a zip file containing one txt file per class, where each line has the format `image_id score x1 y1 x2 y2 x3 y3 x4 y4`. You can also use the `dataset/dota_coco/dota_generate_test_result.py` script to generate the evaluation files and submit them to the server. A rough sketch of the per-class file layout is shown below.
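The sketch below only illustrates the submission layout described above (one txt file per class, one line per detection); the structure of `detections` is an assumption, and the official `dataset/dota_coco/dota_generate_test_result.py` script should be preferred in practice:

```python
import os
from collections import defaultdict

def write_dota_results(detections, out_dir="dota_results"):
    """detections: list of (image_id, class_name, score, [x1, y1, ..., x4, y4])."""
    os.makedirs(out_dir, exist_ok=True)
    per_class = defaultdict(list)
    for image_id, class_name, score, quad in detections:
        coords = " ".join(f"{v:.2f}" for v in quad)
        per_class[class_name].append(f"{image_id} {score:.4f} {coords}\n")
    # One txt file per class (e.g. Task1_plane.txt); zip the folder for submission.
    for class_name, lines in per_class.items():
        with open(os.path.join(out_dir, f"Task1_{class_name}.txt"), "w") as f:
            f.writelines(lines)

# Example with a single dummy detection.
write_dota_results([("P0001", "plane", 0.91, [10, 10, 90, 12, 88, 60, 8, 58])])
```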
## Model Library
### S2ANet Model
| Model | Conv Type | mAP | Model Download | Configuration File |
|:-----------:|:----------:|:--------:| :----------:| :---------: |
| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_conv_2x_dota.yml) |
| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) |
**Attention:** `multiclass_nms` is used here, which is slightly different from the original author's use of NMS.
## Predict Deployment
The `multiclass_nms` operator in Paddle supports quadrilateral inputs, so deployment does not depend on the rotated-box IoU operator.
Please refer to the deployment tutorial [Predict deployment](../../deploy/README_en.md).
**Attention:** The `is_training` parameter is added to the configuration file because the `paddle.detach` function causes shape errors in the exported model during dynamic-to-static conversion; it must be set to `False` before exporting the model for deployment.
## Citations
```
@article{han2021align,
author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Align Deep Features for Oriented Object Detection},
year={2021},
pages={1-11},
doi={10.1109/TGRS.2021.3062048}}
@inproceedings{xia2018dota,
title={DOTA: A large-scale dataset for object detection in aerial images},
author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={3974--3983},
year={2018}
}
```
# Face Detection Model
## Introduction
`face_detection` provides high-efficiency, high-speed face detection solutions, including state-of-the-art models as well as classic models.
![](../../docs/images/12_Group_Group_12_Group_Group_12_935.jpg)
## Model Library
#### mAP on the WIDER FACE dataset
| Network structure | size | images/GPUs | Learning rate strategy | Easy/Medium/Hard Set | Prediction delay(SD855)| Model size(MB) | Download | Configuration File |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
| BlazeFace | 640 | 8 | 1000e | 0.885 / 0.855 / 0.731 | - | 0.472 |[link](https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/develop/configs/face_detection/blazeface_1000e.yml) |
| BlazeFace-FPN-SSH | 640 | 8 | 1000e | 0.907 / 0.883 / 0.793 | - | 0.479 |[link](https://paddledet.bj.bcebos.com/models/blazeface_fpn_ssh_1000e.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/develop/configs/face_detection/blazeface_fpn_ssh_1000e.yml) |
**Attention:**
- We use a multi-scale evaluation strategy to get the mAP in `Easy/Medium/Hard Set`. Please refer to the [evaluation on the WIDER FACE dataset](#Evaluated-on-the-WIDER-FACE-Dataset) for details.
## Quick Start
### Data preparation
We use the [WIDER-FACE dataset](http://shuoyang1213.me/WIDERFACE/) for training and model testing; the official website provides a detailed introduction to the data.
- WIDER-Face data source:
- Load a dataset of type `wider_face` using the following directory structure:
```
dataset/wider_face/
├── wider_face_split
│ ├── wider_face_train_bbx_gt.txt
│ ├── wider_face_val_bbx_gt.txt
├── WIDER_train
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_100.jpg
│ │ │ ├── 0_Parade_marchingband_1_381.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
├── WIDER_val
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_1004.jpg
│ │ │ ├── 0_Parade_marchingband_1_1045.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
```
- Manually download the dataset:
To download the WIDER-FACE dataset, run the following command:
```
cd dataset/wider_face && ./download_wider_face.sh
```
### Parameter configuration
The configuration of the base model can be found in `configs/face_detection/_base_/blazeface.yml`.
The improved model adds FPN and SSH neck structures; its configuration file is `configs/face_detection/_base_/blazeface_fpn.yml`. FPN and SSH can be configured as needed:
```yaml
BlazeNet:
  blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]]
  double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96],
                         [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]]
  act: hard_swish  # Activation function of the Blaze Block in the backbone. The base model uses relu; hard_swish is required when adding FPN and SSH

BlazeNeck:
  neck_type: fpn_ssh  # only_fpn, only_ssh and fpn_ssh
  in_channel: [96, 96]
```
### Training and Evaluation
The training and evaluation procedures are the same as for other algorithms; please refer to [GETTING_STARTED_cn.md](../../docs/tutorials/GETTING_STARTED_cn.md).
**Attention:** Face detection models currently do not support evaluation during training.
#### Evaluated on the WIDER-FACE Dataset
- Step 1: Evaluate and generate a result file:
```shell
python -u tools/eval.py -c configs/face_detection/blazeface_1000e.yml \
-o weights=output/blazeface_1000e/model_final \
multi_scale=True
```
Set `multi_scale=True` for multi-scale evaluation. After evaluation, test results in TXT format will be generated in `output/pred`.
- Step 2: Download the official evaluation script and Ground Truth file:
```
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
```
- Step 3: Start the evaluation
Method 1: Python evaluation:
```
git clone https://github.com/wondervictor/WiderFace-Evaluation.git
cd WiderFace-Evaluation
# compile
python3 setup.py build_ext --inplace
# Start the evaluation
python3 evaluation.py -p /path/to/PaddleDetection/output/pred -g /path/to/eval_tools/ground_truth
```
Method 2: MATLAB evaluation:
```
# Modify the result save path and the curve legend name in `eval_tools/wider_eval.m`:
pred_dir = './pred';
legend_name = 'Paddle-BlazeFace';
```
`wider_eval.m` is the main implementation of the evaluation module. Run the following command:
```
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
```
## Citations
```
@article{bazarevsky2019blazeface,
title={BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs},
author={Valentin Bazarevsky and Yury Kartynnik and Andrey Vakunov and Karthik Raveendran and Matthias Grundmann},
year={2019},
eprint={1907.05047},
archivePrefix={arXiv},
}
```
## Practical Server Side Detection
### Introduction
* In recent years, object detection has received wide attention from both academia and industry. Based on the ResNet50vd pretrained model trained with the SSLD distillation scheme in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) (82.39% Top-1 accuracy on the ImageNet1k validation set), combined with the rich operators in PaddleDetection, PaddlePaddle provides PSS-Det (Practical Server Side Detection), a practical server-side detection solution. On the COCO2017 object detection dataset, inference on a single V100 GPU reaches 61 FPS with a COCO mAP of 41.2%.
### Model library
| Backbone | Network type | Number of images per GPU | Learning rate strategy | Inference time (fps) | Box AP | Mask AP | Download | Configuration File |
| :-------------------- | :----------: | :----------------------: | :--------------------: | :-----------------: | :----: | :-----: | :---------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: |
| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.5 | - | [link](https://paddledet.bj.bcebos.com/models/faster_rcnn_enhance_3x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml) |
# 1. TTFNet
## Introduction
TTFNet is a real-time object detection network designed for training-time efficiency. It addresses the slow convergence of CenterNet by proposing a new method that uses a Gaussian kernel to generate training samples, which effectively eliminates the ambiguity in the anchor-free head. Its simple and lightweight network structure also makes it easy to extend to other tasks.
**Characteristics:**
- The structure is simple: only two heads are needed to detect object position and size, and time-consuming post-processing operations are eliminated
- The training time is short: with a DarkNet53 backbone network, only 2 hours of training on 8 V100 GPUs are needed to obtain a good model
## Model Zoo
| Backbone | Network type | Number of images per GPU | Learning rate strategy | Inference time (fps) | Box AP | Download | Configuration File |
| :-------- | :----------- | :----------------------: | :--------------------: | :-----------------: | :----: | :------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: |
| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [link](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |
# 2. PAFNet
## Introduction
PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet, whose accuracy reaches the SOTA level among anchor-free methods; a lightweight mobile version, PAFNet-Lite, is also provided.
PAFNet series models optimize TTFNet model from the following aspects:
- [CutMix](https://arxiv.org/abs/1905.04899)
- Better backbone network: ResNet50vd-DCN
- Larger training batch size: 8 GPUs, each GPU batch size=18
- Synchronized Batch Normalization
- [Deformable Convolution](https://arxiv.org/abs/1703.06211)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- Better pretraining model
## Model library
| Backbone | Network type | Number of images per GPU | Learning rate strategy | Inference time (fps) | Box AP | Download | Configuration File |
| :--------- | :------- | :----------------------: | :--------------------: | :-----------------: | :----: | :---------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: |
| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [link](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_10x_coco.yml) |
### PAFNet-Lite
| Backbone | Net type | Number of images per GPU | Learning rate strategy | Box AP | kirin 990 delay(ms) | volume(M) | Download | Configuration File |
| :---------- | :---------- | :----------------------: | :--------------------: | :----: | :-------------------: | :---------: | :---------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: |
| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [link](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) |
**Attention:** Due to the overall upgrade of the dygraph framework, the PAFNet weights published by PaddleDetection need to be evaluated with the `--bias` flag, for example:
```bash
# Evaluate the weights published by PaddleDetection
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias
```
## Citations
```
@article{liu2019training,
title = {Training-Time-Friendly Network for Real-Time Object Detection},
author = {Zili Liu, Tu Zheng, Guodong Xu, Zheng Yang, Haifeng Liu, Deng Cai},
journal = {arXiv preprint arXiv:1909.00700},
year = {2019}
}
```
# Inference Benchmark
## 1. Prepare the Environment
- 1. Test environment:
  - CUDA 10.1
  - cuDNN 7.6
  - TensorRT-6.0.1
  - PaddlePaddle v2.0.1
  - GPUs: Tesla V100, GTX 1080 Ti, and Jetson AGX Xavier
- 2. Test method:
  - To compare the inference speed of different models, a fixed input shape of 3x640x640 is used with the image `demo/000000014439_640x640.jpg`.
  - Batch_size=1
  - The warmup time of the first 100 iterations is excluded, and the average time over 100 iterations is reported in ms/image, including network computation time and the time to copy data back to the CPU.
  - The Fluid C++ inference engine is used, covering both native Fluid C++ inference and Fluid TensorRT inference; the tests below report Float32 (FP32) and Float16 (FP16) inference speed. A timing sketch of this protocol follows this list.
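The measurement protocol above (discard warmup iterations, then average repeated runs in ms/image) can be sketched as follows; `run_inference` is a placeholder for whatever predictor call is being measured, not a PaddleDetection API:

```python
import time

def benchmark(run_inference, warmup=100, repeats=100):
    """Average latency in ms/image: discard `warmup` runs, then average `repeats` runs."""
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(repeats):
        run_inference()
    return (time.perf_counter() - start) * 1000.0 / repeats

# Example with a dummy workload standing in for the real predictor call.
print(f"{benchmark(lambda: sum(range(10000))):.2f} ms/image")
```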
**Attention:** For TensorRT, please refer to the [TensorRT tutorial](TENSOR_RT.md) for the difference between fixed and dynamic shapes. Because fixed-shape support for two-stage models is incomplete, dynamic shapes are used for the Faster RCNN models. Fixed and dynamic shapes do not support exactly the same set of fused OPs, so the performance of the same model may differ slightly between the two modes.
## 2. Inference Speed
### 1. Linux System
#### (1) Tesla V100
| Model | Backbone | Fixed-size input | Input size | paddle_inference (ms) | trt_fp32 (ms) | trt_fp16 (ms) |
| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- |
| Faster RCNN FPN | ResNet50 | no | 640x640 | 27.99 | 26.15 | 21.92 |
| Faster RCNN FPN | ResNet50 | no | 800x1312 | 32.49 | 25.54 | 21.70 |
| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 9.74 | 8.61 | 6.28 |
| YOLOv3 | Darknet53 | yes | 608x608 | 17.84 | 15.43 | 9.86 |
| PPYOLO | ResNet50 | yes | 608x608 | 20.77 | 18.40 | 13.53 |
| SSD | Mobilenet\_v1 | yes | 300x300 | 5.17 | 4.43 | 4.29 |
| TTFNet | Darknet53 | yes | 512x512 | 10.14 | 8.71 | 5.55 |
| FCOS | ResNet50 | yes | 640x640 | 35.47 | 35.02 | 34.24 |
#### (2) Jetson AGX Xavier
| Model | Backbone | Fixed-size input | Input size | paddle_inference (ms) | trt_fp32 (ms) | trt_fp16 (ms) |
| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- |
| Faster RCNN FPN | ResNet50 | no | 640x640 | 169.45 | 158.92 | 119.25 |
| Faster RCNN FPN | ResNet50 | no | 800x1312 | 228.07 | 156.39 | 117.03 |
| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 48.76 | 43.83 | 18.41 |
| YOLOv3 | Darknet53 | yes | 608x608 | 121.61 | 110.30 | 42.38 |
| PPYOLO | ResNet50 | yes | 608x608 | 111.80 | 99.40 | 48.05 |
| SSD | Mobilenet\_v1 | yes | 300x300 | 10.52 | 8.84 | 8.77 |
| TTFNet | Darknet53 | yes | 512x512 | 73.77 | 64.03 | 31.46 |
| FCOS | ResNet50 | yes | 640x640 | 217.11 | 214.38 | 205.78 |
### 2. Windows System
#### (1) GTX 1080Ti
| Model | Backbone | Fixed-size input | Input size | paddle_inference (ms) | trt_fp32 (ms) | trt_fp16 (ms) |
| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- |
| Faster RCNN FPN | ResNet50 | no | 640x640 | 50.74 | 57.17 | 62.08 |
| Faster RCNN FPN | ResNet50 | no | 800x1312 | 50.31 | 57.61 | 62.05 |
| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 14.51 | 11.23 | 11.13 |
| YOLOv3 | Darknet53 | yes | 608x608 | 30.26 | 23.92 | 24.02 |
| PPYOLO | ResNet50 | yes | 608x608 | 38.06 | 31.40 | 31.94 |
| SSD | Mobilenet\_v1 | yes | 300x300 | 16.47 | 13.87 | 13.76 |
| TTFNet | Darknet53 | yes | 512x512 | 21.83 | 17.14 | 17.09 |
| FCOS | ResNet50 | yes | 640x640 | 71.88 | 69.93 | 69.52 |
# PaddleDetection Model Export Tutorial
## 1. Model Export
This section describes how to use the `tools/export_model.py` script to export models.
### 1. Description of exported model inputs and outputs
- Input variables and input shapes are as follows:
| Input Name | Input Shape | Meaning |
| :----------: | --------------- | ------------------------------------------------------------------------------------------------------------------------- |
| image | [None, 3, H, W] | The input image for the network; None indicates the batch dimension. If the input image size is variable, H and W are also None |
| im_shape | [None, 2] | The image size after resizing, expressed as H, W; None indicates the batch dimension |
| scale_factor | [None, 2] | The scale from the original image size to the network input size, expressed as scale_y, scale_x |
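As an illustration of how these three inputs relate to each other, a minimal preprocessing sketch (assuming OpenCV; the real pipeline should follow the TestReader section of the configuration file, see the note below) might be:

```python
import cv2
import numpy as np

def build_inputs(image_path, target_size=(640, 640)):
    """Prepare a batch of one image as `image`, `im_shape`, `scale_factor`."""
    img = cv2.imread(image_path)
    orig_h, orig_w = img.shape[:2]
    target_h, target_w = target_size
    resized = cv2.resize(img, (target_w, target_h))
    # NOTE: mean/std normalization and channel-order handling are omitted here;
    # follow the TestReader section of the configuration file for the real pipeline.
    image = resized.transpose(2, 0, 1)[np.newaxis].astype(np.float32)   # [1, 3, H, W]
    im_shape = np.array([[target_h, target_w]], dtype=np.float32)       # size after resize (H, W)
    scale_factor = np.array([[target_h / orig_h, target_w / orig_w]],   # scale_y, scale_x
                            dtype=np.float32)
    return {"image": image, "im_shape": im_shape, "scale_factor": scale_factor}
```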
**Attention:** For details about the preprocessing method, see the TestReader section in the configuration file.
- The outputs of the model exported from dygraph to static graph in PaddleDetection are unified as follows:
  - bbox, the NMS output, with shape [N, 6], where N is the number of prediction boxes and the 6 values are [class_id, score, x1, y1, x2, y2]
  - bbox_num, the number of prediction boxes per image. For example, with a batch size of 2, an output of [N1, N2] means the first image has N1 prediction boxes and the second has N2; the total number of boxes equals the first dimension N of the NMS output
  - mask, output only if the network contains a mask branch
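A small sketch of how the `bbox` and `bbox_num` outputs are typically consumed per image (the variable names here are illustrative):

```python
import numpy as np

def split_by_image(bbox, bbox_num):
    """bbox: [N, 6] array of [class_id, score, x1, y1, x2, y2]; bbox_num: boxes per image."""
    results, start = [], 0
    for n in bbox_num:
        results.append(bbox[start:start + int(n)])
        start += int(n)
    return results

# Example: a batch of 2 images with 2 and 1 detections respectively.
bbox = np.array([[0, 0.9, 10, 10, 50, 50],
                 [1, 0.8, 20, 20, 60, 60],
                 [0, 0.7, 5, 5, 30, 30]], dtype=np.float32)
per_image = split_by_image(bbox, bbox_num=[2, 1])
print([b.shape for b in per_image])  # [(2, 6), (1, 6)]
```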
**Attention:** Dynamic-to-static export does not support models whose structure contains numpy operations.
### 2. Startup Parameters
| FLAG | USE | DEFAULT | NOTE |
| :----------: | :-----------------------------: | :------------------: | :-------------------------------------------------------------------: |
| -c | Specifying a configuration file | None | |
| --output_dir | Model save path | `./output_inference` | The model is saved in the `output/default_file_name/` path by default |
### 3. Example
Use a trained model for export; the script is as follows:
```bash
# The YOLOv3 model is exported
python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
-o weights=weights/yolov3_darknet53_270e_coco.pdparams
```
The inference model will be exported to the `inference_model/yolov3_darknet53_270e_coco` directory, containing `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
### 4. Set the input size of the exported model
When using Fluid TensorRT for inference, TensorRT versions <= 5.1 only support fixed-size input, so the image size of the saved model's `data` layer needs to match the actual input image size. The Fluid C++ inference engine does not have this limitation. Setting `image_shape` in TestReader changes the input image size of the saved model. The following is an example:
```bash
#Export the YOLOv3 model with the input 3x640x640
python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
-o weights=weights/yolov3_darknet53_270e_coco.pdparams TestReader.inputs_def.image_shape=[3,640,640]
```
# PaddleDetection Model Export as ONNX Format Tutorial
PaddleDetection models can be saved in ONNX format; the list of currently tested and supported models is as follows:
| Model | OP Version | NOTE |
| :---- | :----- | :--- |
| YOLOv3 | 11 | Only batch=1 inference is supported. The model must be exported with a fixed shape |
| PPYOLO | 11 | Only batch=1 inference is supported. MatrixNMS is converted to NMS with slightly different precision. The model must be exported with a fixed shape |
| PPYOLOv2 | 11 | Only batch=1 inference is supported. MatrixNMS is converted to NMS with slightly different precision. The model must be exported with a fixed shape |
| PPYOLO-Tiny | 11 | Only batch=1 inference is supported. The model must be exported with a fixed shape |
| FCOS | 11 | Only batch=1 inference is supported |
| PAFNet | 11 | - |
| TTFNet | 11 | - |
| SSD | 11 | Only batch=1 inference is supported |
The ONNX export functionality is provided by [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX). If you encounter problems during conversion, please communicate with the engineers in the Paddle2ONNX GitHub project via an [issue](https://github.com/PaddlePaddle/Paddle2ONNX/issues).
## Export Tutorial
### Step 1. Export the Paddle deployment model
For the export procedure, refer to the [PaddleDetection deployment model export tutorial](./EXPORT_MODEL_en.md). Take YOLOv3 trained on the COCO dataset as an example:
```
cd PaddleDetection
python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \
-o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams \
TestReader.inputs_def.image_shape=[3,608,608] \
--output_dir inference_model
```
The exported model is saved in `inference_model/yolov3_darknet53_270e_coco/`, with the following structure:
```
yolov3_darknet
  ├── infer_cfg.yml          # Model configuration information
  ├── model.pdiparams        # Static graph model parameters
  ├── model.pdiparams.info   # Parameter information, not required
  └── model.pdmodel          # Static graph model file
```
> Check `TestReader.inputs_def.image_shape`: for YOLO series models, this parameter must be specified when exporting; otherwise the conversion fails.
### Step 2. Convert the deployment model to ONNX format
Install Paddle2ONNX (version 0.6 or higher)
```
pip install paddle2onnx
```
Use the following command to convert
```
paddle2onnx --model_dir inference_model/yolov3_darknet53_270e_coco \
--model_filename model.pdmodel \
--params_filename model.pdiparams \
--opset_version 11 \
--save_file yolov3.onnx
```
The converted model `yolov3.onnx` is saved in the current directory.
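After conversion, the model can be sanity-checked with ONNX Runtime, for example as below; the input names and shapes depend on the exported model (inspect `sess.get_inputs()`), so treat the feed dictionary here as illustrative:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolov3.onnx")
print([(i.name, i.shape) for i in sess.get_inputs()])  # inspect the expected inputs

# Illustrative dummy feed for a 608x608 export; real inputs come from the preprocessing
# defined in infer_cfg.yml (image, im_shape, scale_factor).
feeds = {
    "image": np.random.rand(1, 3, 608, 608).astype(np.float32),
    "im_shape": np.array([[608, 608]], dtype=np.float32),
    "scale_factor": np.array([[1.0, 1.0]], dtype=np.float32),
}
input_names = {i.name for i in sess.get_inputs()}
outputs = sess.run(None, {k: v for k, v in feeds.items() if k in input_names})
print([o.shape for o in outputs])
```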
# PaddleDetection Inference Deployment
PaddleDetection provides multiple deployment options based on Paddle Inference, Paddle Serving, and Paddle-Lite, supports server, mobile, and embedded platforms, and provides complete Python and C++ deployment solutions.
## Deployment modes supported by PaddleDetection
| Deployment form | Language | Tutorial | Device/Platform |
| ---------------- | -------- | ----------- | ------------------------- |
| Paddle Inference | Python | Complete | Linux (ARM/X86), Windows |
| Paddle Inference | C++ | Complete | Linux (ARM/X86), Windows |
| Paddle Serving | Python | Complete | Linux (ARM/X86), Windows |
| Paddle-Lite | C++ | Complete | Android, iOS, FPGA, RK... |
## 1. Paddle Inference Deployment
### 1.1 Export the model
Use the `tools/export_model.py` script to export the model and the configuration file used during deployment. The configuration file name is `infer_cfg.yml`. The model export script is as follows
```bash
# Export the YOLOv3 model
python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams
```
The inference model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`. For details on model export, please refer to the documentation [Tutorial on PaddleDetection model export](EXPORT_MODEL_en.md).
### 1.2 Use Paddle Inference to Make Predictions
* Python deployment supports `CPU`, `GPU` and `XPU` environments, Windows, Linux, and NV Jetson embedded devices. Reference Documentation [Python Deployment](python/README.md)
* C++ deployment supports `CPU`, `GPU` and `XPU` environments, Windows and Linux systems, and NV Jetson embedded devices. Reference documentation [C++ deployment](cpp/README.md)
* PaddleDetection supports TensorRT acceleration. Please refer to the documentation for [TensorRT Predictive Deployment Tutorial](TENSOR_RT.md)
**Attention:** The Paddle inference library version must be >= 2.1, and batch_size > 1 is only supported for YOLOv3 and PP-YOLO.
## 2. Paddle Serving Deployment
### 2.1 Export the model
If you want to export the model in `PaddleServing` format, set `export_serving_model=True`:
```bash
python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams --export_serving_model=True
```
The model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`, plus the `serving_client/` and `serving_server/` folders.
For details on model export, please refer to the documentation [Tutorial on PaddleDetection model export](EXPORT_MODEL_en.md).
### 2.2 Predict using Paddle Serving
* [Install PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation)
* [Use PaddleServing](./serving/README.md)
## 3. PaddleLite Deployment
- [Deploy the PaddleDetection model using PaddleLite](./lite/README.md)
- For deployment details, please refer to [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo). For more information, please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)
## 4. Benchmark Test
- Using the exported model, run the Benchmark batch test script:
```shell
sh deploy/benchmark/benchmark.sh {model_dir} {model_name}
```
**Attention:** For quantized models, please use the `deploy/benchmark/benchmark_quant.sh` script.
- Export the test result log to Excel:
```
python deploy/benchmark/log_parser_excel.py --log_path=./output_pipeline --output_name=benchmark_excel.xlsx
```
## 5. FAQ
- 1. Can models trained with `Paddle 1.8.4` be deployed with `Paddle 2.0`?
  Paddle 2.0 is compatible with models trained with Paddle 1.8.4, so this works. However, some models (such as SOLOv2) use OPs newly added in Paddle 2.0, which is not supported in this case.
- 2. The Windows inference library is compiled with VS2015; is it a problem to compile with VS2017 or VS2019?
  For VS compatibility issues, please refer to: [C++ binary compatibility between Visual Studio 2015, 2017 and 2019](https://docs.microsoft.com/zh-cn/cpp/porting/binary-compat-2015-2017?view=msvc-160)
- 3. Does cuDNN 8.0.4 leak memory during continuous inference?
  QA tests show that the cuDNN 8 series has memory leaks during continuous inference, and its performance is worse than cuDNN 7. CUDA + cuDNN 7.6.4 is recommended for deployment.
# Version Update Information
## Last Version Information
### 2.2(08.10/2021)
- Model richness:
- Release Transformer detection models: DETR, Deformable DETR, Sparse RCNN
- Add the DARK method for keypoint detection and release the Dark HRNet model
- Publish the MPII dataset HRNet keypoint detection model
- Release vertical models for head and vehicle tracking
- Model optimization:
- Release the S2ANet model optimized with AlignConv, improving mAP on the DOTA dataset to 74.0
- Inference deployment
- Mainstream models support inference deployment with batch size > 1, including YOLOv3, PP-YOLO, Faster RCNN, SSD, TTFNet, FCOS
- Add Python-side inference deployment for object tracking models (JDE, FairMOT, DeepSORT), with TensorRT inference supported
- Add Python-side inference deployment for the FairMOT model combined with keypoint detection
- Added support for key point detection model combined with PP-YOLO prediction deployment
- Documents:
- Add TensorRT version notes to the Windows inference deployment documentation
- FAQ documents are updated
- Problem fixes:
- Fixed PP-YOLO series model training convergence problem
- Fixed the problem of training with no-label data when batch_size > 1
### 2.1(05.20/2021)
- Model richness enhancement:
- Keypoint detection models: HRNet, HigherHRNet
- Release multi-object tracking models: DeepSORT, FairMOT, JDE
- Basic framework capabilities:
- Support training with no-label data
- Inference deployment:
- Paddle Inference supports batch_size > 1 for YOLOv3 series models
- Inference deployment of the rotated-box detection model S2ANet is available
- Add benchmarks for quantized models
- Add Paddle-Lite demos for dynamic graph and static graph models
- Detection model compression:
- Release compressed models for the PP-YOLO series
- Documents:
- Update tutorial documentation such as quick start and inference deployment
- Added ONNX model export tutorial
- Added the mobile deployment document
### 2.0(04.15/2021)
**Description:** Since version 2.0, the dygraph implementation is the default in PaddleDetection: the original `dygraph` directory has been moved to the root directory, and the original static graph implementation has been moved to the `static` directory.
- Enhancement of dygraph model richness:
- Release the PP-YOLOv2 and PP-YOLO tiny models; PP-YOLOv2 reaches 49.5% accuracy on the COCO test dataset with a V100 prediction speed of 68.9 FPS
- Release the rotated-box detection model S2ANet
- Release the two-stage practical model PSS-Det
- Release the face detection model BlazeFace
- New basic module:
- Added SENet, GhostNet, and Res2Net backbone networks
- Added VisualDL training visualization support
- Add single-class AP computation and PR curve plotting
- YOLO series models support the NHWC data format
- Inference deployment:
- Publish inference benchmark data for major models
- Adapt to TensorRT 6; support TensorRT dynamic-shape input and TensorRT INT8 quantized inference
- Support Python/C++/TensorRT inference deployment on Linux, Windows, and NV Jetson platforms for 7 kinds of models including PP-YOLO, YOLOv3, SSD, TTFNet, FCOS, and Faster RCNN
- Detection model compression:
- Distillation: Add dygraph distillation support and release the YOLOv3-MobileNetV1 distillation model
- Joint strategy: Add a dygraph pruning + distillation joint compression scheme and release the YOLOv3-MobileNetV1 pruning + distillation compressed model
- Problem fix: Fixed dynamic graph quantization model export problem
- Documents:
- New English dygraph documentation: including the homepage, getting started, quick start, model algorithms, adding a new dataset, etc.
- Add dygraph installation documents in both English and Chinese
- Add configuration file templates and description documents for the dygraph RCNN and YOLO series
## Historical Version Information
### 2.0-rc(02.23/2021)
- Enhancement of dynamic graph model richness:
- Optimize the network construction and training mode of RCNN models and improve the accuracy of the RCNN series (requires Paddle develop or version 2.0.1)
- Added support for SSDLite, FCOS, TTFNet, SOLOv2 series models
- Added pedestrian and vehicle vertical object detection models
- New dynamic graph basic module:
- Added MobileNetV3 and HRNet backbone networks
- Improved roi-align calculation logic for RCNN series models (depending on Paddle Develop or version 2.0.1)
- Added support for Synchronized Batch Norm
- Added support for Modulated Deformable Convolution
- Inference deployment:
- Publish dygraph Python, C++, and Serving deployment solutions and documentation; support inference deployment for Faster RCNN, Mask RCNN, YOLOv3, PPYOLO, SSD, TTFNet, FCOS, SOLOv2, and other models
- Dygraph inference deployment supports TensorRT FP32 and FP16 inference acceleration
- Detection model compression:
- Pruning: Add dygraph pruning support and release the YOLOv3-MobileNetV1 pruned model
- Quantization: Add dygraph quantization support and release quantized models of YOLOv3-MobileNetV1 and YOLOv3-MobileNetV3
- Documents:
- New dygraph tutorial documentation: installation instructions, quick start, data preparation, and training/evaluation/prediction process documentation
- New advanced dygraph tutorial documentation: model compression and inference deployment
- Add dygraph model zoo documentation
### v2.0-beta(12.20/2020)
- Dynamic graph support:
- Support for Faster-RCNN, Mask-RCNN, FPN, Cascade Faster/Mask RCNN, YOLOv3 and SSD models, trial version.
- Model upgrade:
- Update the PP-YOLO MobileNetV3 large and small models with improved accuracy, and add pruned and distilled models.
- New features:
- Support visualizing preprocessed images with VisualDL.
- Bug fix:
- Fix the BlazeFace keypoint prediction bug.
### v0.5.0(11/2020)
- Model richness enhancement:
- Release the SOLOv2 series models; the SOLOv2-Light-R50-VD-DCN-FPN model reaches 38.6 FPS on a single V100 GPU, a 24% speedup, with 38.8% accuracy on the COCO validation set, an absolute improvement of 2.4 percentage points.
- Add an Android mobile detection demo, including SSD and YOLO series models, which can be installed directly by scanning a QR code.
- Mobile model optimization:
- Add the new PACT quantization strategy; YOLOv3-MobileNetV3 is 0.7% better than ordinary quantization on the COCO dataset.
- Ease of use and functional components:
- Enhance the generate_proposal_labels operator to avoid the risk of NaN in the model.
- Fix several problems with Python and C++ inference deployment.
- Unify the evaluation process of the COCO and VOC datasets; support outputting single-class AP and P-R curves.
- PP-YOLO supports rectangular input images.
- Documents:
- Added object detection whole process tutorial, added Jetson platform deployment tutorial.
### v0.4.0(07/2020)
- Model richness enhancement:
- Release the PP-YOLO model, reaching 45.2% accuracy on the COCO dataset and 72.9 FPS prediction speed on a single V100 GPU, outperforming the YOLOv4 model.
- New TTFNet model, base version aligned with competing products, COCO dataset accuracy up to 32.9%.
- New HTC model, base version aligned with competing products, COCO dataset accuracy up to 42.2%.
- BlazeFace key point detection model was added, with an accuracy of 85.2% in Wider-Face's Easy-Set.
- ACFPN model was added, and the accuracy of COCO dataset reached 39.6%.
- Release a server-side general object detection model (covering 676 classes); with the same strategy on the COCO dataset, it reaches 49.4% COCO mAP at 19.5 FPS on V100.
- Mobile model optimization:
- Add SSDLite series optimized models, including a GhostNet backbone, FPN components, etc., with accuracy improved by 0.5% and 1.5% respectively.
- Ease of use and functional components:
- Add GridMask and RandomErasing data augmentation methods.
- Added support for Matrix NMS.
- Add EMA (Exponential Moving Average) training support.
- Add a multi-machine training method; the average acceleration ratio of two machines relative to a single machine is 80%, and multi-machine training support needs further verification.
### v0.3.0(05/2020)
- Model richness enhancement:
- Add the EfficientDet-D0 model; its speed and accuracy are better than competing products.
- Add the YOLOv4 prediction model with accuracy aligned with competing products; add YOLOv4 fine-tuning on the Pascal VOC dataset with 85.5% accuracy.
- YOLOv3 adds a MobileNetV3 backbone network, reaching 31.6% accuracy on the COCO dataset.
- Add the anchor-free model FCOS, with accuracy better than competing products.
- Add the anchor-free model CornerNet-Squeeze, with accuracy better than competing products; the optimized model reaches 38.2% accuracy on the COCO dataset, +3.7%, and is 5% faster than YOLOv3-Darknet53.
- Add CascadeRCNN-ResNet50vd, a practical server-side object detection model whose speed and accuracy are better than the competing EfficientDet.
- Three mobile solutions are launched:
- SSDLite models: SSDLite-MobileNetV3 small/large models, with better accuracy than competing products.
- YOLOv3 mobile solution: the YOLOv3-MobileNetV3 model is 3.5x faster after compression, faster and more accurate than the competing SSDLite model.
- RCNN mobile solution: CascadeRCNN-MobileNetV3, released after a series of optimizations with input sizes of 320x320 and 640x640, offering a good speed/accuracy trade-off.
- Inference deployment refactoring:
- New Python inference deployment process with support for RCNN, YOLO, SSD, RetinaNet, and face models, and support for video prediction.
- Refactor the C++ inference deployment to improve ease of use.
- Ease of use and functional components:
- Add AutoAugment data augmentation.
- Upgrade the documentation structure of the detection library.
- Transfer learning supports automatic shape matching.
- Optimize memory usage during mask branch evaluation.
### v0.2.0(02/2020)
- New models:
- Added CBResNet model.
- Added LibraRCNN model.
- The accuracy of YOLOv3 model was further improved, and the accuracy based on COCO data reached 43.2%, 1.4% higher than the previous version.
- New basic modules:
- Backbone network: add CBResNet.
- Loss module: YOLOv3 loss supports fine-grained OP composition.
- Regularization module: add the DropBlock module.
- Function optimization and improvement:
- Accelerate YOLOv3 data preprocessing and increase the overall training speed by 40%.
- Optimize data preprocessing logic to improve ease of use.
- Add face detection inference benchmark data.
- Add a Python API prediction example for the C++ inference engine.
- Detection model compression:
- Pruning: Release the MobileNet-YOLOv3 pruning scheme and model; on VOC data, FLOPs -69.6% with mAP +1.4%; on COCO data, FLOPs -28.8% with mAP +0.9%. Release the ResNet50vd-DCN-YOLOv3 pruning scheme and model; on COCO data, FLOPs -18.4% with mAP +0.8%.
- Distillation: Release the MobileNet-YOLOv3 distillation scheme and model; on VOC data, mAP +2.8%; on COCO data, mAP +2.1%.
- Quantization: Release quantized models of YOLOv3-MobileNet and BlazeFace.
- Pruning + distillation: Release the MobileNet-YOLOv3 pruning + distillation scheme and model; on COCO data, FLOPs -69.6%, TensorRT inference accelerated by 64.5%, mAP -0.3%. Release the ResNet50vd-DCN-YOLOv3 pruning + distillation scheme and model; on COCO data, FLOPs -43.7%, TensorRT inference accelerated by 24.0%, mAP +0.6%.
- Search: Open-source the complete BlazeFace-NAS search solution.
- Predict deployment:
- Integrate TensorRT, supporting FP16, FP32, and INT8 quantized inference acceleration.
- Document:
- Add detailed documentation introducing the data preprocessing module and describing how to implement a custom data Reader.
- Add documentation on how to add new algorithm models.
- Document deployment to the web site: https://paddledetection.readthedocs.io
### 12/2019
- Add Res2Net model.
- Add HRNet model.
- Add GIoU loss and DIoU loss.
### 21/11/2019
- Add CascadeClsAware RCNN model.
- Add CBNet, ResNet200 and Non-local model.
- Add SoftNMS.
- Add Open Images V5 dataset and Objects365 dataset models
### 10/2019
- Added enhanced YOLOv3 model with accuracy up to 41.4%.
- Added Face detection models BlazeFace and Faceboxes.
- Enrich COCO-based models, with accuracy up to 51.9%.
- Added CA-Cascade-RCNN, one of the best single models to win on Objects365 2019 Challenge.
- Add pedestrian detection and vehicle detection pre-training models.
- Support FP16 training.
- Added cross-platform C++ inference deployment scheme.
- Add model compression examples.
### 2/9/2019
- Add GroupNorm model.
- Add CascadeRCNN+Mask model.
### 5/8/2019
- Add Modulated Deformable Convolution series model
### 29/7/2019
- Add Chinese documentation for the detection library
- Fix an issue with evaluating R-CNN series models simultaneously during training
- Add ResNext101-vd + Mask R-CNN + FPN models
- Added YOLOv3 model based on VOC dataset
### 3/7/2019
- First release of PaddleDetection Detection library and Detection model library
- Models: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
# Model Libraries and Baselines
## Test Environment
- Python 3.7
- PaddlePaddle Daily version
- CUDA 10.1
- cuDNN 7.5
- NCCL 2.4.8
## General Settings
- All models were trained and tested on the COCO17 dataset.
- Unless otherwise specified, all ResNet backbones use the [ResNet-B](https://arxiv.org/pdf/1812.01187) structure.
- **Inference time (FPS)**: Inference time is measured with `tools/eval.py` on a Tesla V100 GPU over the full validation set and reported in FPS (images/second). The cuDNN version is 7.5; the time includes data loading, network forward execution, and post-processing, with a batch size of 1.
## Training strategy
- We adopt the same training schedules as [Detectron](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#training-schedules).
- The 1x schedule means that with a total batch size of 8, the initial learning rate is 0.01 and is decreased by a factor of 10 after epoch 8 and epoch 11, with 12 epochs of training in total.
- The 2x schedule trains for twice as many epochs as 1x, and the learning rate decay points are likewise doubled, as sketched below.
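A tiny sketch of this piecewise schedule, written out for concreteness (an illustration of the description above, not the library's scheduler implementation):

```python
# Illustration of the 1x schedule: base LR 0.01 with a total batch size of 8,
# divided by 10 after epochs 8 and 11, trained for 12 epochs in total.
def lr_at_epoch(epoch, base_lr=0.01, milestones=(8, 11), gamma=0.1):
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# The 2x schedule would use milestones=(16, 22) over 24 epochs.
print([round(lr_at_epoch(e), 6) for e in range(12)])
# [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.001, 0.001, 0.001, 0.0001]
```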
## ImageNet pretraining model
Paddle provides backbone pretrained models based on ImageNet. All pretrained models are trained on the standard ImageNet-1k dataset. The ResNet and MobileNet models are high-accuracy pretrained models obtained with a cosine learning rate schedule or SSLD knowledge distillation. Model details are available at [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
## Baseline
### Faster R-CNN
Please refer to [Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/)
### Mask R-CNN
Please refer to [Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/)
### Cascade R-CNN
Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn)
### YOLOv3
Please refer to [YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/)
### SSD
Please refer to [SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ssd/)
### FCOS
Please refer to [FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/fcos/)
### SOLOv2
Please refer to [SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/solov2/)
### PP-YOLO
Please refer to [PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/)
### TTFNet
Please refer to [TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/)
### Group Normalization
Please refer to [Group Normalization](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gn/)
### Deformable ConvNets v2
Please refer to [Deformable ConvNets v2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dcn/)
### HRNets
Please refer to [HRNets](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/hrnet/)
### Res2Net
Please refer to [Res2Net](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/res2net/)
### GFL
Please refer to [GFL](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gfl)
### PicoDet
Please refer to [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet)
## Rotated-box detection
### S2ANet
Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/)
# How to Create a Model Algorithm
To help you make better use of PaddleDetection, this document introduces the main technical details of PaddleDetection models and how to apply them.
## Directory
- [How to Create Model Algorithm](#how-to-create-model-algorithm)
- [Directory](#directory)
- [1. Introduction](#1-introduction)
- [2. Create Model](#2-create-model)
- [2.1 Create Model Structure](#21-create-model-structure)
- [2.1.1 Create Backbone](#211-create-backbone)
- [2.1.2 Create Neck](#212-create-neck)
- [2.1.3 Create Head](#213-create-head)
- [2.1.4 Create Loss](#214-create-loss)
- [2.1.5 Create Post-processing Module](#215-create-post-processing-module)
- [2.1.6 Create Architecture](#216-create-architecture)
- [2.2 Create Configuration File](#22-create-configuration-file)
- [2.2.1 Network Structure Configuration File](#221-network-structure-configuration-file)
- [2.2.2 Optimizer configuration file](#222-optimizer-configuration-file)
- [2.2.3 Reader Configuration File](#223-reader-configuration-file)
### 1. Introduction
Each model in PaddleDetection corresponds to a folder. Taking YOLOv3 as an example, the models in the YOLOv3 family correspond to the `configs/yolov3` folder, and the main configuration file of the YOLOv3 DarkNet model is `configs/yolov3/yolov3_darknet53_270e_coco.yml`:
```
_BASE_: [
'../datasets/coco_detection.yml', # Dataset configuration file shared by all models
'../runtime.yml', # Runtime configuration
'_base_/optimizer_270e.yml', # Optimizer related configuration
'_base_/yolov3_darknet53.yml', # yolov3 Network structure configuration file
'_base_/yolov3_reader.yml', # yolov3 Reader module configuration
]
# The relevant configuration defined here can override the configuration of the same name in the above file
snapshot_epoch: 5
weights: output/yolov3_darknet53_270e_coco/model_final
```
As you can see, apart from the shared dataset configuration and the runtime configuration, the modules in the configuration file are clearly divided into optimizer, network structure, and reader modules. PaddleDetection supports a rich set of optimizers, learning rate schedules, preprocessing operators and so on, so most of the time you do not need to write optimizer or reader code; you only need to set them in the configuration file, and the sketch below shows how such a file is loaded and merged. Therefore, the main work of adding a new model is building the network structure.
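The sketch below (assuming PaddleDetection is installed and commands are run from the repository root) illustrates how such a configuration file and its `_BASE_` files are merged into a single configuration object:

```python
from ppdet.core.workspace import load_config

cfg = load_config('configs/yolov3/yolov3_darknet53_270e_coco.yml')
# keys from the _BASE_ files and the top-level file are merged;
# values defined at the top level override same-name values from _BASE_
print(cfg['architecture'])    # expected: YOLOv3
print(cfg['snapshot_epoch'])  # expected: 5, overridden at the top level
```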
In `ppdet/modeling/`, all PaddleDetection network structures are defined and combined in the form of components. The main components of the network structure are as follows:
```
ppdet/modeling/
├── architectures
│ ├── faster_rcnn.py # Faster R-CNN model
│ ├── ssd.py # SSD model
│ ├── yolo.py # YOLOv3 model
│ │ ...
├── heads # detection head module
│ ├── xxx_head.py # define various detection heads
│ ├── roi_extractor.py # region of interest (RoI) feature extraction
├── backbones # backbone network module
│ ├── resnet.py # ResNet network
│ ├── mobilenet.py # MobileNet network
│ │ ...
├── losses # loss function module
│ ├── xxx_loss.py # define and register various loss functions
├── necks # feature fusion module
│ ├── xxx_fpn.py # define various FPN modules
├── proposal_generator # anchor & proposal generate and match modules
│ ├── anchor_generator.py # anchor generate modules
│ ├── proposal_generator.py # proposal generate modules
│ ├── target.py # anchor & proposal matching functions
│ ├── target_layer.py # anchor & proposal matching layers
├── tests # unit test module
│ ├── test_xxx.py # unit tests for the operators and modules in the network
├── ops.py # wraps common detection operators provided by PaddlePaddle
├── layers.py # encapsulates and registers common detection components/operators
├── bbox_utils.py # encapsulates box-related utility functions
├── post_process.py # encapsulates and registers post-processing modules
├── shape_spec.py # defines a class describing a module's output shape
```
![](../images/model_figure.png)
### 2. Create Model
Next, the modeling process is described in detail by taking the single-stage detector YOLOv3 as an example, so that you can quickly build a new model according to this idea.
#### 2.1 Create Model Structure
##### 2.1.1 Create Backbone
All existing backbone code in PaddleDetection is placed under the `ppdet/modeling/backbones` directory, so we create `darknet.py` there as follows:
```python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable

@register
@serializable
class DarkNet(nn.Layer):

    __shared__ = ['norm_type']

    def __init__(self,
                 depth=53,
                 return_idx=[2, 3, 4],
                 norm_type='bn',
                 norm_decay=0.):
        super(DarkNet, self).__init__()
        # Omit the content

    def forward(self, inputs):
        # Omit the processing logic
        pass

    @property
    def out_shape(self):
        # Omit the content
        pass
```
Then add a reference to `backbones/__init__.py`:
```python
from . import darknet
from .darknet import *
```
**A few notes:**
- To configure networks flexibly in the YAML configuration file, all backbones need to be registered in `ppdet.core.workspace` with `register`, as shown in the example above. In addition, `serializable` can be used to make the backbone support serialization.
- Every backbone needs to inherit from `paddle.nn.Layer` and implement the forward function. It also needs to implement the `out_shape` property, which describes the channels of the output feature maps; a sketch follows these notes, and the full details are in the source code.
- `__shared__` enables global sharing of configuration parameters; such parameters can be shared by all registered modules, such as the backbone, neck, head, and loss.
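The following minimal sketch illustrates the `out_shape` contract; `ToyBackbone` is a hypothetical example, not a module shipped with PaddleDetection:

```python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable
from ppdet.modeling.shape_spec import ShapeSpec

@register
@serializable
class ToyBackbone(nn.Layer):
    """Hypothetical backbone illustrating the out_shape contract."""

    def __init__(self, out_channels=[256, 512, 1024]):
        super(ToyBackbone, self).__init__()
        self._out_channels = out_channels

    def forward(self, inputs):
        # a real backbone returns a list of feature maps here
        pass

    @property
    def out_shape(self):
        # one ShapeSpec per output feature map, consumed by the neck's from_config
        return [ShapeSpec(channels=c) for c in self._out_channels]
```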
##### 2.1.2 Create Neck
The feature fusion module is placed under the `ppdet/modeling/necks` directory and we create the following `yolo_fpn.py`:
``` python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable

@register
@serializable
class YOLOv3FPN(nn.Layer):
    __shared__ = ['norm_type']

    def __init__(self,
                 in_channels=[256, 512, 1024],
                 norm_type='bn'):
        super(YOLOv3FPN, self).__init__()
        # Omit the content

    def forward(self, blocks):
        # Omit the content
        pass

    @classmethod
    def from_config(cls, cfg, input_shape):
        # Omit the content
        pass

    @property
    def out_shape(self):
        # Omit the content
        pass
```
Then add a reference to `necks/__init__.py`:
```python
from . import yolo_fpn
from .yolo_fpn import *
```
**A few notes:**
- The neck module needs to be registered with `register` and can be serialized with `serializable`.
- The neck module needs to inherit from `paddle.nn.Layer` and implement the forward function. It also needs to implement the `out_shape` property to describe the channels of the output feature maps, and the class method `from_config` to deduce the input channels from the backbone's output shape and initialize `YOLOv3FPN`; a sketch of `from_config` follows these notes.
- The neck module can use `__shared__` to share configuration parameters globally.
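The following minimal sketch shows the typical `from_config` pattern; `ToyFPN` is a hypothetical example, not a module shipped with PaddleDetection:

```python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable

@register
@serializable
class ToyFPN(nn.Layer):
    """Hypothetical neck showing how from_config deduces in_channels."""

    def __init__(self, in_channels=[256, 512, 1024]):
        super(ToyFPN, self).__init__()
        self.in_channels = in_channels

    def forward(self, blocks):
        # a real neck fuses the backbone feature maps here
        pass

    @classmethod
    def from_config(cls, cfg, input_shape):
        # input_shape is the backbone's out_shape (a list of ShapeSpec);
        # the returned dict is merged into the __init__ keyword arguments
        return {'in_channels': [s.channels for s in input_shape]}
```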
##### 2.1.3 Create Head
Head modules are all stored in the `ppdet/modeling/heads` directory, where we create `yolo_head.py` as follows:
``` python
import paddle.nn as nn
from ppdet.core.workspace import register

@register
class YOLOv3Head(nn.Layer):
    __shared__ = ['num_classes']
    __inject__ = ['loss']

    def __init__(self,
                 anchors=[[10, 13], [16, 30], [33, 23],
                          [30, 61], [62, 45], [59, 119],
                          [116, 90], [156, 198], [373, 326]],
                 anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 num_classes=80,
                 loss='YOLOv3Loss',
                 iou_aware=False,
                 iou_aware_factor=0.4):
        super(YOLOv3Head, self).__init__()
        # Omit the content

    def forward(self, feats, targets=None):
        # Omit the content
        pass
```
Then add a reference to `heads/__init__.py`:
```python
from . import yolo_head
from .yolo_head import *
```
**A few notes:**
- The head module needs to register with `register`.
- The head module needs to inherit the `paddle.nn.Layer` class and implement the forward function.
- `__inject__` indicates that a module already encapsulated in the global dictionary is injected, such as the loss module; a sketch of how the injected loss is used follows.
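The following sketch shows one common pattern for using an injected loss module inside a head; `ToyHead` is hypothetical and simplified, not the actual `YOLOv3Head` implementation:

```python
import paddle.nn as nn
from ppdet.core.workspace import register

@register
class ToyHead(nn.Layer):
    """Hypothetical head illustrating how an injected loss module is used."""
    __shared__ = ['num_classes']
    __inject__ = ['loss']  # 'loss' is instantiated from the global config and passed in

    def __init__(self, num_classes=80, loss='YOLOv3Loss'):
        super(ToyHead, self).__init__()
        self.num_classes = num_classes
        self.loss = loss  # after injection this is an instantiated loss module

    def forward(self, feats, targets=None):
        outputs = feats  # a real head runs convolutions over feats here
        if self.training:
            # during training, delegate to the injected loss module
            return self.loss(outputs, targets)
        return outputs
```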
##### 2.1.4 Create Loss
Loss modules are all stored under the `ppdet/modeling/losses` directory, where we create `yolo_loss.py`:
```python
import paddle.nn as nn
from ppdet.core.workspace import register

@register
class YOLOv3Loss(nn.Layer):

    __inject__ = ['iou_loss', 'iou_aware_loss']
    __shared__ = ['num_classes']

    def __init__(self,
                 num_classes=80,
                 ignore_thresh=0.7,
                 label_smooth=False,
                 downsample=[32, 16, 8],
                 scale_x_y=1.,
                 iou_loss=None,
                 iou_aware_loss=None):
        super(YOLOv3Loss, self).__init__()
        # Omit the content

    def forward(self, inputs, targets, anchors):
        # Omit the content
        pass
```
Then add a reference to `losses/__init__.py`:
```python
from . import yolo_loss
from .yolo_loss import *
```
**A few notes:**
- The loss module needs to register with `register`.
- The loss module needs to inherit the `paddle.nn.Layer` class and implement the forward function.
- With `__inject__`, modules already encapsulated in the global dictionary can be used; with `__shared__`, some parameters can be shared globally in the configuration.
##### 2.1.5 Create Post-processing Module
The post-processing module is defined in `ppdet/modeling/post_process.py`, where the `BBoxPostProcess` class is defined for post-processing operations, as follows:
``` python
from ppdet.core.workspace import register

@register
class BBoxPostProcess(object):
    __shared__ = ['num_classes']
    __inject__ = ['decode', 'nms']

    def __init__(self, num_classes=80, decode=None, nms=None):
        # Omit the content
        pass

    def __call__(self, head_out, rois, im_shape, scale_factor):
        # Omit the content
        pass
```
**A few notes:**
- Post-processing modules need to be registered with `register`.
- `__inject__` imports modules already encapsulated in the global dictionary, such as the decode and NMS modules; decode and NMS are defined in `ppdet/modeling/layers.py`.
##### 2.1.6 Create Architecture
All architecture code is placed in the `ppdet/modeling/architectures` directory; `meta_arch.py` defines the `BaseArch` class as follows:
``` python
import paddle.nn as nn
from ppdet.core.workspace import register

@register
class BaseArch(nn.Layer):
    def __init__(self):
        super(BaseArch, self).__init__()

    def forward(self, inputs):
        self.inputs = inputs
        self.model_arch()

        if self.training:
            out = self.get_loss()
        else:
            out = self.get_pred()
        return out

    def model_arch(self, ):
        pass

    def get_loss(self, ):
        raise NotImplementedError("Should implement get_loss method!")

    def get_pred(self, ):
        raise NotImplementedError("Should implement get_pred method!")
```
Every architecture needs to inherit from the `BaseArch` class. For example, `YOLOv3` is defined in `yolo.py` as follows:
``` python
@register
class YOLOv3(BaseArch):
    __category__ = 'architecture'
    __inject__ = ['post_process']

    def __init__(self,
                 backbone='DarkNet',
                 neck='YOLOv3FPN',
                 yolo_head='YOLOv3Head',
                 post_process='BBoxPostProcess'):
        super(YOLOv3, self).__init__()
        self.backbone = backbone
        self.neck = neck
        self.yolo_head = yolo_head
        self.post_process = post_process

    @classmethod
    def from_config(cls, cfg, *args, **kwargs):
        # Omit the content
        pass

    def get_loss(self):
        # Omit the content
        pass

    def get_pred(self):
        # Omit the content
        pass
```
**A few notes:**
- Every architecture needs to be registered with `register`.
- When constructing a complete network, `__category__ = 'architecture'` must be set to indicate a complete object detection model;
- The backbone, neck, YOLO head, post-processing and other detection components are passed into the architecture to form the final network. Modularizing detection in this way improves the reusability of detection models: multiple models can be obtained by combining different detection components.
- The `from_config` class method automatically configures the channels when modules are combined, as sketched below.
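The following sketch (a hypothetical `ToyYOLO`, close in spirit to but not guaranteed to match the actual `yolo.py`) shows how `from_config` chains `create` calls so that channel numbers propagate automatically from backbone to neck to head:

```python
from ppdet.core.workspace import register, create
from ppdet.modeling.architectures.meta_arch import BaseArch

@register
class ToyYOLO(BaseArch):
    """Hypothetical architecture showing how from_config wires components together."""
    __category__ = 'architecture'

    def __init__(self, backbone='DarkNet', neck='YOLOv3FPN', yolo_head='YOLOv3Head'):
        super(ToyYOLO, self).__init__()
        self.backbone = backbone
        self.neck = neck
        self.yolo_head = yolo_head

    @classmethod
    def from_config(cls, cfg, *args, **kwargs):
        backbone = create(cfg['backbone'])
        # the backbone's out_shape feeds the neck, the neck's out_shape feeds the head
        neck = create(cfg['neck'], input_shape=backbone.out_shape)
        yolo_head = create(cfg['yolo_head'], input_shape=neck.out_shape)
        return {'backbone': backbone, 'neck': neck, 'yolo_head': yolo_head}
```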
#### 2.2 Create Configuration File
##### 2.2.1 Network Structure Configuration File
The YOLOv3 network structure configuration is defined in the `configs/yolov3/_base_/` folder. For example, `yolov3_darknet53.yml` defines the network structure of YOLOv3 DarkNet as follows:
```
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
norm_type: sync_bn

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: BBoxPostProcess

DarkNet:
  depth: 53
  return_idx: [2, 3, 4]

# use default config
# YOLOv3FPN:

YOLOv3Head:
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  loss: YOLOv3Loss

YOLOv3Loss:
  ignore_thresh: 0.7
  downsample: [32, 16, 8]
  label_smooth: false

BBoxPostProcess:
  decode:
    name: YOLOBox
    conf_thresh: 0.005
    downsample_ratio: 32
    clip_bbox: true
  nms:
    name: MultiClassNMS
    keep_top_k: 100
    score_threshold: 0.01
    nms_threshold: 0.45
    nms_top_k: 1000
```
In the configuration file, `architecture` specifies the network architecture, `pretrain_weights` specifies the URL or path of the pretrained model, and `norm_type` is shared as a global parameter. The model components are defined from top to bottom in the file, corresponding to the components in the previous section. For components that use their default parameters, such as `YOLOv3FPN` above, no configuration is needed. By changing the related configuration, we can easily compose another model; for example, `configs/yolov3/_base_/yolov3_mobilenet_v1.yml` switches the backbone from DarkNet to MobileNet.
##### 2.2.2 Optimizer configuration file
The optimizer configuration file defines the optimizer used by the model and the learning rate scheduling strategy. A variety of optimizers and learning rate schedules are already integrated in PaddleDetection; see `ppdet/optimizer.py` for details. For example, the optimizer configuration file for YOLOv3 is defined in `configs/yolov3/_base_/optimizer_270e.yml` as follows:
```
epoch: 270

LearningRate:
  base_lr: 0.001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    # epoch number
    - 216
    - 243
  - !LinearWarmup
    start_factor: 0.
    steps: 4000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2
```
**A few notes:**
- `OptimizerBuilder.optimizer` specifies the type and parameters of the optimizer. For the optimizers currently supported, please refer to the [PaddlePaddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html).
- `LearningRate.schedulers` sets the combination of different learning rate adjustment strategies. Paddle currently supports a variety of learning rate schedules; please refer to the [PaddlePaddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html) as well. Note that the Paddle schedulers are lightly wrapped in PaddleDetection, which can be seen in the source file `ppdet/optimizer.py`. The sketch below illustrates the schedule that the configuration above produces.
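As an illustration only (plain Python, not PaddleDetection code), the combined schedule defined above behaves roughly as follows: linear warmup over the first 4000 iterations, then piecewise decay by `gamma` at the milestone epochs.

```python
def learning_rate(iter_id, epoch_id, base_lr=0.001, warmup_steps=4000,
                  start_factor=0.0, milestones=(216, 243), gamma=0.1):
    """Learning rate under linear warmup followed by piecewise decay."""
    if iter_id < warmup_steps:
        # linearly increase from start_factor * base_lr up to base_lr
        alpha = iter_id / float(warmup_steps)
        return base_lr * (start_factor + (1.0 - start_factor) * alpha)
    lr = base_lr
    for milestone in milestones:
        if epoch_id >= milestone:
            lr *= gamma  # decay by gamma at each milestone epoch
    return lr

print(learning_rate(iter_id=2000, epoch_id=0))      # warming up: 0.0005
print(learning_rate(iter_id=100000, epoch_id=250))  # after both milestones: 1e-05
```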
##### 2.2.3 Reader Configuration File
For Reader configuration, see [Reader configuration documentation](./READER_en.md#5.Configuration-and-Operation).
> After reading this document, you should have some experience with model construction and configuration in PaddleDetection, and reading the source code will help you understand it more thoroughly. If you have other questions or suggestions about the model technology, please open an issue; we welcome your feedback.
## Deployment
Please refer to [deployment](../../deploy/README_en.md)
## Model Compression
Please refer to [slim](../../configs/slim/README_en.md)
# How to Prepare Training Data
## Directory
- [How to Prepare Training Data](#how-to-prepare-training-data)
- [Directory](#directory)
- [Description of Object Detection Data](#description-of-object-detection-data)
- [Prepare Training Data](#prepare-training-data)
- [VOC Data](#voc-data)
- [VOC Dataset Download](#voc-dataset-download)
- [Introduction to VOC Data Annotation File](#introduction-to-voc-data-annotation-file)
- [COCO Data](#coco-data)
- [COCO Data Download](#coco-data-download)
- [Description of COCO Data Annotation](#description-of-coco-data-annotation)
- [User Data](#user-data)
- [Convert User Data to VOC Data](#convert-user-data-to-voc-data)
- [Convert User Data to COCO Data](#convert-user-data-to-coco-data)
- [Reader of User Define Data](#reader-of-user-define-data)
- [Example of User Data Conversion](#example-of-user-data-conversion)
### Description of Object Detection Data
The data of object detection is more complex than classification. In an image, it is necessary to mark the position and category of each object.
An object's position is generally represented by a rectangular bounding box, which can be expressed in the following three ways:
| Expression | Explanation |
| :---------: | :----------------------------------------------------------------------------: |
| x1,y1,x2,y2 | (x1,y1) is the top-left coordinate, (x2,y2) is the bottom-right coordinate |
| x1,y1,w,h | (x1,y1) is the top-left coordinate, w is the width of the object, h is the height of the object |
| xc,yc,w,h | (xc,yc) is the center of the object, w is the width of the object, h is the height of the object |
Common object detection datasets such as Pascal VOC adopt `[x1,y1,x2,y2]` to express the bounding box of an object, while COCO uses `[x1,y1,w,h]` (see the COCO [format](https://cocodataset.org/#format-data) description); a conversion sketch follows.
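The following plain-Python helper functions (not part of PaddleDetection) convert between the three box representations listed above:

```python
def xyxy_to_xywh(box):
    """[x1, y1, x2, y2] -> [x1, y1, w, h] (the COCO annotation format)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

def xywh_to_cxcywh(box):
    """[x1, y1, w, h] -> [xc, yc, w, h] (center-based representation)."""
    x1, y1, w, h = box
    return [x1 + w / 2.0, y1 + h / 2.0, w, h]

# example: a 77.71 x 70.88 box whose top-left corner is at (199.84, 200.46)
print(xyxy_to_xywh([199.84, 200.46, 277.55, 271.34]))
# -> approximately [199.84, 200.46, 77.71, 70.88]
```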
### Prepare Training Data
PaddleDetection supports the [COCO](http://cocodataset.org), [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) datasets by default.
It also supports custom data sources, including:
(1) converting custom data to VOC format;
(2) converting custom data to COCO format;
(3) customizing a new data source and adding a custom reader.
First, enter the `PaddleDetection` root directory:
```
cd PaddleDetection/
ppdet_root=$(pwd)
```
#### VOC Data
VOC data is the data used in the [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) competition. The Pascal VOC competition contains not only an image classification task but also object detection, object segmentation and other tasks, and its annotation files contain the ground truth of multiple tasks.
The VOC dataset refers to the data used in the Pascal VOC competition. When customizing data in VOC format, for the non-mandatory fields in the XML files, decide whether to annotate them or use the default values according to your actual situation.
##### VOC Dataset Download
- Download the VOC datasets automatically through code. The datasets are large and take a long time to download:
```
# Execute code to automatically download VOC dataset
python dataset/voc/download_voc.py
```
After code execution, the VOC dataset file organization structure is:
```
>>cd dataset/voc/
>>tree
├── create_list.py
├── download_voc.py
├── generic_det_label_list.txt
├── generic_det_label_list_zh.txt
├── label_list.txt
├── VOCdevkit/VOC2007
│ ├── annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
Description of each file:
```
# label_list.txt is the list of class names; the file name must be label_list.txt. When using the VOC dataset with `use_default_label=true` in the config file, this file is not required.
>>cat label_list.txt
aeroplane
bicycle
...
# trainval.txt is file list of trainset
>>cat trainval.txt
VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
...
# test.txt is file list of testset
>>cat test.txt
VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
...
# label_list.txt is the list of VOC class names
>>cat label_list.txt
aeroplane
bicycle
...
```
- If the VOC dataset has been downloaded
You can organize files according to the above data file organization structure.
##### Introduction to VOC Data Annotation File
In the VOC dataset, each image file corresponds to an XML file with the same name, which records the coordinates and categories of the annotated object boxes, for example for `2007_002055.jpg`:
![](../images/2007_002055.jpg)
The XML file corresponding to the image contains the basic information of the corresponding image, such as file name, source, image size, object area information and category information contained in the image.
The XML file contains the following fields:
- filename, indicating the image name.
- size, indicating the image size, including: image width, image height and image depth
```
<size>
<width>500</width>
<height>375</height>
<depth>3</depth>
</size>
```
- object fields, one per annotated object, each including the following (a parsing sketch follows the table):
| Label | Explanation |
| :--------------: | :------------------------------------------------------------------------------------------------------------------------: |
| name | name of object class |
| pose | attitude description of the target object (optional field) |
| truncated | if the occlusion of the object exceeds 15-20% and part of it lies outside the bounding box, mark it as `truncated` (optional field) |
| difficult | objects that are difficult to recognize are marked as `difficult` (optional field) |
| bndbox sub-label | (xmin,ymin) is the top-left coordinate, (xmax,ymax) is the bottom-right coordinate |
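The following minimal sketch uses only the Python standard library to read these fields from a single VOC XML file; the file path in the example is hypothetical:

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Read image size and object boxes/classes from one Pascal VOC XML file."""
    root = ET.parse(xml_path).getroot()
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    objects = []
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        difficult = obj.find('difficult')
        objects.append({
            'name': obj.find('name').text,
            'difficult': int(difficult.text) if difficult is not None else 0,
            'bbox': [float(bndbox.find(tag).text)
                     for tag in ('xmin', 'ymin', 'xmax', 'ymax')],
        })
    return {'width': width, 'height': height, 'objects': objects}

# hypothetical path; adjust to an actual annotation file
# print(parse_voc_annotation('VOCdevkit/VOC2007/Annotations/001789.xml'))
```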
#### COCO Data
COCO data is the data used in the [COCO](http://cocodataset.org) competition. Similarly, the COCO competition contains multiple tasks, and its annotation file contains the annotations of multiple tasks.
The COCO dataset refers to the data used in the COCO competition. When customizing data in COCO format, for some fields in the JSON file, decide whether to annotate them or use the default values according to your actual situation.
##### COCO Data Download
- The COCO dataset can be downloaded automatically through code. The dataset is large and takes a long time to download:
```
# automatically download coco datasets by executing code
python dataset/coco/download_coco.py
```
After code execution, the organization structure of the COCO dataset files is:
```
>>cd dataset/coco/
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- If the coco dataset has been downloaded
The files can be organized according to the above data file organization structure.
##### Description of COCO Data Annotation
COCO annotations store the annotations of all training images in a single JSON file, organized as nested dictionaries.
The JSON file contains the following keys:
- info, indicating the annotation file information.
- licenses, indicating the annotation file licenses.
- images, indicating the list of image information in the annotation file; each element is the information of one image. The following is the information of one image:
```
{
'license': 3, # license
'file_name': '000000391895.jpg', # file_name
# coco_url
'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg',
'height': 360, # image height
'width': 640, # image width
'date_captured': '2013-11-14 11:18:45', # date_captured
# flickr_url
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895 # image id
}
```
- annotations: indicating the annotation information list of the target object in the annotation file. Each element is the annotation information of a target object. The following is the annotation information of one of the target objects:
```
{
'segmentation': # object segmentation annotation
'area': 2765.1486500000005, # object area
'iscrowd': 0, # iscrowd
'image_id': 558840, # image id
'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h]
'category_id': 58, # category_id
'id': 156 # annotation id
}
```
```
# Viewing coco annotation files
import json
coco_anno = json.load(open('./annotations/instances_train2017.json'))
# coco_anno.keys
print('\nkeys:', coco_anno.keys())
# Viewing categories information
print('\ncategories:', coco_anno['categories'])
# Viewing the number of images
print('\nthe number of images:', len(coco_anno['images']))
# Viewing the number of obejcts
print('\nthe number of annotation:', len(coco_anno['annotations']))
# View object annotation information
print('\nobject annotation information: ', coco_anno['annotations'][0])
```
COCO data is prepared as follows.
The initial file organization of `dataset/coco/` is:
```
>>cd dataset/coco/
>>tree
├── download_coco.py
```
#### User Data
There are three processing methods for user data:
(1) Convert user data into VOC data (only include labels necessary for object detection as required)
(2) Convert user data into coco data (only include labels necessary for object detection as required)
(3) Customize a reader for user data (for complex data, you need to customize the reader)
##### Convert User Data to VOC Data
After the user dataset is converted to VOC format, the directory structure is as follows (note that path names and file names in the dataset should avoid Chinese characters where possible, to prevent errors caused by encoding issues):
```
dataset/xxx/
├── annotations
│ ├── xxx1.xml
│ ├── xxx2.xml
│ ├── xxx3.xml
│ | ...
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
├── label_list.txt (Must be provided and the file name must be label_list.txt )
├── train.txt (list of trainset ./images/xxx1.jpg ./annotations/xxx1.xml)
└── valid.txt (list of valid file)
```
Description of each file:
```
# label_list.txt is a list of category names. The file name must be this
>>cat label_list.txt
classname1
classname2
...
# train.txt is list of trainset
>>cat train.txt
./images/xxx1.jpg ./annotations/xxx1.xml
./images/xxx2.jpg ./annotations/xxx2.xml
...
# valid.txt is list of validset
>>cat valid.txt
./images/xxx3.jpg ./annotations/xxx3.xml
...
```
##### Convert User Data to COCO Data
`x2coco.py` is provided in `./tools/` to convert a VOC dataset, a labelme-annotated dataset or a cityscapes dataset into COCO data, for example:
(1) Convert labelme data to COCO data:
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
(2) Convert VOC data to COCO data:
```bash
python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \
--voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \
--voc_label_list dataset/voc/label_list.txt \
--voc_out_name voc_train.json
```
After the user dataset is converted to COCO format, the directory structure is as follows (note that path names and file names in the dataset should avoid Chinese characters where possible, to prevent errors caused by encoding issues):
```
dataset/xxx/
├── annotations
│ ├── train.json # Annotation file of coco data
│ ├── valid.json # Annotation file of coco data
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
##### Reader of User Define Data
If a new data source needs to be added to PaddleDetection, you can refer to the [add new data source](../advanced_tutorials/READER.md#2.3_Customizing_Dataset) section of the data processing document to develop the corresponding code for the new data source. You can also read the [data processing document](../advanced_tutorials/READER.md) for a detailed code analysis of data processing.
#### Example of User Data Conversion
Take the [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example to illustrate how to prepare custom data. The Kaggle [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition dataset contains 877 images in four categories: crosswalk, speedlimit, stop, trafficlight. It can be downloaded from Kaggle, and is also available from this [link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
Example diagram of road sign dataset:
![](../images/road554.png)
```
# Downing and unziping data
>>cd $(ppdet_root)/dataset
# Download and unzip the kaggle dataset. The current file organization is as follows
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
```
The data is divided into training, validation and test sets:
```
# Generating label_list.txt
>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt
# Generating train.txt, valid.txt and test.txt
>>ls images/*.png | shuf > all_image_list.txt
>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt
# The proportions of the training, validation and test sets are about 80%, 10% and 10%, respectively.
>>head -n 88 all_list.txt > test.txt
>>head -n 176 all_list.txt | tail -n 88 > valid.txt
>>tail -n 701 all_list.txt > train.txt
# Deleting unused files
>>rm -rf all_image_list.txt all_list.txt
The organization structure of the final dataset file is:
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
├── label_list.txt
├── test.txt
├── train.txt
└── valid.txt
# label_list.txt is the list of class names; the file name must be label_list.txt
>>cat label_list.txt
crosswalk
speedlimit
stop
trafficlight
# train.txt is the list of training dataset files, and each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder.
>>cat train.txt
./images/road839.png ./annotations/road839.xml
./images/road363.png ./annotations/road363.xml
...
# valid.txt is the list of validation dataset files. Each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder.
>>cat valid.txt
./images/road218.png ./annotations/road218.xml
./images/road681.png ./annotations/road681.xml
```
You can also download [the prepared data](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) and unzip it to `dataset/roadsign_voc/`.
After preparing the data, it is useful to get an overview of it, such as the number of images, image sizes, the number of object boxes per category and their sizes, and to clean the data if necessary.
Roadsign dataset statistics:
| data | number of images |
| :---: | :--------------: |
| train | 701 |
| valid | 176 |
**Explanation:**
(1) For user data, it is recommended to check the data carefully before training, to avoid crashes during training caused by incorrect annotation formats or corrupted image files.
(2) If the images are very large and the read size is not limited, they will occupy a lot of memory and may cause memory or GPU memory overflow. Please set `batch_size` reasonably; you can try increasing it from small to large.
# RCNN series model parameter configuration tutorial
Tag: Model parameter configuration
Take `faster_rcnn_r50_fpn_1x_coco.yml` as an example. The model consists of five sub-configuration files:
- Data configuration file `coco_detection.yml`
```yaml
# Data evaluation type
metric: COCO
# The number of categories in the dataset
num_classes: 80
# TrainDataset
TrainDataset:
!COCODataSet
# Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir)
image_dir: train2017
# Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_train2017.json
# data file
dataset_dir: dataset/coco
# data_fields
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
# Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir)
image_dir: val2017
# Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_val2017.json
# data file file os.path.join(dataset_dir, anno_path)
dataset_dir: dataset/coco
TestDataset:
!ImageFolder
# Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_val2017.json
```
- Optimizer configuration file `optimizer_1x.yml`
```yaml
# Total training epoches
epoch: 12
# learning rate setting
LearningRate:
# Default learning rate for 8-GPU training
base_lr: 0.01
# Learning rate adjustment strategy
schedulers:
- !PiecewiseDecay
gamma: 0.1
# Position of change in learning rate (number of epoches)
milestones: [8, 11]
- !LinearWarmup
start_factor: 0.1
steps: 1000
# Optimizer
OptimizerBuilder:
# Optimizer
optimizer:
momentum: 0.9
type: Momentum
# Regularization
regularizer:
factor: 0.0001
type: L2
```
- Data reader configuration file `faster_fpn_reader.yml`
```yaml
# Number of reader worker processes per GPU
worker_num: 2
# training data
TrainReader:
# Training data transforms
sample_transforms:
- Decode: {}
- RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
- RandomFlip: {prob: 0.5}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
# Since the model has FPN structure, the input image needs a multiple of 32 padding
- PadBatch: {pad_to_stride: 32}
# Batch_size during training
batch_size: 1
# Read data is out of order
shuffle: true
# Whether to discard data that does not complete the batch
drop_last: true
# Set to false so that ground truths are returned as a list of tensors (List[Tensor]) rather than collated into a single batch tensor
collate_batch: false
# Evaluate data
EvalReader:
# Evaluate data transforms
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
# Since the model has FPN structure, the input image needs a multiple of 32 padding
- PadBatch: {pad_to_stride: 32}
# batch_size of evaluation
batch_size: 1
# Read data is out of order
shuffle: false
# Whether to discard data that does not complete the batch
drop_last: false
# Whether to discard unlabeled data
drop_empty: false
# test data
TestReader:
# test data transforms
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
# Since the model has FPN structure, the input image needs a multiple of 32 padding
- PadBatch: {pad_to_stride: 32}
# batch_size of test
batch_size: 1
# Read data is out of order
shuffle: false
# Whether to discard data that does not complete the batch
drop_last: false
```
- Model configuration file `faster_rcnn_r50_fpn.yml`
```yaml
# Model structure type
architecture: FasterRCNN
# Pretrain model address
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
# FasterRCNN
FasterRCNN:
# backbone
backbone: ResNet
# neck
neck: FPN
# rpn_head
rpn_head: RPNHead
# bbox_head
bbox_head: BBoxHead
# post process
bbox_post_process: BBoxPostProcess
# backbone
ResNet:
# index 0 stands for res2
depth: 50
# norm_type, Configurable parameter: bn or sync_bn
norm_type: bn
# freeze_at index, 0 represent res2
freeze_at: 0
# return_idx
return_idx: [0,1,2,3]
# num_stages
num_stages: 4
# FPN
FPN:
# channel of FPN
out_channel: 256
# RPNHead
RPNHead:
# anchor generator
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
anchor_sizes: [[32], [64], [128], [256], [512]]
strides: [4, 8, 16, 32, 64]
# rpn_target_assign
rpn_target_assign:
batch_size_per_im: 256
fg_fraction: 0.5
negative_overlap: 0.3
positive_overlap: 0.7
use_random: True
# The parameters of the proposal are generated during training
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 1000
topk_after_collect: True
# The parameters of the proposal are generated during evaluation
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
# BBoxHead
BBoxHead:
# TwoFCHead as BBoxHead
head: TwoFCHead
# roi align
roi_extractor:
resolution: 7
sampling_ratio: 0
aligned: True
# bbox_assigner
bbox_assigner: BBoxAssigner
# BBoxAssigner
BBoxAssigner:
# batch_size_per_im
batch_size_per_im: 512
# Background IoU threshold
bg_thresh: 0.5
# Foreground IoU threshold
fg_thresh: 0.5
# Foreground fraction
fg_fraction: 0.25
# Random sampling
use_random: True
# TwoFCHead
TwoFCHead:
# TwoFCHead feature dimension
out_channel: 1024
# BBoxPostProcess
BBoxPostProcess:
# decode
decode: RCNNBox
# nms
nms:
# use MultiClassNMS
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.05
nms_threshold: 0.5
```
- runtime configuration file `runtime.yml`
```yaml
# Whether to use gpu
use_gpu: true
# Log Printing interval
log_iter: 20
# save_dir
save_dir: output
# Model save interval
snapshot_epoch: 1
```
# YOLO series model parameter configuration tutorial
Tag: Model parameter configuration
Take `ppyolo_r50vd_dcn_1x_coco.yml` as an example. The model consists of five sub-configuration files:
- Data configuration file `coco_detection.yml`
```yaml
# Data evaluation type
metric: COCO
# The number of categories in the dataset
num_classes: 80
# TrainDataset
TrainDataset:
!COCODataSet
# Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir)
image_dir: train2017
# Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_train2017.json
# data file
dataset_dir: dataset/coco
# data_fields
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
# Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir)
image_dir: val2017
# Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_val2017.json
# data file os.path.join(dataset_dir, anno_path)
dataset_dir: dataset/coco
TestDataset:
!ImageFolder
# Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_val2017.json
```
- Optimizer configuration file `optimizer_1x.yml`
```yaml
# Total training epoches
epoch: 405
# learning rate setting
LearningRate:
# Default learning rate for 8-GPU training
base_lr: 0.01
# Learning rate adjustment strategy
schedulers:
- !PiecewiseDecay
gamma: 0.1
# Position of change in learning rate (number of epoches)
milestones:
- 243
- 324
# Warmup
- !LinearWarmup
start_factor: 0.
steps: 4000
# Optimizer
OptimizerBuilder:
# Optimizer
optimizer:
momentum: 0.9
type: Momentum
# Regularization
regularizer:
factor: 0.0005
type: L2
```
- Data reader configuration file `ppyolo_reader.yml`
```yaml
# Number of reader worker processes per GPU
worker_num: 2
# training data
TrainReader:
inputs_def:
num_max_boxes: 50
# Training data transforms
sample_transforms:
- Decode: {}
- Mixup: {alpha: 1.5, beta: 1.5}
- RandomDistort: {}
- RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
- RandomCrop: {}
- RandomFlip: {}
# batch_transforms
batch_transforms:
- BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeBox: {}
- PadBox: {num_max_boxes: 50}
- BboxXYXY2XYWH: {}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
- Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
# Batch size during training
batch_size: 24
# Read data is out of order
shuffle: true
# Whether to discard data that does not complete the batch
drop_last: true
# mixup_epoch, set greater than the maximum epoch, which means mixup augmentation is applied throughout training
mixup_epoch: 25000
# Whether to use the shared memory to accelerate data reading, ensure that the shared memory size (such as /dev/shm) is greater than 1 GB
use_shared_memory: true
# Evaluate data
EvalReader:
# Evaluating data transforms
sample_transforms:
- Decode: {}
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
# Batch_size during evaluation
batch_size: 8
# Whether to discard unlabeled data
drop_empty: false
# test data
TestReader:
inputs_def:
image_shape: [3, 608, 608]
# test data transforms
sample_transforms:
- Decode: {}
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
# batch_size during training
batch_size: 1
```
- Model configuration file `ppyolo_r50vd_dcn.yml`
```yaml
# Model structure type
architecture: YOLOv3
# Pretrain model address
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
# norm_type
norm_type: sync_bn
# Whether to use EMA
use_ema: true
# ema_decay
ema_decay: 0.9998
# YOLOv3
YOLOv3:
# backbone
backbone: ResNet
# neck
neck: PPYOLOFPN
# yolo_head
yolo_head: YOLOv3Head
# post_process
post_process: BBoxPostProcess
# backbone
ResNet:
# depth
depth: 50
# variant
variant: d
# return_idx, 0 represent res2
return_idx: [1, 2, 3]
# dcn_v2_stages
dcn_v2_stages: [3]
# freeze_at
freeze_at: -1
# freeze_norm
freeze_norm: false
# norm_decay
norm_decay: 0.
# PPYOLOFPN
PPYOLOFPN:
# whether coord_conv or not
coord_conv: true
# whether drop_block or not
drop_block: true
# block_size
block_size: 3
# keep_prob
keep_prob: 0.9
# whether spp or not
spp: true
# YOLOv3Head
YOLOv3Head:
# anchors
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
# anchor_masks
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
# loss
loss: YOLOv3Loss
# whether to use iou_aware
iou_aware: true
# iou_aware_factor
iou_aware_factor: 0.4
# YOLOv3Loss
YOLOv3Loss:
# ignore_thresh
ignore_thresh: 0.7
# downsample
downsample: [32, 16, 8]
# whether label_smooth or not
label_smooth: false
# scale_x_y
scale_x_y: 1.05
# iou_loss
iou_loss: IouLoss
# iou_aware_loss
iou_aware_loss: IouAwareLoss
# IouLoss
IouLoss:
loss_weight: 2.5
loss_square: true
# IouAwareLoss
IouAwareLoss:
loss_weight: 1.0
# BBoxPostProcess
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.01
downsample_ratio: 32
clip_bbox: true
scale_x_y: 1.05
# nms setting
nms:
name: MatrixNMS
keep_top_k: 100
score_threshold: 0.01
post_threshold: 0.01
nms_top_k: -1
background_label: -1
```
- Runtime file `runtime.yml`
```yaml
# Whether to use gpu
use_gpu: true
# Log Printing interval
log_iter: 20
# save_dir
save_dir: output
# Model save interval
snapshot_epoch: 1
```
The following fragment of the download script fetches the FDDB images, annotations and evaluation code:
```bash
# Download the data.
echo "Downloading..."
# external link to the Faces in the Wild dataset and annotations file
wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
wget http://vis-www.cs.umass.edu/fddb/evaluation.tgz
```
#### Evaluate on the FDDB
We provide an FDDB dataset evaluation process (currently only Linux systems are supported);
please refer to [FDDB official website](http://vis-www.cs.umass.edu/fddb/) for other specific details.
- 1)Download and install OpenCV:
# CACascade RCNN
## Introduction
CACascade RCNN is one of the best single models used by the Baidu Visual Technology Department to win the Objects365 2019 Challenge Full Track. Objects365 is a new dataset in the field of general object detection that aims to promote detection research on diverse objects in natural scenes. Objects365 annotates 365 object classes on 630,000 images, with more than 10 million bounding boxes in the training set.
![](../../images/obj365_gt.png)
## Method
Considering the characteristics of large-scale object detection, we propose a Class Aware Sampling method based on the number of object categories contained in each image. Training with this method allows the model to converge to a better result in a shorter time.
![](../../images/cas.png)
The best single model published this time is a two-stage detection model based on Cascade RCNN, which replaces the backbone with the more powerful SENet154 model and adds a Deformable Conv module and a more complex two-stage network structure. Since the batch size is relatively small, Group Normalization is added, and multi-scale training is used, which achieves very good results. The pretrained model was trained on ImageNet and then on the COCO dataset, with a Mask branch added during COCO training; the rest of the structure is the same as CACascade RCNN, and it is downloaded automatically when training starts.
## Usage
1. Data preparation
Apply for and download the data from the [Objects365 official website](https://www.objects365.org/download.html), then place it in the dataset directory as follows:
```
${THIS REPO ROOT}
\--dataset
\-- objects365
\-- annotations
|-- train.json
|-- val.json
\-- train
\-- val
```
2. Start training the model
```bash
python tools/train.py -c configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas.yml
```
3. Model prediction results
| Model | Val set mAP | Download link | Configuration File |
| :-----------------: | :--------: | :----------------------------------------------------------: | :--------: |
| CACascadeRCNN SE154 | 31.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas_obj365.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas.yml) |
## Model results
![](../../images/obj365_pred.png)
# CascadeCA RCNN
## Introduction
CascadeCA RCNN is the best single model from the Baidu Visual Technology Department in the Google AI Open Images 2019 Object Detection competition. This single model helped the team win second place among more than 500 participating teams. Open Images Dataset V5 (OIDV5) contains 500 categories, 1.73 million training images and more than 14 million labeled bounding boxes, making it the largest known open object detection dataset at present. The dataset is available at [https://storage.googleapis.com/openimages/web/index.html](https://storage.googleapis.com/openimages/web/index.html), and the team's technical report for the competition is available at [https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf).
![](../../images/oidv5_gt.png)
## Method
This model combines several strong detection methods. Specifically, it uses ResNet200-vd as the backbone of the detection model (the ImageNet classification pretrained model can be downloaded [here](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/README_en.md)), and combines CascadeCA RCNN, Feature Pyramid Networks, Non-local, Deformable-V2 and other methods. Note that the standard Cascade RCNN only predicts two kinds of boxes (foreground and background, using the score information to determine the category of the final foreground), while this model predicts a separate box for each category (Cascade Class Aware). The block diagram of the final model is shown in the figure below.
![](../../images/oidv5_model_framework.png)
Due to the serious category imbalance of OIDV5, a dynamic sampling strategy is adopted to select samples for training, and multi-scale training is used to handle the problem of very large box areas. In addition, the team used Libra Loss instead of Smooth L1 Loss to compute the prediction box loss, and SoftNMS is used for post-processing at prediction time so that more boxes can be recalled.
About 189 categories overlap between the Objects365 dataset and OIDV5, so the two datasets are combined for training to expand the OIDV5 training data. The final model and its performance metrics are shown in the following table. For more details on the model training and ensembling strategies, see the [OIDV5 technical report](https://arxiv.org/pdf/1911.07171.pdf).
The training results of OIDV5 model are as follows.
| Model structure | Public/Private Score | Download link | Configuration File |
| :-----------------: | :--------: | :----------------------------------------------------------: | :--------: |
| CascadeCARCNN-FPN-Dcnv2-Nonlocal ResNet200-vd | 0.62690/0.59459 | [model](https://paddlemodels.bj.bcebos.com/object_detection/oidv5_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml) |
In addition, to verify the performance of the model structure, PaddleDetection also trained models on the COCO2017 and Objects365 datasets based on the same structure. The models and validation set metrics are shown in the following table.
| Model structure | Dataset | val set mAP | Download link | Configuration File |
| :-----------------: | :--------: | :--------: | :----------------------------------------------------------: | :--------: |
| CascadeCARCNN-FPN-Dcnv2-Nonlocal ResNet200-vd | COCO2017 | 51.7% | [Model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/dcn/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml) |
| CascadeCARCNN-FPN-Dcnv2-Nonlocal ResNet200-vd | Objects365 | 34.5% | [Model](https://paddlemodels.bj.bcebos.com/object_detection/obj365_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml) |
COCO and Objects365 Dataset have the same data format. Currently, they only support prediction and evaluation.
## Usage
The OIDV5 dataset format is different from COCO, so currently only single-image prediction is supported. For the OIDV5 model evaluation method, please refer to the [documentation](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/challenge_evaluation.md).
1. Download the model and unzip it.
2. Run the prediction program.
```bash
python -u tools/infer.py -c configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml -o weights=./oidv5_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/ --infer_img=demo/000000570688.jpg
```
The model folder in the command needs to be modified according to where the downloaded model is actually located.
Detection result images can be viewed in the `output` folder.
## Detection results
![](../../images/oidv5_pred.jpg)
English | [简体中文](QUICK_STARTED_cn.md)
# Quick Start
In order to enable users to quickly produce models in a short time and master the use of PaddleDetection, this tutorial uses a pre-trained detection model to finetune on a small dataset. A good model can be produced in a short period of time. For actual business scenarios, it is recommended that users select a suitable model configuration file according to their needs.
- **Set GPU**
```bash
export CUDA_VISIBLE_DEVICES=0
# ...
```