From 16b4888b1af466394d4926f7d8e41c3c8d1a1d9f Mon Sep 17 00:00:00 2001 From: Jewel <2413914266@qq.com> Date: Fri, 29 Oct 2021 13:28:33 +0800 Subject: [PATCH] solve readme link useless (#4276) * slove_readme_link_useless --- README_en.md | 46 +- configs/dota/README_en.md | 186 ++++++++ configs/face_detection/README_en.md | 123 +++++ configs/rcnn_enhance/README_en.md | 12 + configs/slim/README_en.md | 166 +++++++ configs/ttfnet/README_en.md | 69 +++ deploy/BENCHMARK_INFER_en.md | 61 +++ deploy/EXPORT_MODEL_en.md | 53 +++ deploy/EXPORT_ONNX_MODEL_en.md | 51 +++ deploy/README_en.md | 73 +++ docs/CHANGELOG_en.md | 253 +++++++++++ docs/MODEL_ZOO_en.md | 94 ++++ docs/advanced_tutorials/MODEL_TECHNICAL_en.md | 409 +++++++++++++++++ docs/advanced_tutorials/READER_en.md | 328 ++++++++++++++ docs/tutorials/GETTING_STARTED.md | 4 +- docs/tutorials/PrepareDataSet_en.md | 423 ++++++++++++++++++ ...ster_rcnn_r50_fpn_1x_coco_annotation_en.md | 263 +++++++++++ .../ppyolo_r50vd_dcn_1x_coco_annotation_en.md | 266 +++++++++++ static/dataset/fddb/download.sh | 2 +- .../docs/featured_model/FACE_DETECTION_en.md | 2 +- .../champion_model/CACascadeRCNN_en.md | 45 ++ .../champion_model/OIDV5_BASELINE_MODEL_en.md | 52 +++ static/docs/tutorials/QUICK_STARTED.md | 2 +- 23 files changed, 2955 insertions(+), 28 deletions(-) create mode 100644 configs/dota/README_en.md create mode 100644 configs/face_detection/README_en.md create mode 100644 configs/rcnn_enhance/README_en.md create mode 100755 configs/slim/README_en.md create mode 100644 configs/ttfnet/README_en.md create mode 100644 deploy/BENCHMARK_INFER_en.md create mode 100644 deploy/EXPORT_MODEL_en.md create mode 100644 deploy/EXPORT_ONNX_MODEL_en.md create mode 100644 deploy/README_en.md create mode 100644 docs/CHANGELOG_en.md create mode 100644 docs/MODEL_ZOO_en.md create mode 100644 docs/advanced_tutorials/MODEL_TECHNICAL_en.md create mode 100644 docs/advanced_tutorials/READER_en.md create mode 100644 docs/tutorials/PrepareDataSet_en.md create mode 100644 docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md create mode 100644 docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md create mode 100644 static/docs/featured_model/champion_model/CACascadeRCNN_en.md create mode 100644 static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL_en.md diff --git a/README_en.md b/README_en.md index 18c752d52..64795e460 100644 --- a/README_en.md +++ b/README_en.md @@ -198,40 +198,40 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models - `PP-YOLO v2` is optimized version of `PP-YOLO` which has mAP of 49.5% and 68.9FPS on Tesla V100 -- All these models can be get in [Model Zoo](#ModelZoo) +- All these models can be get in [Model Zoo](#Model-Zoo) ## Tutorials ### Get Started -- [Installation guide](docs/tutorials/INSTALL_en.md) -- [Prepare dataset](docs/tutorials/PrepareDataSet.md) -- [Quick start on PaddleDetection](docs/tutorials/GETTING_STARTED_cn.md) +- [Installation guide](docs/tutorials/INSTALL.md) +- [Prepare dataset](docs/tutorials/PrepareDataSet_en.md) +- [Quick start on PaddleDetection](docs/tutorials/GETTING_STARTED.md) ### Advanced Tutorials - Parameter configuration - - [Parameter configuration for RCNN model](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md) - - [Parameter configuration for PP-YOLO model](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md) + - [Parameter configuration for RCNN 
model](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md) + - [Parameter configuration for PP-YOLO model](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md) - Model Compression(Based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)) - [Prune/Quant/Distill](configs/slim) - Inference and deployment - - [Export model for inference](deploy/EXPORT_MODEL.md) - - [Paddle Inference](deploy/README.md) + - [Export model for inference](deploy/EXPORT_MODEL_en.md) + - [Paddle Inference](deploy/README_en.md) - [Python inference](deploy/python) - [C++ inference](deploy/cpp) - [Paddle-Lite](deploy/lite) - [Paddle Serving](deploy/serving) - - [Export ONNX model](deploy/EXPORT_ONNX_MODEL.md) - - [Inference benchmark](deploy/BENCHMARK_INFER.md) + - [Export ONNX model](deploy/EXPORT_ONNX_MODEL_en.md) + - [Inference benchmark](deploy/BENCHMARK_INFER_en.md) - Advanced development - - [New data augmentations](docs/advanced_tutorials/READER.md) - - [New detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL.md) + - [New data augmentations](docs/advanced_tutorials/READER_en.md) + - [New detection algorithms](docs/advanced_tutorials/MODEL_TECHNICAL_en.md) ## Model Zoo @@ -239,15 +239,15 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models - Universal object detection - [Model library and baselines](docs/MODEL_ZOO_cn.md) - [PP-YOLO](configs/ppyolo/README.md) - - [Enhanced Anchor Free model--TTFNet](configs/ttfnet/README.md) - - [Mobile models](static/configs/mobile/README.md) - - [676 classes of object detection](static/docs/featured_model/LARGE_SCALE_DET_MODEL.md) - - [Two-stage practical PSS-Det](configs/rcnn_enhance/README.md) + - [Enhanced Anchor Free model--TTFNet](configs/ttfnet/README_en.md) + - [Mobile models](static/configs/mobile/README_en.md) + - [676 classes of object detection](static/docs/featured_model/LARGE_SCALE_DET_MODEL_en.md) + - [Two-stage practical PSS-Det](configs/rcnn_enhance/README_en.md) - [SSLD pretrained models](docs/feature_models/SSLD_PRETRAINED_MODEL_en.md) - Universal instance segmentation - [SOLOv2](configs/solov2/README.md) - Rotation object detection - - [S2ANet](configs/dota/README.md) + - [S2ANet](configs/dota/README_en.md) - [Keypoint detection](configs/keypoint) - HigherHRNet - HRNet @@ -257,12 +257,12 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models - [JDE](configs/mot/jde/README.md) - [FairMOT](configs/mot/fairmot/README.md) - Vertical field - - [Face detection](configs/face_detection/README.md) + - [Face detection](configs/face_detection/README_en.md) - [Pedestrian detection](configs/pedestrian/README.md) - [Vehicle detection](configs/vehicle/README.md) - Competition Plan - - [Objects365 2019 Challenge champion model](static/docs/featured_model/champion_model/CACascadeRCNN.md) - - [Best single model of Open Images 2019-Object Detection](static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md) + - [Objects365 2019 Challenge champion model](static/docs/featured_model/champion_model/CACascadeRCNN_en.md) + - [Best single model of Open Images 2019-Object Detection](static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL_en.md) ## Applications @@ -270,11 +270,11 @@ The relationship between COCO mAP and FPS on Tesla V100 of representative models ## Updates -v2.2 was released at `08/2021`, release Transformer detection models, release Dark HRNet keypoint detection model, release tracking models of head and vehicle, release 
optimized S2ANet model, inference with batch size > 1 supported for main architectures. Please refer to [change log](docs/CHANGELOG.md) for details. +v2.2 was released at `08/2021`, release Transformer detection models, release Dark HRNet keypoint detection model, release tracking models of head and vehicle, release optimized S2ANet model, inference with batch size > 1 supported for main architectures. Please refer to [change log](docs/CHANGELOG_en.md) for details. -v2.1 was released at `05/2021`, Release Keypoint Detection and Multi-Object Tracking. Release model compression for PPYOLO series. Update documents such as export ONNX model. Please refer to [change log](docs/CHANGELOG.md) for details. +v2.1 was released at `05/2021`, Release Keypoint Detection and Multi-Object Tracking. Release model compression for PPYOLO series. Update documents such as export ONNX model. Please refer to [change log](docs/CHANGELOG_en.md) for details. -v2.0 was released at `04/2021`, fully support dygraph version, which add BlazeFace, PSS-Det and plenty backbones, release `PP-YOLOv2`, `PP-YOLO tiny` and `S2ANet`, support model distillation and VisualDL, add inference benchmark, etc. Please refer to [change log](docs/CHANGELOG.md) for details. +v2.0 was released at `04/2021`, fully support dygraph version, which add BlazeFace, PSS-Det and plenty backbones, release `PP-YOLOv2`, `PP-YOLO tiny` and `S2ANet`, support model distillation and VisualDL, add inference benchmark, etc. Please refer to [change log](docs/CHANGELOG_en.md) for details. ## License diff --git a/configs/dota/README_en.md b/configs/dota/README_en.md new file mode 100644 index 000000000..947efacf8 --- /dev/null +++ b/configs/dota/README_en.md @@ -0,0 +1,186 @@ +# S2ANet Model + +## Content +- [S2ANet Model](#s2anet-model) + - [Content](#content) + - [Introduction](#introduction) + - [Prepare Data](#prepare-data) + - [DOTA data](#dota-data) + - [Customize Data](#customize-data) + - [Start Training](#start-training) + - [1. Install the rotating frame IOU and calculate the OP](#1-install-the-rotating-frame-iou-and-calculate-the-op) + - [2. Train](#2-train) + - [3. Evaluation](#3-evaluation) + - [4. Prediction](#4-prediction) + - [5. DOTA Data evaluation](#5-dota-data-evaluation) + - [Model Library](#model-library) + - [S2ANet Model](#s2anet-model-1) + - [Predict Deployment](#predict-deployment) + - [Citations](#citations) + +## Introduction + +[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is used to detect rotating frame's model, required use of PaddlePaddle 2.1.1(can be installed using PIP) or proper [develop version](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#whl-release). + + +## Prepare Data + +### DOTA data +[DOTA Dataset] is a dataset of object detection in aerial images, which contains 2806 images with a resolution of 4000x4000 per image. + +| Data version | categories | images | size | instances | annotation method | +|:--------:|:-------:|:---------:|:---------:| :---------:| :------------: | +| v1.0 | 15 | 2806 | 800~4000 | 118282 | OBB + HBB | +| v1.5 | 16 | 2806 | 800~4000 | 400000 | OBB + HBB | + +Note: OBB annotation is an arbitrary quadrilateral; The vertices are arranged in clockwise order. The HBB annotation mode is the outer rectangle of the indicator note example. + +There were 2,806 images in the DOTA dataset, including 1,411 images as a training set, 458 images as an evaluation set, and the remaining 937 images as a test set. 
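The note on annotation modes above can be made concrete with a small sketch (illustrative only, not part of DOTA_devkit; the vertex values are invented): given an OBB quadrilateral, its HBB is simply the axis-aligned outer rectangle.

```python
# Illustrative only: derive the HBB (axis-aligned outer rectangle) of one OBB
# annotation given as 8 clockwise vertex coordinates [x1, y1, ..., x4, y4].
import numpy as np

def obb_to_hbb(obb):
    """Return [xmin, ymin, xmax, ymax] enclosing the quadrilateral."""
    pts = np.asarray(obb, dtype=np.float32).reshape(4, 2)
    xmin, ymin = pts.min(axis=0)
    xmax, ymax = pts.max(axis=0)
    return [float(xmin), float(ymin), float(xmax), float(ymax)]

print(obb_to_hbb([100, 60, 220, 80, 210, 160, 90, 140]))  # -> [90.0, 60.0, 220.0, 160.0]
```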
+ +If you need to cut the image data, please refer to the [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit). + +After setting `crop_size=1024, stride=824, gap=200` parameters to cut data, there are 15,749 images in the training set, 5,297 images in the evaluation set, and 10,833 images in the test set. + +### Customize Data + +There are two ways to annotate data: + +- The first is a tagging rotating rectangular, can pass rotating rectangular annotation tool [roLabelImg](https://github.com/cgvict/roLabelImg) to describe rotating rectangular box. + +- The second is to mark the quadrilateral, through the script into an external rotating rectangle, so that the obtained mark may have a certain error with the real object frame. + +Then convert the annotation result into coco annotation format, where each `bbox` is in the format of `[x_center, y_center, width, height, angle]`, where the angle is expressed in radians. + +Reference [spinal disk dataset](https://aistudio.baidu.com/aistudio/datasetdetail/85885), we divide dataset into training set (230), the test set (57), data address is: [spine_coco](https://paddledet.bj.bcebos.com/data/spine_coco.tar). The dataset has a small number of images, which can be used to train the S2ANet model quickly. + + +## Start Training + +### 1. Install the rotating frame IOU and calculate the OP + +Rotate box IoU calculate [ext_op](../../ppdet/ext_op) is a reference PaddlePaddle [custom external operator](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html). + +To use the rotating frame IOU to calculate the OP, the following conditions must be met: +- PaddlePaddle >= 2.1.1 +- GCC == 8.2 + +Docker images are recommended[paddle:2.1.1-gpu-cuda10.1-cudnn7](registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7)。 + +Run the following command to download the image and start the container: +``` +sudo nvidia-docker run -it --name paddle_s2anet -v $PWD:/paddle --network=host registry.baidubce.com/paddlepaddle/paddle:2.1.1-gpu-cuda10.1-cudnn7 /bin/bash +``` + +If the PaddlePaddle are installed in the mirror, go to python3.7 and run the following code to check whether the PaddlePaddle are installed properly: +``` +import paddle +print(paddle.__version__) +paddle.utils.run_check() +``` + +enter `ppdet/ext_op` directory, install: +``` +python3.7 setup.py install +``` + +In Windows, perform the following steps to install it: + +(1)Visual Studio (version required >= Visual Studio 2015 Update3); + +(2)Go to Start --> Visual Studio 2017 --> X64 native Tools command prompt for VS 2017; + +(3)Setting Environment Variables:`set DISTUTILS_USE_SDK=1` + +(4)Enter `PaddleDetection/ppdet/ext_op` directory,use `python3.7 setup.py install` to install。 + +After the installation, test whether the custom OP can compile normally and calculate the results: +``` +cd PaddleDetecetion/ppdet/ext_op +python3.7 test.py +``` + +### 2. Train +**Attention:** +In the configuration file, the learning rate is set based on the eight-card GPU training. If the single-card GPU training is used, set the learning rate to 1/8 of the original value. + +Single GPU Training +```bash +export CUDA_VISIBLE_DEVICES=0 +python3.7 tools/train.py -c configs/dota/s2anet_1x_spine.yml +``` + +Multiple GPUs Training +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python3.7 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/dota/s2anet_1x_spine.yml +``` + +You can use `--eval`to enable train-by-test. + +### 3. 
Evaluation
```bash
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams

# Evaluate with the released weights
python3.7 tools/eval.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```
**Attention:**
(1) The DOTA dataset uses the train and val splits together as the training set, so a customized evaluation dataset configuration is needed when evaluating on DOTA.

(2) The spine dataset is converted from segmentation data. Since the different disc categories differ little for the detection task and the scores produced by the S2ANet algorithm are low, a low mAP at the default evaluation threshold of 0.5 is normal; it is recommended to inspect the detection results visually.

### 4. Prediction
Run the following command to save the visualized prediction results to the `output` folder.
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Or predict with the released weights:
```bash
python3.7 tools/infer.py -c configs/dota/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```

### 5. DOTA Data Evaluation
Run the following command to save the prediction result of each image as a txt file of the same name in the `output` folder.
```
python3.7 tools/infer.py -c configs/dota/s2anet_alignconv_2x_dota.yml -o weights=./weights/s2anet_alignconv_2x_dota.pdparams --infer_dir=dota_test_images --draw_threshold=0.05 --save_txt=True --output_dir=output
```
Please refer to [DOTA_devkit](https://github.com/CAPTAIN-WHU/DOTA_devkit) to generate the evaluation files. For the evaluation file format, please refer to [DOTA Test](http://captain.whu.edu.cn/DOTAweb/tasks.html): generate a zip archive containing one txt file per class, in which every row has the format `image_id score x1 y1 x2 y2 x3 y3 x4 y4`. You can also use the `dataset/dota_coco/dota_generate_test_result.py` script to generate an evaluation file and submit it to the server.

## Model Library

### S2ANet Model

| Model | Conv Type | mAP | Model Download | Configuration File |
|:-----------:|:----------:|:--------:| :----------:| :---------: |
| S2ANet | Conv | 71.42 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_conv_2x_dota.yml) |
| S2ANet | AlignConv | 74.0 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/s2anet_alignconv_2x_dota.yml) |

**Attention:** `multiclass_nms` is used here, which differs slightly from the NMS used by the original authors.


## Predict Deployment

The input of the `multiclass_nms` operator in Paddle supports quadrilateral input, so deployment does not need to rely on the rotated-box IoU operator.
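As a sketch of the typical export step (it reuses the generic `tools/export_model.py` flow from the export tutorial; the config and weights below are the AlignConv entries from the table above, and the output directory is only an example):

```bash
# Illustrative export command; see deploy/EXPORT_MODEL_en.md for all options.
python3.7 tools/export_model.py -c configs/dota/s2anet_alignconv_2x_dota.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams \
    --output_dir=./inference_model
```

Keep the `is_training` remark below in mind when exporting.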
+ +Please refer to the deployment tutorial[Predict deployment](../../deploy/README_en.md) + +**Attention:** The `is_training` parameter was added to the configuration file because the `paddle.Detach` function would cause the size error of the exported model when it went quiet, and the exported model would need to be set to `False` to predict deployment + +## Citations +``` +@article{han2021align, + author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}}, + journal={IEEE Transactions on Geoscience and Remote Sensing}, + title={Align Deep Features for Oriented Object Detection}, + year={2021}, + pages={1-11}, + doi={10.1109/TGRS.2021.3062048}} + +@inproceedings{xia2018dota, + title={DOTA: A large-scale dataset for object detection in aerial images}, + author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei}, + booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, + pages={3974--3983}, + year={2018} +} +``` diff --git a/configs/face_detection/README_en.md b/configs/face_detection/README_en.md new file mode 100644 index 000000000..f3a51bf20 --- /dev/null +++ b/configs/face_detection/README_en.md @@ -0,0 +1,123 @@ +# Face Detection Model + +## Introduction +`face_detection` High efficiency, high speed face detection solutions, including the most advanced models and classic models. + +![](../../docs/images/12_Group_Group_12_Group_Group_12_935.jpg) + +## Model Library + +#### A mAP on the WIDERFACE dataset + +| Network structure | size | images/GPUs | Learning rate strategy | Easy/Medium/Hard Set | Prediction delay(SD855)| Model size(MB) | Download | Configuration File | +|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:| +| BlazeFace | 640 | 8 | 1000e | 0.885 / 0.855 / 0.731 | - | 0.472 |[link](https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/develop/configs/face_detection/blazeface_1000e.yml) | +| BlazeFace-FPN-SSH | 640 | 8 | 1000e | 0.907 / 0.883 / 0.793 | - | 0.479 |[link](https://paddledet.bj.bcebos.com/models/blazeface_fpn_ssh_1000e.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/release/develop/configs/face_detection/blazeface_fpn_ssh_1000e.yml) | + +**Attention:** +- We use a multi-scale evaluation strategy to get the mAP in `Easy/Medium/Hard Set`. Please refer to the [evaluation on the WIDER FACE dataset](#Evaluated-on-the-WIDER-FACE-Dataset) for details. + +## Quick Start + +### Data preparation +We use [WIDER-FACE dataset](http://shuoyang1213.me/WIDERFACE/) for training and model tests, the official web site provides detailed data is introduced. +- WIDER-Face data source: +- Load a dataset of type `wider_face` using the following directory structure: + ``` + dataset/wider_face/ + ├── wider_face_split + │ ├── wider_face_train_bbx_gt.txt + │ ├── wider_face_val_bbx_gt.txt + ├── WIDER_train + │ ├── images + │ │ ├── 0--Parade + │ │ │ ├── 0_Parade_marchingband_1_100.jpg + │ │ │ ├── 0_Parade_marchingband_1_381.jpg + │ │ │ │ ... + │ │ ├── 10--People_Marching + │ │ │ ... + ├── WIDER_val + │ ├── images + │ │ ├── 0--Parade + │ │ │ ├── 0_Parade_marchingband_1_1004.jpg + │ │ │ ├── 0_Parade_marchingband_1_1045.jpg + │ │ │ │ ... + │ │ ├── 10--People_Marching + │ │ │ ... 
+ ``` + +- Manually download the dataset: +To download the WIDER-FACE dataset, run the following command: +``` +cd dataset/wider_face && ./download_wider_face.sh +``` + +### Parameter configuration +The configuration of the base model can be referenced to `configs/face_detection/_base_/blazeface.yml`; +Improved model to add FPN and SSH neck structure, configuration files can be referenced to `configs/face_detection/_base_/blazeface_fpn.yml`, You can configure FPN and SSH as required +```yaml +BlazeNet: + blaze_filters: [[24, 24], [24, 24], [24, 48, 2], [48, 48], [48, 48]] + double_blaze_filters: [[48, 24, 96, 2], [96, 24, 96], [96, 24, 96], + [96, 24, 96, 2], [96, 24, 96], [96, 24, 96]] + act: hard_swish #Configure Blaze Block activation function in Backbone. The basic model is Relu. hard_swish is needed to add FPN and SSH + +BlazeNeck: + neck_type : fpn_ssh #only_fpn, only_ssh and fpn_ssh + in_channel: [96,96] +``` + + + +### Training and Evaluation +The training process and evaluation process methods are consistent with other algorithms, please refer to [GETTING_STARTED_cn.md](../../docs/tutorials/GETTING_STARTED_cn.md)。 +**Attention:** Face detection models currently do not support training and evaluation. + +#### Evaluated on the WIDER-FACE Dataset +- Step 1: Evaluate and generate a result file: +```shell +python -u tools/eval.py -c configs/face_detection/blazeface_1000e.yml \ + -o weights=output/blazeface_1000e/model_final \ + multi_scale=True +``` +Set `multi_scale=True` for multi-scale evaluation. After evaluation, test results in TXT format will be generated in `output/pred`. + +- Step 2: Download the official evaluation script and Ground Truth file: +``` +wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip +unzip eval_tools.zip && rm -f eval_tools.zip +``` + +- Step 3: Start the evaluation + +Method 1: Python evaluation: +``` +git clone https://github.com/wondervictor/WiderFace-Evaluation.git +cd WiderFace-Evaluation +# compile +python3 setup.py build_ext --inplace +# Begin to assess +python3 evaluation.py -p /path/to/PaddleDetection/output/pred -g /path/to/eval_tools/ground_truth +``` + +Method 2: MatLab evaluation: +``` +# Change the name of save result path and draw curve in `eval_tools/wider_eval.m`: +pred_dir = './pred'; +legend_name = 'Paddle-BlazeFace'; + +`wider_eval.m` is the main implementation of the evaluation module. Run the following command: +matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;" +``` + + +## Citations + +``` +@article{bazarevsky2019blazeface, + title={BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs}, + author={Valentin Bazarevsky and Yury Kartynnik and Andrey Vakunov and Karthik Raveendran and Matthias Grundmann}, + year={2019}, + eprint={1907.05047}, + archivePrefix={arXiv}, +``` diff --git a/configs/rcnn_enhance/README_en.md b/configs/rcnn_enhance/README_en.md new file mode 100644 index 000000000..2f0bdc4c4 --- /dev/null +++ b/configs/rcnn_enhance/README_en.md @@ -0,0 +1,12 @@ +## Practical Server Side Detection + +### Introduction + +* In recent years, the object detection task in image has been widely concerned by academia and industry. ResNet50vd pretraining model based on SSLD distillation program training in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) (Top1 on ImageNet1k verification set) Acc is 82.39%), combined with the rich operator of PaddleDetection, PaddlePaddle provides a practical server side detection scheme PSS-DET(Practical Server Side Detection). 
Based on COCO2017 object detection dataset, V100 single gpu prediction speed is 61FPS, COCO mAP can reach 41.2%. + + +### Model library + +| Backbone | Network type | Number of images per GPU | Learning rate strategy | Inferring time(fps) | Box AP | Mask AP | Download | Configuration File | +| :-------------------- | :----------: | :----------------------: | :--------------------: | :-----------------: | :----: | :-----: | :---------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.5 | - | [link](https://paddledet.bj.bcebos.com/models/faster_rcnn_enhance_3x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml) | diff --git a/configs/slim/README_en.md b/configs/slim/README_en.md new file mode 100755 index 000000000..8d2b39c91 --- /dev/null +++ b/configs/slim/README_en.md @@ -0,0 +1,166 @@ +# Model Compression + +In PaddleDetection, a complete tutorial and benchmarks for model compression based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) are provided. Currently supported methods: + +- [prunning](prune) +- [quantitative](quant) +- [distillation](distill) +- [The joint strategy](extensions) + +It is recommended that you use a combination of prunning and distillation training, or use prunning and quantization for test model compression. The following takes YOLOv3 as an example to carry out cutting, distillation and quantization experiments. + +## Experimental Environment + +- Python 3.7+ +- PaddlePaddle >= 2.1.0 +- PaddleSlim >= 2.1.0 +- CUDA 10.1+ +- cuDNN >=7.6.5 + +**Version Dependency between PaddleDetection, Paddle and PaddleSlim Version** +| PaddleDetection Version | PaddlePaddle Version | PaddleSlim Version | Note | +| :---------------------: | :------------------: | :----------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| release/2.1 | >= 2.1.0 | 2.1 | Quantitative model exports rely on the latest Paddle Develop branch, available in[PaddlePaddle Daily version](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev) | +| release/2.0 | >= 2.0.1 | 2.0 | Quantization depends on Paddle 2.1 and PaddleSlim 2.1 | + + +#### Install PaddleSlim +- Method 1: Install it directly: +``` +pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple +``` +- Method 2: Compile and install: +``` +git clone https://github.com/PaddlePaddle/PaddleSlim.git +cd PaddleSlim +python setup.py install +``` + +## Quick Start + +### Train + +```shell +python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} +``` + +- `-c`: Specify the model configuration file. +- `--slim_config`: Specify the compression policy profile. + + +### Evaluation + +```shell +python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final +``` + +- `-c`: Specify the model configuration file. +- `--slim_config`: Specify the compression policy profile. +- `-o weights`: Specifies the path of the model trained by the compression algorithm. 
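For example, the two templates above could be instantiated with the YOLOv3-MobileNetV1 l1_norm pruning pair listed in the benchmark section below (a sketch; substitute your own model and strategy files):

```bash
# Train with the pruning strategy, then evaluate the pruned weights.
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml \
    --slim_config configs/slim/prune/yolov3_prune_l1_norm.yml

python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml \
    --slim_config configs/slim/prune/yolov3_prune_l1_norm.yml \
    -o weights=output/yolov3_prune_l1_norm/model_final
```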
+ +### Test + +```shell +python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \ + -o weights=output/{SLIM_CONFIG}/model_final + --infer_img={IMAGE_PATH} +``` + +- `-c`: Specify the model configuration file. +- `--slim_config`: Specify the compression policy profile. +- `-o weights`: Specifies the path of the model trained by the compression algorithm. +- `--infer_img`: Specifies the test image path. + + +## Full Chain Deployment + +### the model is derived from moving to static + +```shell +python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final +``` + +- `-c`: Specify the model configuration file. +- `--slim_config`: Specify the compression policy profile. +- `-o weights`: Specifies the path of the model trained by the compression algorithm. + +### prediction and deployment + +- Paddle-Inference Prediction: + - [Python Deployment](../../deploy/python/README.md) + - [C++ Deployment](../../deploy/cpp/README.md) + - [TensorRT Predictive Deployment Tutorial](../../deploy/TENSOR_RT.md) +- Server deployment: Used[PaddleServing](../../deploy/serving/README.md) +- Mobile deployment: Use[Paddle-Lite](../../deploy/lite/README.md) Deploy it on the mobile terminal. + +## Benchmark + +### Prunning + +#### Pascal VOC Benchmark + +| Model | Compression Strategy | GFLOPs | Model Volume(MB) | Input Size | Predict Delay(SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File | +| :----------------: | :-------------------: | :------------: | :--------------: | :--------: | :------------------: | :--------: | :------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------: | +| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 332.0ms | 75.1 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | - | +| YOLOv3-MobileNetV1 | 剪裁-l1_norm(sensity) | 15.78(-34.49%) | 66(-29%) | 608 | - | 78.4(+3.3) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_voc_prune_l1_norm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/prune/yolov3_prune_l1_norm.yml) | + +#### COCO Benchmark +| Mode | Compression Strategy | GFLOPs | Model Volume(MB) | Input Size | Predict Delay(SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File | +| :-----------------------: | :------------------: | :----: | :--------------: | :--------: | :------------------: | :----: | :---------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------: | +| PP-YOLO-MobileNetV3_large | baseline | -- 
| 18.5 | 608 | 25.1ms | 23.2 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | - | +| PP-YOLO-MobileNetV3_large | 剪裁-FPGM | -37% | 12.6 | 608 | - | 22.3 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_prune_fpgm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml) | +| YOLOv3-DarkNet53 | baseline | -- | 238.2 | 608 | - | 39.0 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | - | +| YOLOv3-DarkNet53 | 剪裁-FPGM | -24% | - | 608 | - | 37.6 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_prune_fpgm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/prune/yolov3_darknet_prune_fpgm.yml) | +| PP-YOLO_R50vd | baseline | -- | 183.3 | 608 | - | 44.8 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLO_R50vd | 剪裁-FPGM | -35% | - | 608 | - | 42.1 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_prune_fpgm.pdparams) | [configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [slim configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/prune/ppyolo_r50vd_prune_fpgm.yml) | + +Description: +- Currently, all models except RCNN series models are supported. +- The SD855 predicts the delay for deployment using Paddle Lite, using the ARM8 architecture and using 4 Threads (4 Threads) to reason the delay. 
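To take one row above through the full chain, a possible next step is to export the FPGM-pruned PP-YOLO-MobileNetV3 model with the export command introduced earlier and then convert it with Paddle Lite (a sketch; the weights URL is the released pruned model from the table, and a locally trained `output/ppyolo_mbv3_large_prune_fpgm/model_final` works as well):

```bash
# Export the pruned model to a static inference model before Paddle Lite conversion.
python tools/export_model.py -c configs/ppyolo/ppyolo_mbv3_large_coco.yml \
    --slim_config configs/slim/prune/ppyolo_mbv3_large_prune_fpgm.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_prune_fpgm.pdparams
```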
+ +### Quantitative + +#### COCO Benchmark + +| Model | Compression Strategy | Input Size | Model Volume(MB) | Prediction Delay(V100) | Prediction Delay(SD855) | Box AP | Download | Download of Inference Model | Model Configuration File | Compression Algorithm Configuration File | +| ------------------------- | -------------------------- | ----------- | :--------------: | :--------------------: | :---------------------: | :-------------------: | :-----------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | +| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | [link](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_365e_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLOv2_R50vd | PACT Online quantitative | 640 | -- | 17.3ms | -- | 48.1 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolov2_r50vd_dcn_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyolov2_r50vd_dcn_qat.yml) | +| PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_dcn_1x_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | - | +| PP-YOLO_R50vd | PACT Online quantitative | 608 | 67.3 | 13.8ms | -- | 44.3 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_r50vd_qat_pact.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyolo_r50vd_qat_pact.yml) | +| PP-YOLO-MobileNetV3_large | baseline | 320 | 18.5 | 2.7ms | 27.9ms | 23.2 | [link](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | - | +| PP-YOLO-MobileNetV3_large | Common Online quantitative | 320 | 5.6 | -- | 25.1ms | 24.3 | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ppyolo_mbv3_large_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/ppyolo_mbv3_large_coco.yml) | [Configuration File 
](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ppyolo_mbv3_large_qat.yml) | +| YOLOv3-MobileNetV1 | baseline | 608 | 94.2 | 8.9ms | 332ms | 29.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_270e_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | +| YOLOv3-MobileNetV1 | Common Online quantitative | 608 | 25.4 | 6.6ms | 248ms | 30.5 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/yolov3_mobilenet_v1_qat.yml) | +| YOLOv3-MobileNetV3 | baseline | 608 | 90.3 | 9.4ms | 367.2ms | 31.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_large_270e_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | - | +| YOLOv3-MobileNetV3 | PACT Online quantitative | 608 | 24.4 | 8.0ms | 280.0ms | 31.1 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/yolov3_mobilenet_v3_qat.yml) | +| YOLOv3-DarkNet53 | baseline | 608 | 238.2 | 16.0ms | -- | 39.0 | [link](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet53_270e_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | - | +| YOLOv3-DarkNet53 | Common Online quantitative | 608 | 78.8 | 12.4ms | -- | 38.8 | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/yolov3_darknet_qat.yml) | +| SSD-MobileNet_v1 | baseline | 300 | 22.5 | 4.4ms | 26.6ms | 73.8 | [link](https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_120e_voc.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | - | +| SSD-MobileNet_v1 | Common Online quantitative | 300 | 7.1 | -- | 21.5ms | 72.9 | [link](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_voc_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_voc_qat.tar) | [Configuration File 
](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/ssd_mobilenet_v1_qat.yml) | +| Mask-ResNet50-FPN | baseline | (800, 1333) | 174.1 | 359.5ms | -- | 39.2/35.6 | [link](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_coco.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | - | +| Mask-ResNet50-FPN | Common Online quantitative | (800, 1333) | -- | -- | -- | 39.7(+0.5)/35.9(+0.3) | [link](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.pdparams) | [link](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.tar) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | [slim Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml) | + +Description: +- The above V100 prediction delay non-quantified model is tested by TensorRT FP32, and the quantified model is tested by TensorRT INT8, and both of them include NMS time. +- The SD855 predicts the delay for deployment using PaddleLite, using the ARM8 architecture and using 4 Threads (4 Threads) to reason the delay. + +### Distillation + +#### COCO Benchmark + +| Model | Compression Strategy | Input Size | Box AP | Download | Model Configuration File | Compression Strategy Configuration File | +| ------------------ | -------------------- | ---------- | :--------: | :-------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | +| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | +| YOLOv3-MobileNetV1 | Distillation | 608 | 31.0(+1.6) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill.pdparams) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slimConfiguration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml) | + +- Please refer to the specific distillation method[Distillation Policy Document](distill/README.md) + +### Distillation Prunning Combined Strategy + +#### COCO Benchmark + +| Model | Compression Strategy | Input Size | GFLOPs | Model Volume(MB) | Prediction Delay(SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File | +| ------------------ | ------------------------ | ---------- | :----------: | :--------------: | :---------------------: | :--------: | :-------------------------------------------------------------------------------------------------: | 
:----------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------: | +| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | [link](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - | +| YOLOv3-MobileNetV1 | Distillation + Tailoring | 608 | 7.54(-69.4%) | 30.9(-67.2%) | 166.1ms | 28.4(-1.0) | [link](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill_prune.pdparams) | [Configuration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slimConfiguration File ](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml) | diff --git a/configs/ttfnet/README_en.md b/configs/ttfnet/README_en.md new file mode 100644 index 000000000..77af509a8 --- /dev/null +++ b/configs/ttfnet/README_en.md @@ -0,0 +1,69 @@ +# 1. TTFNet + +## Introduction + +TTFNet is a network used for real-time object detection and friendly to training time. It improves the slow convergence speed of CenterNet and proposes a new method to generate training samples using Gaussian kernel, which effectively eliminates the fuzziness existing in Anchor Free head. At the same time, the simple and lightweight network structure is also easy to expand the task. + + +**Characteristics:** + +The structure is simple, requiring only two heads to detect target position and size, and eliminating time-consuming post-processing operations +The training time is short. Based on DarkNet53 backbone network, V100 8 cards only need 2 hours of training to achieve better model effect + +## Model Zoo + +| Backbone | Network type | Number of images per GPU | Learning rate strategy | Inferring time(fps) | Box AP | Download | Configuration File | +| :-------- | :----------- | :----------------------: | :--------------------: | :-----------------: | :----: | :------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | +| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [link](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) | + + + + + +# 2. 
PAFNet + +## Introduction + +PAFNet (Paddle Anchor Free) is an optimized model of PaddleDetection based on TTF Net, whose accuracy reaches the SOTA level in the Anchor Free field, and meanwhile produces mobile lightweight model PAFNet-Lite + +PAFNet series models optimize TTFNet model from the following aspects: + +- [CutMix](https://arxiv.org/abs/1905.04899) +- Better backbone network: ResNet50vd-DCN +- Larger training batch size: 8 GPUs, each GPU batch size=18 +- Synchronized Batch Normalization +- [Deformable Convolution](https://arxiv.org/abs/1703.06211) +- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp) +- Better pretraining model + + +## Model library + +| Backbone | Net type | Number of images per GPU | Learning rate strategy | Inferring time(fps) | Box AP | Download | Configuration File | +| :--------- | :------- | :----------------------: | :--------------------: | :-----------------: | :----: | :---------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | +| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [link](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_10x_coco.yml) | + + + +### PAFNet-Lite + +| Backbone | Net type | Number of images per GPU | Learning rate strategy | Box AP | kirin 990 delay(ms) | volume(M) | Download | Configuration File | +| :---------- | :---------- | :----------------------: | :--------------------: | :----: | :-------------------: | :---------: | :---------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: | +| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [link](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) | + +**Attention:** Due to the overall upgrade of the dynamic graph framework, the weighting model published by PaddleDetection of PAF Net needs to be evaluated with a --bias field, for example + +```bash +# Published weights using Paddle Detection +CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias +``` + +## Citations +``` +@article{liu2019training, + title = {Training-Time-Friendly Network for Real-Time Object Detection}, + author = {Zili Liu, Tu Zheng, Guodong Xu, Zheng Yang, Haifeng Liu, Deng Cai}, + journal = {arXiv preprint arXiv:1909.00700}, + year = {2019} +} +``` diff --git a/deploy/BENCHMARK_INFER_en.md b/deploy/BENCHMARK_INFER_en.md new file mode 100644 index 000000000..b0b92b6cc --- /dev/null +++ b/deploy/BENCHMARK_INFER_en.md @@ -0,0 +1,61 @@ +# Inference Benchmark + +## 一、Prepare the Environment +- 1、Test Environment: + - CUDA 10.1 + - CUDNN 7.6 + - TensorRT-6.0.1 + - PaddlePaddle v2.0.1 + - The GPUS are Tesla V100 and GTX 1080 Ti and Jetson AGX Xavier +- 2、Test Method: + - In order to compare the inference speed of different models, the input shape is 3x640x640, use `demo/000000014439_640x640.jpg`. 
+ - Batch_size=1 + - Delete the warmup time of the first 100 rounds and test the average time of 100 rounds in ms/image, including network calculation time and data copy time to CPU. + - Using Fluid C++ prediction engine: including Fluid C++ prediction, Fluid TensorRT prediction, the following test Float32 (FP32) and Float16 (FP16) inference speed. + +**Attention:** For TensorRT, please refer to the [TENSOR tutorial](TENSOR_RT.md) for the difference between fixed and dynamic dimensions. Due to the imperfect support for the two-stage model under fixed size, dynamic size test was adopted for the Faster RCNN model. Fixed size and dynamic size do not support exactly the same OP for fusion, so the performance of the same model tested at fixed size and dynamic size may differ slightly. + + +## 二、Inferring Speed + +### 1、Linux System +#### (1)Tesla V100 + +| Model | backbone | Fixed size or not | The net size | paddle_inference | trt_fp32 | trt_fp16 | +| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- | +| Faster RCNN FPN | ResNet50 | no | 640x640 | 27.99 | 26.15 | 21.92 | +| Faster RCNN FPN | ResNet50 | no | 800x1312 | 32.49 | 25.54 | 21.70 | +| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 9.74 | 8.61 | 6.28 | +| YOLOv3 | Darknet53 | yes | 608x608 | 17.84 | 15.43 | 9.86 | +| PPYOLO | ResNet50 | yes | 608x608 | 20.77 | 18.40 | 13.53 | +| SSD | Mobilenet\_v1 | yes | 300x300 | 5.17 | 4.43 | 4.29 | +| TTFNet | Darknet53 | yes | 512x512 | 10.14 | 8.71 | 5.55 | +| FCOS | ResNet50 | yes | 640x640 | 35.47 | 35.02 | 34.24 | + + +#### (2)Jetson AGX Xavier + +| Model | backbone | Fixed size or not | The net size | paddle_inference | trt_fp32 | trt_fp16 | +| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- | +| Faster RCNN FPN | ResNet50 | no | 640x640 | 169.45 | 158.92 | 119.25 | +| Faster RCNN FPN | ResNet50 | no | 800x1312 | 228.07 | 156.39 | 117.03 | +| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 48.76 | 43.83 | 18.41 | +| YOLOv3 | Darknet53 | yes | 608x608 | 121.61 | 110.30 | 42.38 | +| PPYOLO | ResNet50 | yes | 608x608 | 111.80 | 99.40 | 48.05 | +| SSD | Mobilenet\_v1 | yes | 300x300 | 10.52 | 8.84 | 8.77 | +| TTFNet | Darknet53 | yes | 512x512 | 73.77 | 64.03 | 31.46 | +| FCOS | ResNet50 | yes | 640x640 | 217.11 | 214.38 | 205.78 | + +### 2、Windows System +#### (1)GTX 1080Ti + +| Model | backbone | Fixed size or not | The net size | paddle_inference | trt_fp32 | trt_fp16 | +| --------------- | ------------- | ----------------- | ------------ | ---------------- | -------- | -------- | +| Faster RCNN FPN | ResNet50 | no | 640x640 | 50.74 | 57.17 | 62.08 | +| Faster RCNN FPN | ResNet50 | no | 800x1312 | 50.31 | 57.61 | 62.05 | +| YOLOv3 | Mobilenet\_v1 | yes | 608x608 | 14.51 | 11.23 | 11.13 | +| YOLOv3 | Darknet53 | yes | 608x608 | 30.26 | 23.92 | 24.02 | +| PPYOLO | ResNet50 | yes | 608x608 | 38.06 | 31.40 | 31.94 | +| SSD | Mobilenet\_v1 | yes | 300x300 | 16.47 | 13.87 | 13.76 | +| TTFNet | Darknet53 | yes | 512x512 | 21.83 | 17.14 | 17.09 | +| FCOS | ResNet50 | yes | 640x640 | 71.88 | 69.93 | 69.52 | diff --git a/deploy/EXPORT_MODEL_en.md b/deploy/EXPORT_MODEL_en.md new file mode 100644 index 000000000..d2828edeb --- /dev/null +++ b/deploy/EXPORT_MODEL_en.md @@ -0,0 +1,53 @@ +# PaddleDetection Model Export Tutorial + +## 一、Model Export +This section describes how to use the `tools/export_model.py` script to export models. 
+### Export model input and output description +- Input variables and input shapes are as follows: + + | Input Name | Input Shape | Meaning | + | :----------: | --------------- | ------------------------------------------------------------------------------------------------------------------------- | + | image | [None, 3, H, W] | Enter the network image. None indicates the Batch dimension. If the input image size is variable length, H and W are None | + | im_shape | [None, 2] | The size of the image after resize is expressed as H,W, and None represents the Batch dimension | + | scale_factor | [None, 2] | The input image size is larger than the real image size, denoted byscale_y, scale_x | + +**Attention**For details about the preprocessing method, see the Test Reader section in the configuration file. + + +-The output of the dynamic and static derived model in Paddle Detection is unified as follows: + + - bbox, the output of NMS, in the shape of [N, 6], where N is the number of prediction boxes, and 6 is [class_id, score, x1, y1, x2, y2]. + - bbox\_num, Each picture corresponds to the number of prediction boxes. For example, batch size is 2 and the output is [N1, N2], indicating that the first picture contains N1 prediction boxes and the second picture contains N2 prediction boxes, and the total number of prediction boxes is the same as the first dimension N output by NMS + - mask, If the network contains a mask, the mask branch is printed + +**Attention**The model-to-static export does not support cases where numpy operations are included in the model structure. + + +### 2、Start Parameters + +| FLAG | USE | DEFAULT | NOTE | +| :----------: | :-----------------------------: | :------------------: | :-------------------------------------------------------------------: | +| -c | Specifying a configuration file | None | | +| --output_dir | Model save path | `./output_inference` | The model is saved in the `output/default_file_name/` path by default | + +### 3、Example + +Using the trained model for trial use, the script is as follows: + +```bash +# The YOLOv3 model is exported +python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \ + -o weights=weights/yolov3_darknet53_270e_coco.pdparams +``` +The prediction model will be exported to the `inference_model/yolov3_darknet53_270e_coco` directory. `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel` respectively. + + +### 4、Sets the input size of the export model +When using Fluid TensorRT for prediction, since <= TensorRT 5.1 only supports fixed-length input, the image size of the `data` layer of the saved model needs to be the same as the actual input image size. Fluid C++ prediction engine does not have this limitation. Setting `image_shape` in Test Reader changes the size of the input image in the saved model. 
The following is an example: + + +```bash +#Export the YOLOv3 model with the input 3x640x640 +python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \ + -o weights=weights/yolov3_darknet53_270e_coco.pdparams TestReader.inputs_def.image_shape=[3,640,640] +``` diff --git a/deploy/EXPORT_ONNX_MODEL_en.md b/deploy/EXPORT_ONNX_MODEL_en.md new file mode 100644 index 000000000..6f0a16664 --- /dev/null +++ b/deploy/EXPORT_ONNX_MODEL_en.md @@ -0,0 +1,51 @@ +# PaddleDetection Model Export as ONNX Format Tutorial + +PaddleDetection Model support is saved in ONNX format and the list of current test support is as follows +| Model | OP Version | NOTE | +| :---- | :----- | :--- | +| YOLOv3 | 11 | Only batch=1 inferring is supported. Model export needs fixed shape | +| PPYOLO | 11 | Only batch=1 inferring is supported. A MatrixNMS will be converted to an NMS with slightly different precision; Model export needs fixed shape | +| PPYOLOv2 | 11 | Only batch=1 inferring is supported. MatrixNMS will be converted to NMS with slightly different precision; Model export needs fixed shape | +| PPYOLO-Tiny | 11 | Only batch=1 inferring is supported. Model export needs fixed shape | +| FCOS | 11 |Only batch=1 inferring is supported | +| PAFNet | 11 |- | +| TTFNet | 11 |-| +| SSD | 11 |Only batch=1 inferring is supported | + +The function of saving ONNX is provided by [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX). If there is feedback on related problems during conversion, Communicate with engineers in Paddle2ONNX's Github project via [ISSUE](https://github.com/PaddlePaddle/Paddle2ONNX/issues). + +## Export Tutorial + +### Step 1. Export the Paddle deployment model +Export procedure reference document[Tutorial on PaddleDetection deployment model export](./EXPORT_MODEL_en.md), take YOLOv3 of COCO dataset training as an example +``` +cd PaddleDetection +python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml \ + -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams \ + TestReader.inputs_def.image_shape=[3,608,608] \ + --output_dir inference_model +``` +The derived models were saved in `inference_model/yolov3_darknet53_270e_coco/`, with the structure as follows +``` +yolov3_darknet + ├── infer_cfg.yml # Model configuration file information + ├── model.pdiparams # Static diagram model parameters + ├── model.pdiparams.info # Parameter Information is not required + └── model.pdmodel # Static diagram model file +``` +> check`TestReader.inputs_def.image_shape`, For YOLO series models, specify this parameter when exporting; otherwise, the conversion fails + +### Step 2. 
Convert the deployment model to ONNX format
+Install Paddle2ONNX (version 0.6 or higher)
+```
+pip install paddle2onnx
+```
+Use the following command to convert the model
+```
+paddle2onnx --model_dir inference_model/yolov3_darknet53_270e_coco \
+            --model_filename model.pdmodel \
+            --params_filename model.pdiparams \
+            --opset_version 11 \
+            --save_file yolov3.onnx
+```
+The converted model `yolov3.onnx` is saved under the current path
diff --git a/deploy/README_en.md b/deploy/README_en.md
new file mode 100644
index 000000000..ef7148f0b
--- /dev/null
+++ b/deploy/README_en.md
@@ -0,0 +1,73 @@
+# PaddleDetection Prediction Deployment
+
+PaddleDetection provides multiple deployment forms based on Paddle Inference, Paddle Serving and Paddle-Lite, supports multiple platforms such as server, mobile and embedded, and provides complete Python and C++ deployment solutions
+
+## Deployment modes supported by PaddleDetection
+| Form             | Language | Tutorial | Device/Platform           |
+| ---------------- | -------- | -------- | ------------------------- |
+| Paddle Inference | Python   | Complete | Linux(ARM/X86), Windows   |
+| Paddle Inference | C++      | Complete | Linux(ARM/X86), Windows   |
+| Paddle Serving   | Python   | Complete | Linux(ARM/X86), Windows   |
+| Paddle-Lite      | C++      | Complete | Android, iOS, FPGA, RK... |
+
+
+## 1. Paddle Inference Deployment
+
+### 1.1 Export the model
+
+Use the `tools/export_model.py` script to export the model and the configuration file used during deployment. The configuration file name is `infer_cfg.yml`. The model export script is as follows
+
+```bash
+# Export the YOLOv3 model
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams
+```
+The prediction model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info` and `model.pdmodel`. For details on model export, please refer to the documentation [Tutorial on PaddleDetection MODEL EXPORT](EXPORT_MODEL_en.md).
+
+### 1.2 Use Paddle Inference to Make Predictions
+* Python deployment supports `CPU`, `GPU` and `XPU` environments, Windows and Linux systems, and NV Jetson embedded devices. Reference documentation [Python Deployment](python/README.md)
+* C++ deployment supports `CPU`, `GPU` and `XPU` environments, Windows and Linux systems, and NV Jetson embedded devices. Reference documentation [C++ deployment](cpp/README.md)
+* PaddleDetection supports TensorRT acceleration. Please refer to the documentation [TensorRT Predictive Deployment Tutorial](TENSOR_RT.md)
+
+**Attention:** The Paddle prediction library version must be >= 2.1, and batch_size > 1 is only supported by YOLOv3 and PP-YOLO.
+
+## 2. PaddleServing Deployment
+### 2.1 Export model
+
+If you want to export the model in `PaddleServing` format, set `export_serving_model=True`:
+```bash
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=output/yolov3_mobilenet_v1_roadsign/best_model.pdparams --export_serving_model=True
+```
+The prediction model will be exported to the `output_inference/yolov3_mobilenet_v1_roadsign` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`, plus the `serving_client/` and `serving_server/` folders.
+
+For details on model export, please refer to the documentation [Tutorial on PaddleDetection MODEL EXPORT](EXPORT_MODEL_en.md).
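+For orientation, the exported directory typically looks like the sketch below (the file names are the ones listed above; the exact contents of the two serving folders depend on the PaddleServing version in use):
+```
+output_inference/yolov3_mobilenet_v1_roadsign/
+├── infer_cfg.yml          # preprocessing and label configuration used at inference time
+├── model.pdmodel          # static graph model structure
+├── model.pdiparams        # model parameters
+├── model.pdiparams.info   # parameter information, not required for deployment
+├── serving_client/        # client-side configuration for Paddle Serving
+└── serving_server/        # server-side model and configuration for Paddle Serving
+```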
+ +### 2.2 Predictions are made using Paddle Serving +* [Install PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation) +* [Use PaddleServing](./serving/README.md) + + +## 3. PaddleLite Deployment +- [Deploy the PaddleDetection model using PaddleLite](./lite/README.md) +- For details, please refer to [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo) deployment. For more information, please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) + + +## 4. Benchmark Test +- Using the exported model, run the Benchmark batch test script: +```shell +sh deploy/benchmark/benchmark.sh {model_dir} {model_name} +``` +**Attention** If it is a quantitative model, please use the `deploy/benchmark/benchmark_quant.sh` script. +- Export the test result log to Excel: +``` +python deploy/benchmark/log_parser_excel.py --log_path=./output_pipeline --output_name=benchmark_excel.xlsx +``` + +## 5. FAQ +- 1、Can `Paddle 1.8.4` trained models be deployed with `Paddle2.0`? + Paddle 2.0 is compatible with Paddle 1.8.4, so it is ok. However, some models (such as SOLOv2) use the new OP in Paddle 2.0, which is not allowed. + +- 2、When compiling for Windows, the prediction library is compiled with VS2015, will it be a problem to choose VS2017 or VS2019? + For compatibility issues with VS, please refer to: [C++ Visual Studio 2015, 2017 and 2019 binary compatibility](https://docs.microsoft.com/zh-cn/cpp/porting/binary-compat-2015-2017?view=msvc-160) + +- 3、Does cuDNN 8.0.4 continuously predict memory leaks? + QA tests show that cuDNN 8 series have memory leakage problems in continuous prediction, and cuDNN 8 performance is worse than cuDNN7. CUDA + cuDNN7.6.4 is recommended for deployment. diff --git a/docs/CHANGELOG_en.md b/docs/CHANGELOG_en.md new file mode 100644 index 000000000..dd349d08e --- /dev/null +++ b/docs/CHANGELOG_en.md @@ -0,0 +1,253 @@ +# Version Update Information + +## Last Version Information + +### 2.2(08.10/2021) + +- Model richness: + - Publish the Transformer test model: DETR, Deformable DETR, Sparse RCNN + - Key point test new Dark model, release Dark HRNet model + - Publish the MPII dataset HRNet keypoint detection model + - Release head and vehicle tracking vertical model + +- Model optimization: + - AlignConv optimization model was released by S2ANet, and DOTA dataset mAP was optimized to 74.0 + +- Predict deployment + - Mainstream models support batch size>1 predictive deployment, including YOLOv3, PP-YOLO, Faster RCNN, SSD, TTFNet, FCOS + - New addition of target tracking models (JDE, Fair Mot, Deep Sort) Python side prediction deployment support, and support for TensorRT prediction + - FairMot joint key point detection model deployment Python side predictive deployment support + - Added support for key point detection model combined with PP-YOLO prediction deployment + +- Documents: + - New TensorRT version notes to Windows Predictive Deployment documentation + - FAQ documents are updated + +- Problem fixes: + - Fixed PP-YOLO series model training convergence problem + - Fixed the problem of no label data training when batch_size > 1 + + +### 2.1(05.20/2021) +- Model richness enhancement: + - Key point model: HRNet, HigherHRNet + - Publish the multi-target tracking model: DeepSort, FairMot, JDE + +- Basic framework Capabilities: + - Supports training without labels + +- Forecast deployment: + - Paddle Inference YOLOv3 series model support batch_size>1 prediction + - Rotating frame detection S2ANet model prediction deployment 
is open + - Incremental quantization model benchmark + - Add dynamic graph model and static graph model: Paddle-Lite demo + +- Detection model compression: + - Release PP-YOLO series model compression model + +- Documents: + - Update quick start, forecast deployment and other tutorial documentation + - Added ONNX model export tutorial + - Added the mobile deployment document + + +### 2.0(04.15/2021) + + **Description:** Since version 2.0, dynamic graphs are used as the default version of Paddle Detection, the original `dygraph` directory is switched to the root directory, and the original static graph implementation is moved to the `static` directory. + + - Enhancement of dynamic graph model richness: + - PP-YOLOv2 and PP-YOLO tiny models were published. The accuracy of PP-YOLOv2 COCO Test dataset reached 49.5%, and the prediction speed of V100 reached 68.9 FPS + - Release the rotary frame detection model S2ANet + - Release the two-phase utility model PSS-Det + - Publish the face detection model Blazeface + + - New basic module: + - Added SENet, GhostNet, and Res2Net backbone networks + - Added VisualDL training visualization support + - Added single precision calculation and PR curve drawing function + - The YOLO models support THE NHWC data format + + - Forecast deployment: + - Publish forecast benchmark data for major models + - Adaptive to TensorRT6, support TensorRT dynamic size input, support TensorRT int8 quantitative prediction + - 7 types of models including PP-YOLO, YOLOv3, SSD, TTFNet, FCOS, Faster RCNN are deployed in Python/CPP/TRT prediction on Linux, Windows and NV Jetson platforms + + - Detection model compression: + - Distillation: Added dynamic map distillation support and released YOLOv3-MobileNetV1 distillation model + - Joint strategy: new dynamic graph prunning + distillation joint strategy compression scheme, and release YOLOv3-MobileNetV1 prunning + distillation compression model + - Problem fix: Fixed dynamic graph quantization model export problem + + - Documents: + - New English document of dynamic graph: including homepage document, getting started, quick start, model algorithm, new dataset, etc + - Added both English and Chinese installation documents of dynamic diagrams + - Added configuration file templates and description documents of dynamic graph RCNN series and YOLO series + + +## Historical Version Information + +### 2.0-rc(02.23/2021) + - Enhancement of dynamic graph model richness: + - Optimize networking and training mode of RCNN models, and improve accuracy of RCNN series models (depending on Paddle Develop or version 2.0.1) + - Added support for SSDLite, FCOS, TTFNet, SOLOv2 series models + - Added pedestrian and vehicle vertical object detection models + + - New dynamic graph basic module: + - Added MobileNetV3 and HRNet backbone networks + - Improved roi-align calculation logic for RCNN series models (depending on Paddle Develop or version 2.0.1) + - Added support for Synchronized Batch Norm + - Added support for Modulated Deformable Convolution + + - Forecast deployment: + - Publish dynamic diagrams in python, C++, and Serving deployment solution and documentation. 
Support Faster RCNN, Mask RCNN, YOLOv3, PPYOLO, SSD, TTFNet, FCOS, SOLOv2 and other models to predict deployment + - Dynamic graph prediction deployment supports TensorRT mode FP32, FP16 inference acceleration + + - Detection model compression: + - Prunning: Added dynamic graph prunning support, and released YOLOv3-MobileNetV1 prunning model + - Quantization: Added quantization support of dynamic graph, and released quantization models of YOLOv3-MobileNetV1 and YOLOv3-MobileNetV3 + + - Documents: + - New Dynamic Diagram tutorial documentation: includes installation instructions, quick start, data preparation, and training/evaluation/prediction process documentation + - New advanced tutorial documentation for dynamic diagrams: includes documentation for model compression and inference deployment + - Added dynamic graph model library documentation + +### v2.0-beta(12.20/2020) + - Dynamic graph support: + - Support for Faster-RCNN, Mask-RCNN, FPN, Cascade Faster/Mask RCNN, YOLOv3 and SSD models, trial version. + - Model upgrade: + - Updated PP-YOLO Mobile-Netv3 large and small models with improved accuracy, and added prunning and distillation models. + - New features: + - Support VisualDL visual data preprocessing pictures. + + - Bug fix: + - Fix Blaze Face keypoint prediction bug. + + +### v0.5.0(11/2020) + - Model richness enhancement: + - SOLOv2 series models were released, in which the SOLOv2-Light-R50-VD-DCN-FPN model achieved 38.6 FPS on a single gpu V100, accelerating by 24%, and the accuracy of COCO verification set reached 38.8%, improving by 2.4 absolute percentage points. + - Added Android mobile terminal detection demo, including SSD, YOLO series model, can directly scan code installation experience. + + - Mobile terminal model optimization: + - Added to PACT's new quantization strategy, YOLOv3 Mobilenetv3 is 0.7% better than normal quantization on COCO datasets. + + - Ease of use and functional components: + - Enhance the function of generate_proposal_labels operator to avoid nan risk of the model. + - Fixed several problems with deploy python and C++ prediction. + - Unified COCO and VOC datasets under the evaluation process, support the output of a single class of AP and P-R curves. + - PP-YOLO supports rectangular input images. + + - Documents: + - Added object detection whole process tutorial, added Jetson platform deployment tutorial. + + +### v0.4.0(07/2020) + - Model richness enhancement: + - The PPYOLO model was released. The accuracy of COCO dataset reached 45.2%, and the prediction speed of single gpu V100 reached 72.9 FPS, which was better than that of YOL Ov4 model. + - New TTFNet model, base version aligned with competing products, COCO dataset accuracy up to 32.9%. + - New HTC model, base version aligned with competing products, COCO dataset accuracy up to 42.2%. + - BlazeFace key point detection model was added, with an accuracy of 85.2% in Wider-Face's Easy-Set. + - ACFPN model was added, and the accuracy of COCO dataset reached 39.6%. + - General object detection model (including 676 classes) on the publisher side. On the COCO dataset with the same strategy, when V100 is 19.5FPS, the COCO mAP can reach 49.4%. + + - Mobile terminal model optimization: + - Added SSD Lite series optimization models, including Ghost Net Backbone, FPN components, etc., with accuracy improved by 0.5% and 1.5%. + + - Ease of use and functional components: + - Add GridMask, Random Erasing data enhancement method. + - Added support for Matrix NMS. 
+ - EMA(Exponential Moving Average) training support. + - The new multi-machine training method, the average acceleration ratio of two machines to single machine is 80%, multi-machine training support needs to be further verified. + +### v0.3.0(05/2020) + - Model richness enhancement: + - Efficientdet-D0 model added, speed and accuracy is better than competing products. + - Added YOLOv4 prediction model, precision aligned with competing products; Added YOLOv4 fine tuning training on Pascal VOC datasets with accuracy of 85.5%. + - YOLOv3 added MobileNetV3 backbone network, COCO dataset accuracy reached 31.6%. + - Add Anchor-free model FCOS, the accuracy is better than competing products. + - Anchor-free model Cornernet Squeeze was added, the accuracy was better than competing products, and the accuracy of COCO dataset of optimized model was 38.2% and +3.7%, 5% faster than YOL Ov3 Darknet53. + - The CascadeRCNN-ResNet50vd model, which is a practical object detection model on the server side, is added, and its speed and accuracy are better than that of the competitive EfficientDet. + + - Mobile terminal launched three models: + - SSSDLite model: SSDLite-Mobilenetv3 small/large model, with better accuracy than competitors. + - YOLOv3 Mobile solution: The YOLOv3-MobileNetv3 model accelerates 3.5 times after compression, which is faster and more accurate than the SSD Lite model of competing products. + - RCNN Mobile terminal scheme: CascadeRCNN-MobileNetv3, after series optimization, launched models with input images of 320x320 and 640x640 respectively, with high cost performance for speed and accuracy. + + - Anticipate deployment refactoring: + - New Python prediction deployment process, support for RCNN, YOLO, SSD, Retina Net, face models, support for video prediction. + - Refactoring C++ predictive deployment to improve ease of use. + + - Ease of use and functional components: + - Added Auto Augment data enhancement. + - Upgrade the detection library document structure. + - Support shape matching automatically by transfer learning. + - Optimize memory footprint during mask branch evaluation. + +### v0.2.0(02/2020) + - The new model: + - Added CBResNet model. + - Added LibraRCNN model. + - The accuracy of YOLOv3 model was further improved, and the accuracy based on COCO data reached 43.2%, 1.4% higher than the previous version. + - New Basic module: + - Trunk network: CBResNet is added. + - Loss module: Loss of YOLOv3 supports fine-grained OP combinations. + - Regular module: Added the Drop Block module. + - Function optimization and improvement: + - Accelerate YOLOv3 data preprocessing and increase the overall training speed by 40%. + - Optimize data preprocessing logic to improve ease of use. + - dd face detection prediction benchmark data. + - Added C++ prediction engine Python API prediction example. + - Detection model compression: + - prunning: Release MobileNet-YOLOv3 prunning scheme and model, based on VOC data FLOPs 69.6%, mAP + 1.4%, based on COCO DATA FLOPS 28.8%, mAP + 0.9%; Release ResNet50vd-DCN-YOLOv3 clipped solution and model based on COCO datasets 18.4%, mAP + 0.8%. + - Distillation: Release MobileNet-YOLOv3 distillation scheme and model, based on VOC data mAP + 2.8%, COCO data mAP + 2.1%. + - Quantification: Release quantification models of YOLOv3 Mobile Net and Blaze Face. 
+ - Prunning + distillation: release MobileNet-YOLOv3 prunning + distillation solution and model, 69.6% based on COCO DATA FLOPS, 64.5% based on TensorRT prediction acceleration, 0.3% mAP; Release ResNet50vd-DCN-YOLOv3 tailoring + distillation solution and model, 43.7% based on COCO Data FLOPS, 24.0% based on TensorRT prediction acceleration, mAP + 0.6%. + - Search: Open source Blaze Face Nas complete search solution. + - Predict deployment: + - Integrated TensorRT, support FP16, FP32, INT8 quantitative inference acceleration. + - Document: + - Add detailed data preprocessing module to introduce documents and implement custom data Reader documents. + - Added documentation on how to add algorithm models. + - Document deployment to the web site: https://paddledetection.readthedocs.io + +### 12/2019 +- Add Res2Net model. +- Add HRNet model. +- Add GIOU loss and DIOU loss。 + + +### 21/11/2019 +- Add CascadeClsAware RCNN model. +- Add CBNet, ResNet200 and Non-local model. +- Add SoftNMS. +- Add Open Image V5 dataset and Objects365 dataset model + +### 10/2019 +- Added enhanced YOLOv3 model with accuracy up to 41.4%. +- Added Face detection models BlazeFace and Faceboxes. +- Rich COCO based models, accuracy up to 51.9%. +- Added CA-Cascade-RCNN, one of the best single models to win on Objects365 2019 Challenge. +- Add pedestrian detection and vehicle detection pre-training models. +- Support FP16 training. +- Added cross-platform C++ inference deployment scheme. +- Add model compression examples. + + +### 2/9/2019 +- Add GroupNorm model. +- Add CascadeRCNN+Mask model. + +### 5/8/2019 +- Add Modulated Deformable Convolution series model + +### 29/7/2019 + +- Add detection library Chinese document +- Fixed an issue where R-CNN series model training was evaluated simultaneously +- Add ResNext101-vd + Mask R-CNN + FPN models +- Added YOLOv3 model based on VOC dataset + +### 3/7/2019 + +- First release of PaddleDetection Detection library and Detection model library +- models:Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask + R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, 和SSD. diff --git a/docs/MODEL_ZOO_en.md b/docs/MODEL_ZOO_en.md new file mode 100644 index 000000000..4e8975e88 --- /dev/null +++ b/docs/MODEL_ZOO_en.md @@ -0,0 +1,94 @@ +# Model Libraries and Baselines + +## Test Environment + +- Python 3.7 +- PaddlePaddle Daily version +- CUDA 10.1 +- cuDNN 7.5 +- NCCL 2.4.8 + +## General Settings + +- All models were trained and tested in the COCO17 dataset. +- Unless special instructions, all the ResNet backbone network using [ResNet-B](https://arxiv.org/pdf/1812.01187) structure. +- **Inference time (FPS)**: The reasoning time was calculated on a Tesla V100 GPU by `tools/eval.py` testing all validation sets in FPS (number of pictures/second). CuDNN version is 7.5, including data loading, network forward execution and post-processing, and Batch size is 1. + +## Training strategy + +- We adopt and [Detectron](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#training-schedules) in the same training strategy. +- 1x strategy indicates that when the total batch size is 8, the initial learning rate is 0.01, and the learning rate decreases by 10 times after 8 epoch and 11 epoch, respectively, and the final training is 12 epoch. +- 2X strategy is twice as much as strategy 1X, and the learning rate adjustment position is twice as much as strategy 1X. + +## ImageNet pretraining model +Paddle provides a skeleton network pretraining model based on ImageNet. 
All pre-trained models were trained on the standard ImageNet-1k dataset. The ResNet and MobileNet models are high-precision pre-trained models obtained with a cosine learning rate schedule or SSLD knowledge distillation training. Model details are available at [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
+
+
+## Baseline
+
+### Faster R-CNN
+
+Please refer to [Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/)
+
+### Mask R-CNN
+
+Please refer to [Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/)
+
+### Cascade R-CNN
+
+Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn)
+
+### YOLOv3
+
+Please refer to [YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/)
+
+### SSD
+
+Please refer to [SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ssd/)
+
+### FCOS
+
+Please refer to [FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/fcos/)
+
+### SOLOv2
+
+Please refer to [SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/solov2/)
+
+### PP-YOLO
+
+Please refer to [PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/)
+
+### TTFNet
+
+Please refer to [TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/)
+
+### Group Normalization
+
+Please refer to [Group Normalization](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gn/)
+
+### Deformable ConvNets v2
+
+Please refer to [Deformable ConvNets v2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dcn/)
+
+### HRNets
+
+Please refer to [HRNets](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/hrnet/)
+
+### Res2Net
+
+Please refer to [Res2Net](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/res2net/)
+
+### GFL
+
+Please refer to [GFL](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gfl)
+
+### PicoDet
+
+Please refer to [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet)
+
+
+## Rotated Box Detection
+
+### S2ANet
+
+Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dota/)
diff --git a/docs/advanced_tutorials/MODEL_TECHNICAL_en.md b/docs/advanced_tutorials/MODEL_TECHNICAL_en.md
new file mode 100644
index 000000000..927a08596
--- /dev/null
+++ b/docs/advanced_tutorials/MODEL_TECHNICAL_en.md
@@ -0,0 +1,409 @@
+# How to Create Model Algorithm
+To help you make better use of PaddleDetection, this document introduces the main technical details of its models and how to apply them
+
+## Directory
+- [How to Create Model Algorithm](#how-to-create-model-algorithm)
+  - [Directory](#directory)
+  - [1. Introduction](#1-introduction)
+  - [2.
Create Model](#2-create-model) + - [2.1 Create Model Structure](#21-create-model-structure) + - [2.1.1 Create Backbone](#211-create-backbone) + - [2.1.2 Create Neck](#212-create-neck) + - [2.1.3 Create Head](#213-create-head) + - [2.1.4 Create Loss](#214-create-loss) + - [2.1.5 Create Post-processing Module](#215-create-post-processing-module) + - [2.1.6 Create Architecture](#216-create-architecture) + - [2.2 Create Configuration File](#22-create-configuration-file) + - [2.2.1 Network Structure Configuration File](#221-network-structure-configuration-file) + - [2.2.2 Optimizer configuration file](#222-optimizer-configuration-file) + - [2.2.3 Reader Configuration File](#223-reader-configuration-file) + +### 1. Introduction +Each model in the PaddleDetecion corresponds to a folder. In the case of Yolov3, models in the Yolov3 family correspond to the `configs/yolov3` folder. Yolov3 Darknet's general configuration file `configs/yolov3/yolov3_darknet53_270e_coco.yml`. +``` +_BASE_: [ + '../datasets/coco_detection.yml', # Dataset configuration file shared by all models + '../runtime.yml', # Runtime configuration + '_base_/optimizer_270e.yml', # Optimizer related configuration + '_base_/yolov3_darknet53.yml', # yolov3 Network structure configuration file + '_base_/yolov3_reader.yml', # yolov3 Reader module configuration +] + +# The relevant configuration defined here can override the configuration of the same name in the above file +snapshot_epoch: 5 +weights: output/yolov3_darknet53_270e_coco/model_final +``` +As you can see, the modules in the configuration file are clearly divided into optimizer, network structure, and reader modules, with the exception of the common dataset configuration and runtime configuration. Rich optimizers, learning rate adjustment strategies, preprocessing operators, etc., are supported in PaddleDetection, so most of the time you don't need to write the optimizer and reader-related code, just configure it in the configuration file. Therefore, the main purpose of adding a new model is to build the network structure. + +In `ppdet/modeling/`, all of the Paddle Detection network structures are defined and combined in the form of components. The main components of the network structure are as follows: +``` + ppdet/modeling/ + ├── architectures + │ ├── faster_rcnn.py # Faster Rcnn model + │ ├── ssd.py # SSD model + │ ├── yolo.py # YOLOv3 model + │ │ ... + ├── heads # detection head module + │ ├── xxx_head.py # define various detection heads + │ ├── roi_extractor.py # detection of region of interest extraction + ├── backbones # backbone network module + │ ├── resnet.py # ResNet network + │ ├── mobilenet.py # MobileNet network + │ │ ... 
+ ├── losses # loss function module + │ ├── xxx_loss.py # define and register various loss functions + ├── necks # feature fusion module + │ ├── xxx_fpn.py # define various FPN modules + ├── proposal_generator # anchor & proposal generate and match modules + │ ├── anchor_generator.py # anchor generate modules + │ ├── proposal_generator.py # proposal generate modules + │ ├── target.py # anchor & proposal Matching function + │ ├── target_layer.py # anchor & proposal Matching function + ├── tests # unit test module + │ ├── test_xxx.py # the operator and module structure in the network are unit tested + ├── ops.py # encapsulates all kinds of common detection components/operators related to the detection of PaddlePaddle objects + ├── layers.py # encapsulates and register all kinds of PaddlePaddle object detection related public detection components/operators + ├── bbox_utils.py # encapsulates the box-related functions + ├── post_process.py # encapsulate and process related modules after registration + ├── shape_spec.py # defines a class for the module to output shape +``` + +![](../images/model_figure.png) + +### 2. Create Model +Next, the modeling process is described in detail by taking the single-stage detector YOLOv3 as an example, so that you can quickly build a new model according to this idea. + +#### 2.1 Create Model Structure + +##### 2.1.1 Create Backbone + +All existing Backbone network code in PaddleDetection is placed under `ppdet/modeling/backbones` directory, so we created `darknet.py` as follows: +```python +import paddle.nn as nn +from ppdet.core.workspace import register, serializable + +@register +@serializable +class DarkNet(nn.Layer): + + __shared__ = ['norm_type'] + + def __init__(self, + depth=53, + return_idx=[2, 3, 4], + norm_type='bn', + norm_decay=0.): + super(DarkNet, self).__init__() + # Omit the content + + def forward(self, inputs): + # Ellipsis processing logic + pass + + @property + def out_shape(self): + # Omit the content + pass +``` +Then add a reference to `backbones/__init__.py`: +```python +from . import darknet +from .darknet import * +``` +**A few notes:** +- To flexibly configure networks in the YAML configuration file, all backbone nodes need to register in `ppdet.core.workspace` as shown in the preceding example. In addition, `serializable` can be used to enable backbone to support serialization; +- All backbone needs to inherit the `paddle.nn.Layer` class and implement the forward function. In addition, it is necessary to implement the out shape attribute to define the channel information of the output feature map. For details, please refer to the source code. +- `__shared__` To realize global sharing of configuration parameters, these parameters can be shared by all registration modules, such as backbone, neck, head, and loss. 
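+To make the backbone contract above more concrete, below is a minimal, hedged sketch of a toy registered backbone. It assumes the `ShapeSpec` helper from `ppdet/modeling/shape_spec.py` listed in the directory overview above; the layers are placeholders rather than the real DarkNet implementation:
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register, serializable
+from ppdet.modeling.shape_spec import ShapeSpec
+
+@register
+@serializable
+class ToyBackbone(nn.Layer):
+    def __init__(self, channels=[64, 128, 256]):
+        super(ToyBackbone, self).__init__()
+        self._out_channels = channels
+        # Placeholder stages; a real backbone stacks residual blocks here.
+        self.stages = nn.LayerList([
+            nn.Conv2D(3 if i == 0 else channels[i - 1], c, 3, stride=2, padding=1)
+            for i, c in enumerate(channels)
+        ])
+
+    def forward(self, inputs):
+        # inputs['image'] is the [N, 3, H, W] tensor produced by the reader.
+        feat = inputs['image']
+        outs = []
+        for stage in self.stages:
+            feat = stage(feat)
+            outs.append(feat)
+        return outs
+
+    @property
+    def out_shape(self):
+        # Tells the downstream neck how many channels each output feature map has.
+        return [ShapeSpec(channels=c) for c in self._out_channels]
+```
+After adding the usual reference in `backbones/__init__.py`, such a module could be selected in a configuration file with `backbone: ToyBackbone`, in the same way `backbone: DarkNet` is used in the configuration example later in this document.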
+ +##### 2.1.2 Create Neck +The feature fusion module is placed under the `ppdet/modeling/necks` directory and we create the following `yolo_fpn.py`: + +``` python +import paddle.nn as nn +from ppdet.core.workspace import register, serializable + +@register +@serializable +class YOLOv3FPN(nn.Layer): + __shared__ = ['norm_type'] + + def __init__(self, + in_channels=[256, 512, 1024], + norm_type='bn'): + super(YOLOv3FPN, self).__init__() + # Omit the content + + def forward(self, blocks): + # Omit the content + pass + + @classmethod + def from_config(cls, cfg, input_shape): + # Omit the content + pass + + @property + def out_shape(self): + # Omit the content + pass +``` +Then add a reference to `necks/__init__.py`: +```python +from . import yolo_fpn +from .yolo_fpn import * +``` +**A few notes:** +- The neck module needs to be registered with `register` and can be serialized with `serializable`. +- The neck module needs to inherit the `paddle.nn.Layer` class and implement the forward function. In addition, the `out_shape` attribute needs to be implemented to define the channel information of the output feature map, and the class function `from_config` needs to be implemented to deduce the input channel in the configuration file and initialize `YOLOv3FPN`. +- The neck module can use `shared` to implement global sharing of configuration parameters. + +##### 2.1.3 Create Head +The head module is all stored in the `ppdet/modeling/heads` directory, where we create `yolo_head.py` as follows +``` python +import paddle.nn as nn +from ppdet.core.workspace import register + +@register +class YOLOv3Head(nn.Layer): + __shared__ = ['num_classes'] + __inject__ = ['loss'] + + def __init__(self, + anchors=[[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45],[59, 119], + [116, 90], [156, 198], [373, 326]], + anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]], + num_classes=80, + loss='YOLOv3Loss', + iou_aware=False, + iou_aware_factor=0.4): + super(YOLOv3Head, self).__init__() + # Omit the content + + def forward(self, feats, targets=None): + # Omit the content + pass +``` +Then add a reference to `heads/__init__.py`: +```python +from . import yolo_head +from .yolo_head import * +``` +**A few notes:** +- The head module needs to register with `register`. +- The head module needs to inherit the `paddle.nn.Layer` class and implement the forward function. +- `__inject__` indicates that the module encapsulated in the global dictionary is imported. Such as loss, etc. + +##### 2.1.4 Create Loss +The loss modules are all stored under `ppdet/modeling/losses` directory, where we created `yolo_loss.py` +```python +import paddle.nn as nn +from ppdet.core.workspace import register + +@register +class YOLOv3Loss(nn.Layer): + + __inject__ = ['iou_loss', 'iou_aware_loss'] + __shared__ = ['num_classes'] + + def __init__(self, + num_classes=80, + ignore_thresh=0.7, + label_smooth=False, + downsample=[32, 16, 8], + scale_x_y=1., + iou_loss=None, + iou_aware_loss=None): + super(YOLOv3Loss, self).__init__() + # Omit the content + + def forward(self, inputs, targets, anchors): + # Omit the content + pass +``` +Then add a reference to `losses/__init__.py`: +```python +from . import yolo_loss +from .yolo_loss import * +``` +**A few notes:** +- The loss module needs to register with `register`. +- The loss module needs to inherit the `paddle.nn.Layer` class and implement the forward function. +- `__inject__` modules that have been encapsulated in the global dictionary can be used. 
Some parameters can be globally shared with `__shared__` configuration. + +##### 2.1.5 Create Post-processing Module +The post-processing module is defined in `ppdet/modeling/post_process.py`, where the `BBoxPostProcess` class is defined for post-processing operations, as follows: +``` python +from ppdet.core.workspace import register + +@register +class BBoxPostProcess(object): + __shared__ = ['num_classes'] + __inject__ = ['decode', 'nms'] + + def __init__(self, num_classes=80, decode=None, nms=None): + # Omit the content + pass + + def __call__(self, head_out, rois, im_shape, scale_factor): + # Omit the content + pass +``` +**A few notes:** +- Post-processing modules need to register with `register` +- `__inject__` modules encapsulated in the global dictionary, such as decode and NMS. Decode and NMS are defined in `ppdet/modeling/layers.py`. + +##### 2.1.6 Create Architecture + +All architecture network code is placed in `ppdet/modeling/architectures` directory, `meta_arch.py` defines the `BaseArch` class, the code is as follows: +``` python +import paddle.nn as nn +from ppdet.core.workspace import register + +@register +class BaseArch(nn.Layer): + def __init__(self): + super(BaseArch, self).__init__() + + def forward(self, inputs): + self.inputs = inputs + self.model_arch() + + if self.training: + out = self.get_loss() + else: + out = self.get_pred() + return out + + def model_arch(self, ): + pass + + def get_loss(self, ): + raise NotImplementedError("Should implement get_loss method!") + + def get_pred(self, ): + raise NotImplementedError("Should implement get_pred method!") +``` +All architecture needs to inherit from the `BaseArch` class, as defined by `yolo.py` in `YOLOv3` as follows: +``` python +@register +class YOLOv3(BaseArch): + __category__ = 'architecture' + __inject__ = ['post_process'] + + def __init__(self, + backbone='DarkNet', + neck='YOLOv3FPN', + yolo_head='YOLOv3Head', + post_process='BBoxPostProcess'): + super(YOLOv3, self).__init__() + self.backbone = backbone + self.neck = neck + self.yolo_head = yolo_head + self.post_process = post_process + + @classmethod + def from_config(cls, cfg, *args, **kwargs): + # Omit the content + pass + + def get_loss(self): + # Omit the content + pass + + def get_pred(self): + # Omit the content + pass +``` + +**A few notes:** +- All architecture needs to be registered using a `register` +- When constructing a complete network, `__category__ = 'architecture'` must be set to represent a complete object detection model; +- Backbone, neck, YOLO head, post-process and other inspection components are passed into the architecture to form the final network. Modularization of detection like this improves the reusability of detection models, and multiple models can be obtained by combining different detection components. +- The from config class function implements the automatic configuration of channels when modules are combined. + +#### 2.2 Create Configuration File + +##### 2.2.1 Network Structure Configuration File +The configuration of the yolov3 network structure is defined in the `configs/yolov3/_base_/` folder. 
For example, `yolov3_darknet53.yml` defines the network structure of Yolov3 Darknet as follows: +``` +architecture: YOLOv3 +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams +norm_type: sync_bn + +YOLOv3: + backbone: DarkNet + neck: YOLOv3FPN + yolo_head: YOLOv3Head + post_process: BBoxPostProcess + +DarkNet: + depth: 53 + return_idx: [2, 3, 4] + +# use default config +# YOLOv3FPN: + +YOLOv3Head: + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + loss: YOLOv3Loss + +YOLOv3Loss: + ignore_thresh: 0.7 + downsample: [32, 16, 8] + label_smooth: false + +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.005 + downsample_ratio: 32 + clip_bbox: true + nms: + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.01 + nms_threshold: 0.45 + nms_top_k: 1000 + +``` +In the configuration file, you need to specify the network architecture, pretrain weights to specify the URL or path of the training model, and norm type to share as global parameters. The definition of the model is defined in the file from top to bottom, corresponding to the model components in the previous section. For some model components, if the default parameters are used, you do not need to configure them, such as `yolo_fpn` above. By changing related configuration, we can easily combine another model, such as `configs/yolov3/_base_/yolov3_mobilenet_v1.yml` to switch backbone from Darknet to MobileNet. + +##### 2.2.2 Optimizer configuration file +The optimizer profile defines the optimizer used by the model and the learning rate scheduling strategy. Currently, a variety of optimizers and learning rate strategies have been integrated in PaddleDetection, as described in the code `ppdet/optimizer.py`. For example, the optimizer configuration file for yolov3 is defined in `configs/yolov3/_base_/optimizer_270e.yml` as follows: +``` +epoch: 270 + +LearningRate: + base_lr: 0.001 + schedulers: + - !PiecewiseDecay + gamma: 0.1 + milestones: + # epoch number + - 216 + - 243 + - !LinearWarmup + start_factor: 0. + steps: 4000 + +OptimizerBuilder: + optimizer: + momentum: 0.9 + type: Momentum + regularizer: + factor: 0.0005 + type: L2 +``` +**A few notes:** +- Optimizer builder. Optimizer specifies the type and parameters of the Optimizer. Currently support the optimizer can reference [PaddlePaddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html) +- The `LearningRate.schedulers` sets the combination of different Learning Rate adjustment strategies. Paddle currently supports a variety of Learning Rate adjustment strategies. Specific also can reference [Paddle Paddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html). It is important to note that you need to simply package the learning rate adjustment strategy in Paddle, which can be found in the source code `ppdet/optimizer.py`. + + +##### 2.2.3 Reader Configuration File +For Reader configuration, see [Reader configuration documentation](./READER_en.md#5.Configuration-and-Operation). + +> After reading this document, you should have some experience in model construction and configuration of Paddle Detection, and you will understand it more thoroughly with the source code. If you have other questions or suggestions about model technology, please send us an issue. We welcome your feedback. 
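+As a closing illustration (a hedged sketch, not part of the configuration files above), the snippet below shows how the registered modules and the merged configuration come together at runtime: `load_config` reads the YAML files and `create` builds the architecture from the registry, which is essentially what `ppdet/engine/trainer.py` does internally.
+```python
+from ppdet.core.workspace import load_config, create
+
+# Load the merged configuration (dataset, runtime, optimizer, network and reader sections).
+cfg = load_config('configs/yolov3/yolov3_darknet53_270e_coco.yml')
+
+# cfg.architecture is 'YOLOv3' here; create() looks it up in the registry and
+# recursively instantiates DarkNet, YOLOv3FPN, YOLOv3Head and BBoxPostProcess.
+model = create(cfg.architecture)
+print(type(model).__name__)  # YOLOv3
+```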
diff --git a/docs/advanced_tutorials/READER_en.md b/docs/advanced_tutorials/READER_en.md new file mode 100644 index 000000000..e39246417 --- /dev/null +++ b/docs/advanced_tutorials/READER_en.md @@ -0,0 +1,328 @@ +# Data Processing Module + +## Directory +- [Data Processing Module](#data-processing-module) + - [Directory](#directory) + - [1.Introduction](#1introduction) + - [2.Dataset](#2dataset) + - [2.1COCO Dataset](#21coco-dataset) + - [2.2Pascal VOC dataset](#22pascal-voc-dataset) + - [2.3Customize Dataset](#23customize-dataset) + - [3.Data preprocessing](#3data-preprocessing) + - [3.1Data Enhancement Operator](#31data-enhancement-operator) + - [3.2Custom data enhancement operator](#32custom-data-enhancement-operator) + - [4.Reader](#4reader) + - [5.Configuration and Operation](#5configuration-and-operation) + - [5.1Configuration](#51configuration) + - [5.2run](#52run) + +### 1.Introduction +All code logic for Paddle Detection's data processing module in `ppdet/data/`, the data processing module is used to load data and convert it into a format required for training, evaluation and reasoning of object Detection models. The main components of the data processing module are as follows: +The main components of the data processing module are as follows: +```bash + ppdet/data/ + ├── reader.py # Reader module based on Dataloader encapsulation + ├── source # Data source management module + │ ├── dataset.py # Defines the data source base class from which various datasets are inherited + │ ├── coco.py # The COCO dataset parses and formats the data + │ ├── voc.py # Pascal VOC datasets parse and format data + │ ├── widerface.py # The WIDER-FACE dataset parses and formats data + │ ├── category.py # Category information for the relevant dataset + ├── transform # Data preprocessing module + │ ├── batch_operators.py # Define all kinds of preprocessing operators based on batch data + │ ├── op_helper.py # The auxiliary function of the preprocessing operator + │ ├── operators.py # Define all kinds of preprocessing operators based on single image + │ ├── gridmask_utils.py # GridMask data enhancement function + │ ├── autoaugment_utils.py # AutoAugment auxiliary function + ├── shm_utils.py # Auxiliary functions for using shared memory + ``` + + +### 2.Dataset +The dataset is defined in the `source` directory, where `dataset.py` defines the base class `DetDataSet` of the dataset. 
All datasets inherit from the base class, and the `DetDataset` base class defines the following methods: + +| Method | Input | Output | Note | +| :-----------------------: | :------------------------------------------: | :---------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | +| \_\_len\_\_ | no | int, the number of samples in the dataset | Filter out the unlabeled samples | +| \_\_getitem\_\_ | int, The index of the sample | dict, Index idx to sample ROIDB | Get the sample roidb after transform | +| check_or_download_dataset | no | no | Check whether the dataset exists, if not, download, currently support COCO, VOC, Widerface and other datasets | +| set_kwargs | Optional arguments, given as key-value pairs | no | Currently used to support receiving mixup, cutMix and other parameters | +| set_transform | A series of transform functions | no | Set the transform function of the dataset | +| set_epoch | int, current epoch | no | Interaction between dataset and training process | +| parse_dataset | no | no | Used to read all samples from the data | +| get_anno | no | no | Used to get the path to the annotation file | + +When a dataset class inherits from `DetDataSet`, it simply implements the Parse dataset function. parse_dataset set dataset root path dataset_dir, image folder image dir, annotated file path anno_path retrieve all samples and save them in a list roidbs Each element in the list is a sample XXX rec(such as coco_rec or voc_rec), represented by dict, which contains the sample image, gt_bbox, gt_class and other fields. The data structure of xxx_rec in COCO and Pascal-VOC datasets is defined as follows: + ```python + xxx_rec = { + 'im_file': im_fname, # The full path to an image + 'im_id': np.array([img_id]), # The ID number of an image + 'h': im_h, # Height of the image + 'w': im_w, # The width of the image + 'is_crowd': is_crowd, # Community object, default is 0 (VOC does not have this field) + 'gt_class': gt_class, # ID number of an enclosure label name + 'gt_bbox': gt_bbox, # label box coordinates(xmin, ymin, xmax, ymax) + 'gt_poly': gt_poly, # Segmentation mask. This field only appears in coco_rec and defaults to None + 'difficult': difficult # Is it a difficult sample? This field only appears in voc_rec and defaults to 0 + } + ``` + +The contents of the xxx_rec can also be controlled by the Data fields parameter of `DetDataSet`, that is, some unwanted fields can be filtered out, but in most cases you do not need to change them. The default configuration in `configs/datasets` will do. + +In addition, a dictionary `cname2cid` holds the mapping of category names to IDS in the Parse dataset function. In coco dataset, can use [coco API](https://github.com/cocodataset/cocoapi) from the label category name of the file to load dataset, and set up the dictionary. In the VOC dataset, if `use_default_label=False` is set, the category list will be read from `label_list.txt`, otherwise the VOC default category list will be used. + +#### 2.1COCO Dataset +COCO datasets are currently divided into COCO2014 and COCO2017, which are mainly composed of JSON files and image files, and their organizational structure is shown as follows: + ``` + dataset/coco/ + ├── annotations + │ ├── instances_train2014.json + │ ├── instances_train2017.json + │ ├── instances_val2014.json + │ ├── instances_val2017.json + │ │ ... + ├── train2017 + │ ├── 000000000009.jpg + │ ├── 000000580008.jpg + │ │ ... 
+ ├── val2017 + │ ├── 000000000139.jpg + │ ├── 000000000285.jpg + │ │ ... + ``` +class `COCODataSet` is defined and registered on `source/coco.py`. And implements the parse the dataset method, called [COCO API](https://github.com/cocodataset/cocoapi) to load and parse COCO format data source ` roidbs ` and ` cname2cid `, See `source/coco.py` source code for details. Converting other datasets to COCO format can be done by referring to [converting User Data to COCO Data](../tutorials/PrepareDataSet_en.md#convert-user-data-to-coco-data) +And implements the parse the dataset method, called [COCO API](https://github.com/cocodataset/cocoapi) to load and parse COCO format data source `roidbs` and `cname2cid`, See `source/coco.py` source code for details. Converting other datasets to COCO format can be done by referring to [converting User Data to COCO Data](../tutorials/PrepareDataSet_en.md#convert-user-data-to-coco-data) + + +#### 2.2Pascal VOC dataset +The dataset is currently divided into VOC2007 and VOC2012, mainly composed of XML files and image files, and its organizational structure is shown as follows: +``` + dataset/voc/ + ├── trainval.txt + ├── test.txt + ├── label_list.txt (optional) + ├── VOCdevkit/VOC2007 + │ ├── Annotations + │ ├── 001789.xml + │ │ ... + │ ├── JPEGImages + │ ├── 001789.jpg + │ │ ... + │ ├── ImageSets + │ | ... + ├── VOCdevkit/VOC2012 + │ ├── Annotations + │ ├── 2011_003876.xml + │ │ ... + │ ├── JPEGImages + │ ├── 2011_003876.jpg + │ │ ... + │ ├── ImageSets + │ │ ... + ``` +The `VOCDataSet` dataset is defined and registered in `source/voc.py` . It inherits the `DetDataSet` base class and rewrites the `parse_dataset` method to parse XML annotations in the VOC dataset. Update `roidbs` and `cname2cid`. To convert other datasets to VOC format, refer to [User Data to VOC Data](../tutorials/PrepareDataSet_en.md#convert-user-data-to-voc-data) + + +#### 2.3Customize Dataset +If the COCO dataset and VOC dataset do not meet your requirements, you can load your dataset by customizing it. There are only two steps to implement a custom dataset + +1. create`source/xxx.py`, define class `XXXDataSet` extends from `DetDataSet` base class, complete registration and serialization, and rewrite `parse_dataset`methods to update `roidbs` and `cname2cid`: + ```python + from ppdet.core.workspace import register, serializable + + #Register and serialize + @register + @serializable + class XXXDataSet(DetDataSet): + def __init__(self, + dataset_dir=None, + image_dir=None, + anno_path=None, + ... + ): + self.roidbs = None + self.cname2cid = None + ... + + def parse_dataset(self): + ... + Omit concrete parse data logic + ... + self.roidbs, self.cname2cid = records, cname2cid + ``` + +2. Add a reference to `source/__init__.py`: + ```python + from . import xxx + from .xxx import * + ``` +Complete the above two steps to add the new Data source `XXXDataSet`, you can refer to [Configure and Run](#5.Configuration-and-Operation) to implement the use of custom datasets. + +### 3.Data preprocessing + +#### 3.1Data Enhancement Operator +A variety of data enhancement operators are supported in PaddleDetection, including single image data enhancement operator and batch data enhancement operator. You can choose suitable operators to use in combination. Single image data enhancement operators are defined in `transform/operators.py`. 
The supported single image data enhancement operators are shown in the following table: +| Name | Function | +| :----------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Decode | Loads an image from an image file or memory buffer in RGB format | +| Permute | If the input is HWC, the sequence changes to CHW | +| RandomErasingImage | Random erasure of the image | +| NormalizeImage | The pixel value of the image is normalized. If is scale= True is set, the pixel value is divided by 255.0 before normalization. | +| GridMask | GridMask data is augmented | +| RandomDistort | Random disturbance of image brightness, contrast, saturation and hue | +| AutoAugment | Auto Augment data, which contains a series of data augmentation methods | +| RandomFlip | Randomly flip the image horizontally | +| Resize | Resize the image and transform the annotation accordingly | +| MultiscaleTestResize | Rescale the image to each size of the multi-scale list | +| RandomResize | Random Resize of images can be resized to different sizes and different interpolation strategies can be used | +| RandomExpand | Place the original image into an expanded image filled with pixel mean, crop, scale, and flip the image | +| CropWithSampling | Several candidate frames are generated according to the scaling ratio and length-width ratio, and then the prunning results that meet the requirements are selected according to the area intersection ratio (IoU) between these candidate frames and the marking frames | +| CropImageWithDataAchorSampling | Based on Crop Image, in face detection, the Image scale is randomly transformed to a certain range of scale, which greatly enhances the scale change of face | +| RandomCrop | The principle is the same as CropImage, which is processed with random proportion and IoU threshold | +| RandomScaledCrop | According to the long edge, the image is randomly clipped and the corresponding transformation is made to the annotations | +| Cutmix | Cutmix data enhancement, Mosaic of two images | +| Mixup | Mixup data enhancement to scale up two images | +| NormalizeBox | Bounding box is normalized | +| PadBox | If the number of bounding boxes is less than num Max boxes, zero is populated into bboxes | +| BboxXYXY2XYWH | Bounding Box is converted from (xmin,ymin,xmax,ymin) form to (xmin,ymin, Width,height) form | +| Pad | The image Pad is an integer multiple of a certain number or the specified size, and supports the way of specifying Pad | +| Poly2Mask | Poly2Mask data enhancement | | + +Batch data enhancement operators are defined in `transform/batch_operators.py`. 
The list of operators currently supported is as follows: +| Name | Function | +| :---------------: | :------------------------------------------------------------------------------------------------------------------: | +| PadBatch | Pad operation is performed on each batch of data images randomly to make the images in the batch have the same shape | +| BatchRandomResize | Resize a batch of images so that the images in the batch are randomly scaled to the same size | +| Gt2YoloTarget | Generate the objectives of YOLO series models from GT data | +| Gt2FCOSTarget | Generate the target of the FCOS model from GT data | +| Gt2TTFTarget | Generate TTF Net model targets from GT data | +| Gt2Solov2Target | Generate targets for SOL Ov2 models from GT data | + +**A few notes:** +- The input of Data enhancement operator is sample or samples, and each sample corresponds to a sample of RoIDBS output by `DetDataSet` mentioned above, such as coco_rec or voc_rec +- Single image data enhancement operators (except Mixup, Cutmix, etc.) can also be used in batch data processing. However, there are still some differences between single image processing operators and Batch image processing operators. Taking Random Resize and Batch Random Resize as an example, Random Resize will randomly scale each picture in a Batch. However, the shapes of each image after Resize are different. Batch Random Resize means that all images in a Batch will be randomly scaled to the same shape. +- In addition to Batch Random Resize, the Batch data enhancement operators defined in `transform/batch_operators.py` receive input images in the form of CHW, so please use Permute before using these Batch data enhancement operators . If the Gt2xxx Target operator is used, it needs to be placed further back. The Normalize Box operator is recommended to be placed before Gt2xxx Target. After summarizing these constraints, the order of the recommended preprocessing operator is: + ``` + - XXX: {} + - ... + - BatchRandomResize: {...} # Remove it if not needed, and place it in front of Permute if necessary + - Permute: {} # flush privileges + - NormalizeBox: {} # If necessary, it is recommended to precede Gt2XXXTarget + - PadBatch: {...} # If not, you can remove it. If necessary, it is recommended to place it behind Permute + - Gt2XXXTarget: {...} # It is recommended to place with Pad Batch in the last position + ``` + +#### 3.2Custom data enhancement operator +If you need to customize data enhancement operators, you need to understand the logic of data enhancement operators. The Base class of the data enhancement Operator is the `transform/operators.py`class defined in `BaseOperator`, from which both the single image data enhancement Operator and the batch data enhancement Operator inherit. Refer to the source code for the complete definition. The following code shows the key functions of the `BaseOperator` class: the apply and __call__ methods + ``` python + class BaseOperator(object): + + ... + + def apply(self, sample, context=None): + return sample + + def __call__(self, sample, context=None): + if isinstance(sample, Sequence): + for i in range(len(sample)): + sample[i] = self.apply(sample[i], context) + else: + sample = self.apply(sample, context) + return sample + ``` +__call__ method is call entry of `BaseOperator`, Receive one sample(single image) or multiple samples (multiple images) as input, and call the Apply function to process one or more samples. 
In most cases, you simply inherit from `BaseOperator` and override the `apply` method or the `__call__` method, as shown below. Define a XXXOp that inherits from `BaseOperator` and register it:
+  ```python
+  @register_op
+  class XXXOp(BaseOperator):
+    def __init__(self,...):
+
+      super(XXXOp, self).__init__()
+      ...
+
+    # In most cases, you just need to override the apply method
+    def apply(self, sample, context=None):
+      ...
+      # The specific operations on the input sample are omitted
+      ...
+      return sample
+
+    # If necessary, override the __call__ method, as Mixup, Gt2XXXTarget, etc. do
+    # def __call__(self, sample, context=None):
+    #   ...
+    #   The specific operations on the input sample are omitted
+    #   ...
+    #   return sample
+  ```
+In most cases you only need to override the `apply` method, as is done by the preprocessing operators in `transform/operators.py` except Mixup and Cutmix. For batch processing, you generally need to override the `__call__` method, as the preprocessing operators in `transform/batch_operators.py` do.
+
+### 4.Reader
+The Reader class is defined in `reader.py`, where the `BaseDataLoader` class is defined. `BaseDataLoader` adds a layer on top of `paddle.io.DataLoader`: it has all the functionality of `paddle.io.DataLoader` and can satisfy the different requirements that different models place on `DetDataset`. For example, the Reader can be configured so that `DetDataset` supports Mixup, Cutmix and other operations. In addition, the data preprocessing operators are combined into `DetDataset` and `paddle.io.DataLoader` by the `Compose` and `BatchCompose` classes, respectively. All Reader classes inherit from the `BaseDataLoader` class; see the source code for details.
+
+### 5.Configuration and Operation
+
+#### 5.1Configuration
+
+The configuration files for the data preprocessing modules consist of dataset configuration files common to all models and Reader configuration files specific to different models. The dataset configuration files live in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows:
+```
+metric: COCO # Currently supports COCO, VOC, OID, WiderFace and other evaluation standards
+num_classes: 80 # The number of classes in the dataset, excluding background classes
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train2017 # The path where the training set images reside, relative to dataset_dir
+    anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set, relative to dataset_dir
+    dataset_dir: dataset/coco # The path where the dataset is located, relative to the PaddleDetection path
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the sample output of the dataset
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val2017 # The path where the validation set images reside, relative to dataset_dir
+    anno_path: annotations/instances_val2017.json # Path to the annotation file of the validation set, relative to dataset_dir
+    dataset_dir: dataset/coco # The path where the dataset is located, relative to the PaddleDetection path
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/coco/annotations/instances_val2017.json # Path to the annotation file of the validation set, relative to the PaddleDetection path
+```
+In PaddleDetection's YAML configuration files, `!` directly serializes module instances (functions, class instances, etc.). The configuration above serializes the dataset classes in this way.
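+The same mechanism applies to a custom data source. Assuming the `XXXDataSet` registered in section 2.3 above, a dataset configuration might look like the following sketch (the directory and file names are placeholders, not real files):
+```
+metric: COCO          # or whichever evaluation standard fits your data
+num_classes: 10       # number of classes in the custom dataset, excluding background
+
+TrainDataset:
+  !XXXDataSet
+    image_dir: images/train            # image folder relative to dataset_dir
+    anno_path: annotations/train.txt   # annotation file relative to dataset_dir
+    dataset_dir: dataset/xxx           # dataset root relative to the PaddleDetection path
+    data_fields: ['image', 'gt_bbox', 'gt_class']
+
+EvalDataset:
+  !XXXDataSet
+    image_dir: images/val
+    anno_path: annotations/val.txt
+    dataset_dir: dataset/xxx
+```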
+
+The Reader configuration files for YOLOv3 are defined in `configs/yolov3/_base_/yolov3_reader.yml`. An example Reader configuration is as follows:
+```
+worker_num: 2
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    ...
+  batch_transforms:
+    ...
+  batch_size: 8
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    ...
+  batch_size: 1
+  drop_empty: false
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 608, 608]
+  sample_transforms:
+    - Decode: {}
+    ...
+  batch_size: 1
+```
+In the Reader you can define the preprocessing operators, the per-GPU batch_size, the worker_num of the DataLoader, and so on.
+
+#### 5.2 Running
+PaddleDetection creates Reader iterators for training, evaluation and testing. The Reader is created in `ppdet/engine/trainer.py`; the following code shows how the training Reader is created:
+``` python
+from ppdet.core.workspace import create
+# build data loader
+self.dataset = cfg['TrainDataset']
+self.loader = create('TrainReader')(self.dataset, cfg.worker_num)
+```
+The Readers for prediction and evaluation are created in a similar way; see `ppdet/engine/trainer.py` for details.
+
+> If you have any questions or suggestions about the data processing module, please open an issue; we welcome your feedback.
diff --git a/docs/tutorials/GETTING_STARTED.md b/docs/tutorials/GETTING_STARTED.md
index 1a92c08ab..cdca9fbb8 100644
--- a/docs/tutorials/GETTING_STARTED.md
+++ b/docs/tutorials/GETTING_STARTED.md
@@ -137,8 +137,8 @@ list below can be viewed by `--help`
 ## Deployment
-Please refer to [depolyment](../../deploy/README.md)
+Please refer to [deployment](../../deploy/README_en.md)
 ## Model Compression
-Please refer to [slim](../../configs/slim/README.md)
+Please refer to [slim](../../configs/slim/README_en.md)
diff --git a/docs/tutorials/PrepareDataSet_en.md b/docs/tutorials/PrepareDataSet_en.md
new file mode 100644
index 000000000..77206402b
--- /dev/null
+++ b/docs/tutorials/PrepareDataSet_en.md
@@ -0,0 +1,423 @@
+# How to Prepare Training Data
+## Directory
+- [How to Prepare Training Data](#how-to-prepare-training-data)
+  - [Directory](#directory)
+  - [Description of Object Detection Data](#description-of-object-detection-data)
+  - [Prepare Training Data](#prepare-training-data)
+    - [VOC Data](#voc-data)
+      - [VOC Dataset Download](#voc-dataset-download)
+      - [Introduction to VOC Data Annotation File](#introduction-to-voc-data-annotation-file)
+    - [COCO Data](#coco-data)
+      - [COCO Data Download](#coco-data-download)
+      - [Description of COCO Data Annotation](#description-of-coco-data-annotation)
+    - [User Data](#user-data)
+      - [Convert User Data to VOC Data](#convert-user-data-to-voc-data)
+      - [Convert User Data to COCO Data](#convert-user-data-to-coco-data)
+      - [Reader of User Define Data](#reader-of-user-define-data)
+      - [Example of User Data Conversion](#example-of-user-data-conversion)
+
+### Description of Object Detection Data
+Object detection data is more complex than classification data: in each image, the position and category of every object must be annotated.
+
+An object's position is generally represented by a rectangular box, which can be expressed in the following three ways:
+
+| Expression | Explanation |
+| :---------: | :----------------------------------------------------------------------------: |
+| x1,y1,x2,y2 | (x1,y1) is the top-left coordinate, (x2,y2) is the bottom-right coordinate |
+| x1,y1,w,h | (x1,y1) is the top-left coordinate, w is the width of the object, h is the height of the object |
+| xc,yc,w,h | (xc,yc) is the center of the object, w is the width of the object, h is the height of the object |
+
+Common object detection datasets such as Pascal VOC use `[x1,y1,x2,y2]` to express the bounding box of an object, while COCO uses `[x1,y1,w,h]` (see the [COCO format description](https://cocodataset.org/#format-data)).
+
+### Prepare Training Data
+PaddleDetection supports the [COCO](http://cocodataset.org), [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) datasets by default.
+
+It also supports custom data sources, including:
+
+(1) Converting custom data to VOC format;
+(2) Converting custom data to COCO format;
+(3) Customizing a new data source and adding a custom reader.
+
+First, enter the `PaddleDetection` root directory:
+
+```
+cd PaddleDetection/
+ppdet_root=$(pwd)
+```
+
+#### VOC Data
+
+VOC data is the data used in the [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) competition. The Pascal VOC competition covers not only image classification but also object detection, object segmentation and other tasks, so its annotation files contain the ground truth for multiple tasks.
+When customizing VOC-format data, decide for each non-mandatory field in the XML files whether to annotate it or use the default value, according to your actual situation.
+
+##### VOC Dataset Download
+
+- Download the VOC dataset automatically with the provided script. The dataset is large and takes a long time to download:
+
+  ```
+  # Execute code to automatically download the VOC dataset
+  python dataset/voc/download_voc.py
+  ```
+
+  After the script finishes, the VOC dataset is organized as follows:
+  ```
+  >>cd dataset/voc/
+  >>tree
+  ├── create_list.py
+  ├── download_voc.py
+  ├── generic_det_label_list.txt
+  ├── generic_det_label_list_zh.txt
+  ├── label_list.txt
+  ├── VOCdevkit/VOC2007
+  │   ├── annotations
+  │   │   ├── 001789.xml
+  │   │   |   ...
+  │   ├── JPEGImages
+  │   │   ├── 001789.jpg
+  │   │   |   ...
+  │   ├── ImageSets
+  │   |   ...
+  ├── VOCdevkit/VOC2012
+  │   ├── Annotations
+  │   │   ├── 2011_003876.xml
+  │   │   |   ...
+  │   ├── JPEGImages
+  │   │   ├── 2011_003876.jpg
+  │   │   |   ...
+  │   ├── ImageSets
+  │   |   ...
+  | ...
+  ```
+
+  Description of each file:
+  ```
+  # label_list.txt is the list of class names; the file name must be label_list.txt. For the VOC dataset, if `use_default_label=true` is set in the config file, this file is not required
+
+  >>cat label_list.txt
+  aeroplane
+  bicycle
+  ...
+
+  # trainval.txt is the file list of the training set
+  >>cat trainval.txt
+  VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
+  VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
+  ...
+
+  # test.txt is the file list of the test set
+  >>cat test.txt
+  VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
+  ...
+  ```
+- If the VOC dataset has already been downloaded, you can organize the files according to the directory structure above.
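+
+Before training, it can be worth sanity-checking the generated list files. A minimal sketch (assuming the `dataset/voc` layout above; nothing here is part of PaddleDetection itself):
+```python
+import os
+
+voc_root = 'dataset/voc'
+with open(os.path.join(voc_root, 'trainval.txt')) as f:
+    # each line: "<image path> <annotation path>", both relative to dataset/voc
+    pairs = [line.split() for line in f if line.strip()]
+
+missing = [pair for pair in pairs
+           if not all(os.path.exists(os.path.join(voc_root, p)) for p in pair)]
+print(f'{len(pairs)} image/annotation pairs listed, {len(missing)} with missing files')
+```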
+ +##### Introduction to VOC Data Annotation File + +In VOC dataset, Each image file corresponds to an XML file with the same name, the coordinates and categories of the marked object frame in the XML file, such as `2007_002055.jpg`: +![](../images/2007_002055.jpg) + +The XML file corresponding to the image contains the basic information of the corresponding image, such as file name, source, image size, object area information and category information contained in the image. + +The XML file contains the following fields: +- filename, indicating the image name. +- size, indicating the image size, including: image width, image height and image depth + ``` + + 500 + 375 + 3 + + ``` +- object field, indict each object, including: + + | Label | Explanation | + | :--------------: | :------------------------------------------------------------------------------------------------------------------------: | + | name | name of object class | + | pose | attitude description of the target object (non required field) | + | truncated | If the occlusion of the object exceeds 15-20% and is outside the bounding box,mark it as `truncated` (non required field) | + | difficult | objects that are difficult to recognize are marked as`difficult` (non required field) | + | bndbox son laebl | (xmin,ymin) top left coordinate, (xmax,ymax) bottom right coordinate | + + +#### COCO Data +COOC data is used in [COCO](http://cocodataset.org) competition. alike, Coco competition also contains multiple competition tasks, and its annotation file contains the annotation contents of multiple tasks. +The coco dataset refers to the data used in the coco competition. Customizing coco data, some fields in JSON file, please select whether to label or use the default value according to the actual situation. + + +##### COCO Data Download +- The coco dataset is downloaded automatically through the code. The dataset is large and takes a long time to download + + ``` + # automatically download coco datasets by executing code + python dataset/coco/download_coco.py + ``` + + after code execution, the organization structure of coco dataset file is: + ``` + >>cd dataset/coco/ + >>tree + ├── annotations + │ ├── instances_train2017.json + │ ├── instances_val2017.json + │ | ... + ├── train2017 + │ ├── 000000000009.jpg + │ ├── 000000580008.jpg + │ | ... + ├── val2017 + │ ├── 000000000139.jpg + │ ├── 000000000285.jpg + │ | ... + | ... + ``` +- If the coco dataset has been downloaded + The files can be organized according to the above data file organization structure. + +##### Description of COCO Data Annotation +Coco data annotation is to store the annotations of all training images in a JSON file. Data is stored in the form of nested dictionaries. + +The JSON file contains the following keys: +- info,indicating the annotation file info。 +- licenses, indicating the label file licenses。 +- images, indicating the list of image information in the annotation file, and each element is the information of an image. 
The following is the information of one of the images: + ``` + { + 'license': 3, # license + 'file_name': '000000391895.jpg', # file_name + # coco_url + 'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg', + 'height': 360, # image height + 'width': 640, # image width + 'date_captured': '2013-11-14 11:18:45', # date_captured + # flickr_url + 'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg', + 'id': 391895 # image id + } + ``` +- annotations: indicating the annotation information list of the target object in the annotation file. Each element is the annotation information of a target object. The following is the annotation information of one of the target objects: + ``` + { + + 'segmentation': # object segmentation annotation + 'area': 2765.1486500000005, # object area + 'iscrowd': 0, # iscrowd + 'image_id': 558840, # image id + 'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h] + 'category_id': 58, # category_id + 'id': 156 # image id + } + ``` + + ``` + # Viewing coco annotation files + import json + coco_anno = json.load(open('./annotations/instances_train2017.json')) + + # coco_anno.keys + print('\nkeys:', coco_anno.keys()) + + # Viewing categories information + print('\ncategories:', coco_anno['categories']) + + # Viewing the number of images + print('\nthe number of images:', len(coco_anno['images'])) + + # Viewing the number of obejcts + print('\nthe number of annotation:', len(coco_anno['annotations'])) + + # View object annotation information + print('\nobject annotation information: ', coco_anno['annotations'][0]) + ``` + + Coco data is prepared as follows. + `dataset/coco/`Initial document organization + ``` + >>cd dataset/coco/ + >>tree + ├── download_coco.py + ``` + +#### User Data +There are three processing methods for user data: + (1) Convert user data into VOC data (only include labels necessary for object detection as required) + (2) Convert user data into coco data (only include labels necessary for object detection as required) + (3) Customize a reader for user data (for complex data, you need to customize the reader) + +##### Convert User Data to VOC Data +After the user dataset is converted to VOC data, the directory structure is as follows (note that the path name and file name in the dataset should not use Chinese as far as possible to avoid errors caused by Chinese coding problems): + +``` +dataset/xxx/ +├── annotations +│ ├── xxx1.xml +│ ├── xxx2.xml +│ ├── xxx3.xml +│ | ... +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +├── label_list.txt (Must be provided and the file name must be label_list.txt ) +├── train.txt (list of trainset ./images/xxx1.jpg ./annotations/xxx1.xml) +└── valid.txt (list of valid file) +``` + +Description of each document +``` +# label_list.txt is a list of category names. The file name must be this +>>cat label_list.txt +classname1 +classname2 +... + +# train.txt is list of trainset +>>cat train.txt +./images/xxx1.jpg ./annotations/xxx1.xml +./images/xxx2.jpg ./annotations/xxx2.xml +... + +# valid.txt is list of validset +>>cat valid.txt +./images/xxx3.jpg ./annotations/xxx3.xml +... 
+``` + +##### Convert User Data to COCO Data +`x2coco.py` is provided in `./tools/` to convert VOC dataset, labelme labeled dataset or cityscape dataset into coco data, for example: + +(1)Conversion of labelme data to coco data: +```bash +python tools/x2coco.py \ + --dataset_type labelme \ + --json_input_dir ./labelme_annos/ \ + --image_input_dir ./labelme_imgs/ \ + --output_dir ./cocome/ \ + --train_proportion 0.8 \ + --val_proportion 0.2 \ + --test_proportion 0.0 +``` +(2)Convert VOC data to coco data: +```bash +python tools/x2coco.py \ + --dataset_type voc \ + --voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \ + --voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \ + --voc_label_list dataset/voc/label_list.txt \ + --voc_out_name voc_train.json +``` + +After the user dataset is converted to coco data, the directory structure is as follows (note that the path name and file name in the dataset should not use Chinese as far as possible to avoid errors caused by Chinese coding problems): +``` +dataset/xxx/ +├── annotations +│ ├── train.json # Annotation file of coco data +│ ├── valid.json # Annotation file of coco data +├── images +│ ├── xxx1.jpg +│ ├── xxx2.jpg +│ ├── xxx3.jpg +│ | ... +... +``` + +##### Reader of User Define Data + If new data in the dataset needs to be added to paddedetection, you can refer to the [add new data source] (../advanced_tutorials/READER.md#2.3_Customizing_Dataset) document section in the data processing document to develop corresponding code to complete the new data source support. At the same time, you can read the [data processing document] (../advanced_tutorials/READER.md) for specific code analysis of data processing + + +#### Example of User Data Conversion + Take [Kaggle Dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example to illustrate how to prepare custom data. The dataset of Kaggle [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition contains 877 images, four categories:crosswalk,speedlimit,stop,trafficlight. Available for download from kaggle, also available from [link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar). + Example diagram of road sign dataset: + ![](../images/road554.png) + +``` +# Downing and unziping data + >>cd $(ppdet_root)/dataset +# Download and unzip the kaggle dataset. The current file organization is as follows + +├── annotations +│ ├── road0.xml +│ ├── road1.xml +│ ├── road10.xml +│ | ... +├── images +│ ├── road0.jpg +│ ├── road1.jpg +│ ├── road2.jpg +│ | ... +``` + +The data is divided into training set and test set +``` +# Generating label_list.txt +>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt + +# Generating train.txt, valid.txt and test.txt +>>ls images/*.png | shuf > all_image_list.txt +>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt + +# The proportion of training set, verification set and test set is about 80%, 10% and 10% respectively. +>>head -n 88 all_list.txt > test.txt +>>head -n 176 all_list.txt | tail -n 88 > valid.txt +>>tail -n 701 all_list.txt > train.txt + +# Deleting unused files +>>rm -rf all_image_list.txt all_list.txt + +The organization structure of the final dataset file is: + +├── annotations +│ ├── road0.xml +│ ├── road1.xml +│ ├── road10.xml +│ | ... +├── images +│ ├── road0.jpg +│ ├── road1.jpg +│ ├── road2.jpg +│ | ... 
+├── label_list.txt +├── test.txt +├── train.txt +└── valid.txt + +# label_list.txt is list of file name, file name must be label_list.txt +>>cat label_list.txt +crosswalk +speedlimit +stop +trafficlight + +# train.txt is the list of training dataset files, and each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder. +>>cat train.txt +./images/road839.png ./annotations/road839.xml +./images/road363.png ./annotations/road363.xml +... + +# valid.txt is the list of validation dataset files. Each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder. +>>cat valid.txt +./images/road218.png ./annotations/road218.xml +./images/road681.png ./annotations/road681.xml +``` + +You can also download [the prepared data](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar), unzip to `dataset/roadsign_voc/` +After preparing the data, we should generally understand the data, such as image quantity, image size, number of target areas of each type, target area size, etc. If necessary, clean the data. + +Roadsign dataset statistics: + +| data | number of images | +| :---: | :--------------: | +| train | 701 | +| valid | 176 | + +**Explanation:** + (1) For user data, it is recommended to carefully check the data before training to avoid crash during training due to wrong data annotation format or incomplete image data + (2) If the image size is too large, it will occupy more memory without limiting the read data size, which will cause memory / video memory overflow. Please set batch reasonably_ Size, you can try from small to large diff --git a/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md b/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md new file mode 100644 index 000000000..4c8c92599 --- /dev/null +++ b/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation_en.md @@ -0,0 +1,263 @@ +# RCNN series model parameter configuration tutorial + +Tag: Model parameter configuration + +Take `faster_rcnn_r50_fpn_1x_coco.yml` as an example. 
The model consists of five sub-profiles: + +- Data profile `coco_detection.yml` + +```yaml +# Data evaluation type +metric: COCO +# The number of categories in the dataset +num_classes: 80 + +# TrainDataset +TrainDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: train2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_train2017.json + # data file + dataset_dir: dataset/coco + # data_fields + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: val2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json + # data file file os.path.join(dataset_dir, anno_path) + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json +``` + +- Optimizer configuration file `optimizer_1x.yml` + +```yaml +# Total training epoches +epoch: 12 + +# learning rate setting +LearningRate: + # Default is 8 Gpus training learning rate + base_lr: 0.01 + # Learning rate adjustment strategy + schedulers: + - !PiecewiseDecay + gamma: 0.1 + # Position of change in learning rate (number of epoches) + milestones: [8, 11] + - !LinearWarmup + start_factor: 0.1 + steps: 1000 + +# Optimizer +OptimizerBuilder: + # Optimizer + optimizer: + momentum: 0.9 + type: Momentum + # Regularization + regularizer: + factor: 0.0001 + type: L2 +``` + +- Data reads configuration files `faster_fpn_reader.yml` + +```yaml +# Number of PROCESSES per GPU Reader +worker_num: 2 +# training data +TrainReader: + # Training data transforms + sample_transforms: + - Decode: {} + - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True} + - RandomFlip: {prob: 0.5} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # Since the model has FPN structure, the input image needs a multiple of 32 padding + - PadBatch: {pad_to_stride: 32} + # Batch_size during training + batch_size: 1 + # Read data is out of order + shuffle: true + # Whether to discard data that does not complete the batch + drop_last: true + # Set it to false. 
Then you have a sequence of values for GT: List [Tensor] + collate_batch: false + +# Evaluate data +EvalReader: + # Evaluate data transforms + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # Since the model has FPN structure, the input image needs a multiple of 32 padding + - PadBatch: {pad_to_stride: 32} + # batch_size of evaluation + batch_size: 1 + # Read data is out of order + shuffle: false + # Whether to discard data that does not complete the batch + drop_last: false + # Whether to discard unlabeled data + drop_empty: false + +# test data +TestReader: + # test data transforms + sample_transforms: + - Decode: {} + - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True} + - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]} + - Permute: {} + batch_transforms: + # Since the model has FPN structure, the input image needs a multiple of 32 padding + - PadBatch: {pad_to_stride: 32} + # batch_size of test + batch_size: 1 + # Read data is out of order + shuffle: false + # Whether to discard data that does not complete the batch + drop_last: false +``` + +- Model profile `faster_rcnn_r50_fpn.yml` + +```yaml +# Model structure type +architecture: FasterRCNN +# Pretrain model address +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams + +# FasterRCNN +FasterRCNN: + # backbone + backbone: ResNet + # neck + neck: FPN + # rpn_head + rpn_head: RPNHead + # bbox_head + bbox_head: BBoxHead + # post process + bbox_post_process: BBoxPostProcess + + +# backbone +ResNet: + # index 0 stands for res2 + depth: 50 + # norm_type, Configurable parameter: bn or sync_bn + norm_type: bn + # freeze_at index, 0 represent res2 + freeze_at: 0 + # return_idx + return_idx: [0,1,2,3] + # num_stages + num_stages: 4 + +# FPN +FPN: + # channel of FPN + out_channel: 256 + +# RPNHead +RPNHead: + # anchor generator + anchor_generator: + aspect_ratios: [0.5, 1.0, 2.0] + anchor_sizes: [[32], [64], [128], [256], [512]] + strides: [4, 8, 16, 32, 64] + # rpn_target_assign + rpn_target_assign: + batch_size_per_im: 256 + fg_fraction: 0.5 + negative_overlap: 0.3 + positive_overlap: 0.7 + use_random: True + # The parameters of the proposal are generated during training + train_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 2000 + post_nms_top_n: 1000 + topk_after_collect: True + # The parameters of the proposal are generated during evaluation + test_proposal: + min_size: 0.0 + nms_thresh: 0.7 + pre_nms_top_n: 1000 + post_nms_top_n: 1000 + +# BBoxHead +BBoxHead: + # TwoFCHead as BBoxHead + head: TwoFCHead + # roi align + roi_extractor: + resolution: 7 + sampling_ratio: 0 + aligned: True + # bbox_assigner + bbox_assigner: BBoxAssigner + +# BBoxAssigner +BBoxAssigner: + # batch_size_per_im + batch_size_per_im: 512 + # Background the threshold + bg_thresh: 0.5 + # Prospects for threshold + fg_thresh: 0.5 + # Prospects of proportion + fg_fraction: 0.25 + # Random sampling + use_random: True + +# TwoFCHead +TwoFCHead: + # TwoFCHead feature dimension + out_channel: 1024 + + +# BBoxPostProcess +BBoxPostProcess: + # decode + decode: RCNNBox + # nms + nms: + # use MultiClassNMS + name: MultiClassNMS + keep_top_k: 100 + score_threshold: 0.05 + nms_threshold: 0.5 + +``` + +- runtime configuration file `runtime.yml` + +```yaml +# Whether to use gpu +use_gpu: 
true +# Log Printing interval +log_iter: 20 +# save_dir +save_dir: output +# Model save interval +snapshot_epoch: 1 +``` diff --git a/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md new file mode 100644 index 000000000..dfcdd45fd --- /dev/null +++ b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation_en.md @@ -0,0 +1,266 @@ +# YOLO series model parameter configuration tutorial + +Tag: Model parameter configuration + +Take `ppyolo_r50vd_dcn_1x_coco.yml` as an example, The model consists of five sub-profiles: + +- Data profile `coco_detection.yml` + +```yaml +# Data evaluation type +metric: COCO +# The number of categories in the dataset +num_classes: 80 + +# TrainDataset +TrainDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: train2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_train2017.json + # data file + dataset_dir: dataset/coco + # data_fields + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + # Image data path, Relative path of dataset_dir, os.path.join(dataset_dir, image_dir) + image_dir: val2017 + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json + # data file os.path.join(dataset_dir, anno_path) + dataset_dir: dataset/coco + +TestDataset: + !ImageFolder + # Annotation file path, Relative path of dataset_dir, os.path.join(dataset_dir, anno_path) + anno_path: annotations/instances_val2017.json +``` + +- Optimizer configuration file `optimizer_1x.yml` + +```yaml +# Total training epoches +epoch: 405 + +# learning rate setting +LearningRate: + # Default is 8 Gpus training learning rate + base_lr: 0.01 + # Learning rate adjustment strategy + schedulers: + - !PiecewiseDecay + gamma: 0.1 + # Position of change in learning rate (number of epoches) + milestones: + - 243 + - 324 + # Warmup + - !LinearWarmup + start_factor: 0. 
+ steps: 4000 + +# Optimizer +OptimizerBuilder: + # Optimizer + optimizer: + momentum: 0.9 + type: Momentum + # Regularization + regularizer: + factor: 0.0005 + type: L2 +``` + +- Data reads configuration files `ppyolo_reader.yml` + +```yaml +# Number of PROCESSES per GPU Reader +worker_num: 2 +# training data +TrainReader: + inputs_def: + num_max_boxes: 50 + # Training data transforms + sample_transforms: + - Decode: {} + - Mixup: {alpha: 1.5, beta: 1.5} + - RandomDistort: {} + - RandomExpand: {fill_value: [123.675, 116.28, 103.53]} + - RandomCrop: {} + - RandomFlip: {} + # batch_transforms + batch_transforms: + - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False} + - NormalizeBox: {} + - PadBox: {num_max_boxes: 50} + - BboxXYXY2XYWH: {} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]} + # Batch size during training + batch_size: 24 + # Read data is out of order + shuffle: true + # Whether to discard data that does not complete the batch + drop_last: true + # mixup_epoch,Greater than maximum epoch, Indicates that the training process has been augmented with mixup data + mixup_epoch: 25000 + # Whether to use the shared memory to accelerate data reading, ensure that the shared memory size (such as /dev/shm) is greater than 1 GB + use_shared_memory: true + +# Evaluate data +EvalReader: + # Evaluating data transforms + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + # Batch_size during evaluation + batch_size: 8 + # Whether to discard unlabeled data + drop_empty: false + +# test data +TestReader: + inputs_def: + image_shape: [3, 608, 608] + # test data transforms + sample_transforms: + - Decode: {} + - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} + - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True} + - Permute: {} + # batch_size during training + batch_size: 1 +``` + +- Model profile `ppyolo_r50vd_dcn.yml` + +```yaml +# Model structure type +architecture: YOLOv3 +# Pretrain model address +pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams +# norm_type +norm_type: sync_bn +# Whether to use EMA +use_ema: true +# ema_decay +ema_decay: 0.9998 + +# YOLOv3 +YOLOv3: + # backbone + backbone: ResNet + # neck + neck: PPYOLOFPN + # yolo_head + yolo_head: YOLOv3Head + # post_process + post_process: BBoxPostProcess + + +# backbone +ResNet: + # depth + depth: 50 + # variant + variant: d + # return_idx, 0 represent res2 + return_idx: [1, 2, 3] + # dcn_v2_stages + dcn_v2_stages: [3] + # freeze_at + freeze_at: -1 + # freeze_norm + freeze_norm: false + # norm_decay + norm_decay: 0. 
+ +# PPYOLOFPN +PPYOLOFPN: + # whether coord_conv or not + coord_conv: true + # whether drop_block or not + drop_block: true + # block_size + block_size: 3 + # keep_prob + keep_prob: 0.9 + # whether spp or not + spp: true + +# YOLOv3Head +YOLOv3Head: + # anchors + anchors: [[10, 13], [16, 30], [33, 23], + [30, 61], [62, 45], [59, 119], + [116, 90], [156, 198], [373, 326]] + # anchor_masks + anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]] + # loss + loss: YOLOv3Loss + # whether to use iou_aware + iou_aware: true + # iou_aware_factor + iou_aware_factor: 0.4 + +# YOLOv3Loss +YOLOv3Loss: + # ignore_thresh + ignore_thresh: 0.7 + # downsample + downsample: [32, 16, 8] + # whether label_smooth or not + label_smooth: false + # scale_x_y + scale_x_y: 1.05 + # iou_loss + iou_loss: IouLoss + # iou_aware_loss + iou_aware_loss: IouAwareLoss + +# IouLoss +IouLoss: + loss_weight: 2.5 + loss_square: true + +# IouAwareLoss +IouAwareLoss: + loss_weight: 1.0 + +# BBoxPostProcess +BBoxPostProcess: + decode: + name: YOLOBox + conf_thresh: 0.01 + downsample_ratio: 32 + clip_bbox: true + scale_x_y: 1.05 + # nms setting + nms: + name: MatrixNMS + keep_top_k: 100 + score_threshold: 0.01 + post_threshold: 0.01 + nms_top_k: -1 + background_label: -1 + +``` + +- Runtime file `runtime.yml` + +```yaml +# Whether to use gpu +use_gpu: true +# Log Printing interval +log_iter: 20 +# save_dir +save_dir: output +# Model save interval +snapshot_epoch: 1 +``` diff --git a/static/dataset/fddb/download.sh b/static/dataset/fddb/download.sh index 29375d791..7a40c8b05 100755 --- a/static/dataset/fddb/download.sh +++ b/static/dataset/fddb/download.sh @@ -13,7 +13,7 @@ cd "$DIR" # Download the data. echo "Downloading..." -# external link to the Faces in the Wild data set and annotations file +# external link to the Faces in the Wild dataset and annotations file wget http://tamaraberg.com/faceDataset/originalPics.tar.gz wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz wget http://vis-www.cs.umass.edu/fddb/evaluation.tgz diff --git a/static/docs/featured_model/FACE_DETECTION_en.md b/static/docs/featured_model/FACE_DETECTION_en.md index cc7b6f26a..09ae7de1e 100644 --- a/static/docs/featured_model/FACE_DETECTION_en.md +++ b/static/docs/featured_model/FACE_DETECTION_en.md @@ -197,7 +197,7 @@ matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;" ``` #### Evaluate on the FDDB -We provide a FDDB data set evaluation process (currently only supports Linux systems), +We provide a FDDB dataset evaluation process (currently only supports Linux systems), please refer to [FDDB official website](http://vis-www.cs.umass.edu/fddb/) for other specific details. - 1)Download and install OpenCV: diff --git a/static/docs/featured_model/champion_model/CACascadeRCNN_en.md b/static/docs/featured_model/champion_model/CACascadeRCNN_en.md new file mode 100644 index 000000000..238cf791b --- /dev/null +++ b/static/docs/featured_model/champion_model/CACascadeRCNN_en.md @@ -0,0 +1,45 @@ +# CACascade RCNN +## Intorduction +Objects365 2019 Challenge CACascade RCNN is one of the best single models won by Baidu Visual Technology Department in Objects365 2019 Challenge. Objects365 is a new dataset in the field of universal object detection, which aims to promote detection research on different objects in natural scenes. Objects365 marks 365 object classes on 630,000 images, and there are more than 10 million boundary boxes in the training set. This is one of the best single models in Full Track. 
+ +![](../../images/obj365_gt.png) + +## Methods described + +According to the characteristics of large-scale object detection algorithm, we propose a Class Aware Sampling method based on the number of object categories contained in the image. Training model based on this method can make the model converge to a better effect in a shorter time. + +![](../../images/cas.png) + +The best single model published this time is a two-stage detection model based on Cascade RCNN, which replaces Backbone with a more powerful SENet154 model, Deformable Conv module and a more complex two-stage network structure. As the Batch Size is relatively small, Group Normalization operation is added and multi-scale training is used, which has achieved very good results. The pre-training model was trained on ImageNet and COCO dataset successively, among which Mask branch was added in COCO dataset training, and the rest structure was the same as CACascade RCNN, which was automatically downloaded when the training started. + +## Method of use + +1.Data preparation + +Data need to be [Objects365 official website](https://www.objects365.org/download.html) to apply for download, download data after placing data in a dataset directory. +``` +${THIS REPO ROOT} + \--dataset + \-- objects365 + \-- annotations + |-- train.json + |-- val.json + \-- train + \-- val +``` + +2.Priming training model + +```bash +python tools/train.py -c configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas.yml +``` + +3.Model prediction results + +| Model | Val set mAP | Download link | Configuration File | +| :-----------------: | :--------: | :----------------------------------------------------------: | :--------: | +| CACascadeRCNN SE154 | 31.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas_obj365.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas.yml) | + +## Model effect + +![](../../images/obj365_pred.png) diff --git a/static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL_en.md b/static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL_en.md new file mode 100644 index 000000000..468ac94b1 --- /dev/null +++ b/static/docs/featured_model/champion_model/OIDV5_BASELINE_MODEL_en.md @@ -0,0 +1,52 @@ +# CascadeCA RCNN +## Introduction +CascadeCA RCNN is the best single model of Baidu Visual Technology Department in Google AI Open Images 2019 Object Detction competition. This single model helped the team win the second place among more than 500 parameter teams. Open Images Dataset V5(OIDV5) contains 500 categories, 173W training Images and more than 1400W labeled borders. It is the largest Open Dataset of object detection known at present. Dataset address [https://storage.googleapis.com/openimages/web/index.html](https://storage.googleapis.com/openimages/web/index.html), Address of team's technical proposal report in competition [https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf) + +![](../../images/oidv5_gt.png) + +## Methods described +This model combines the current better detection methods. Specifically, it uses ResNet200-vd as the backbone of the detection model, The imagenet classification the training model in [here](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/README_en.md) download; CascadeCA RCNN, Feature Pyramid Networks, Non-local, Deformable-V2 and other methods are combined. 
It should be noted here that the standard CascadeRCNN only predicts two boxes (foreground and background, using the score information to determine the category to which the final foreground belongs), while this model separately predicts one box (Cascade Class Aware) for each category. The final block diagram of the model is shown in the figure below. + +![](../../images/oidv5_model_framework.png) + + +Due to the serious category imbalance of OIDV5, the strategy of dynamic sampling is adopted to select samples and carry out training. Multi-scale training is used to solve the problem of large border area. In addition, the team used Libra Loss instead of Smooth L1 Loss to calculate the loss of the prediction box; In the prediction, SoftNMS method is used for post-processing to ensure that more boxes can be recalled. + +About 189 categories of Objects365 Dataset and OIDV5 are repeated, so the two datasets are combined for training to expand the training data of OIDV5. Finally, the model and its performance indicators are shown in the following table. More specific model training and integration strategies can be seen: [OIDV5 technical report](https://arxiv.org/pdf/1911.07171.pdf)。 + +The training results of OIDV5 model are as follows. + + +| Model structure | Public/Private Score | Download link | Configuration File | +| :-----------------: | :--------: | :----------------------------------------------------------: | :--------: | +| CascadeCARCNN-FPN-Dcnv2-Nonlocal ResNet200-vd | 0.62690/0.59459 | [model](https://paddlemodels.bj.bcebos.com/object_detection/oidv5_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml) | + + +In addition, to verify the performance of the model, Paddle Detection also trained models for COCO2017 and Objects365 Dataset based on the model structure. The model and validation set indicators are shown in the following table. + +| Model structure | Dataset | val set mAP | Download link | Configuration File | +| :-----------------: | :--------: | :--------: | :----------------------------------------------------------: | :--------: | +| CascadeCARCNN-FPN-Dcnv2-Nonlocal ResNet200-vd | COCO2017 | 51.7% | [Model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/dcn/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml) | +| CascadeCARCNN-FPN-Dcnv2-Nonlocal ResNet200-vd | Objects365 | 34.5% | [Model](https://paddlemodels.bj.bcebos.com/object_detection/obj365_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml) | + +COCO and Objects365 Dataset have the same data format. Currently, they only support prediction and evaluation. + +## Method of use + +OIDV5 dataset format is different from COCO, currently only single image prediction is supported. OIDV5 model evaluation method can be referred to [documentation](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/challenge_evaluation.md) + +1. Download the model and unzip it. + +2. Run the prediction program. 
+ +```bash +python -u tools/infer.py -c configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml -o weights=./oidv5_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/ --infer_img=demo/000000570688.jpg +``` + +The folder where the model is located needs to be modified according to its position. + +Detection result images can be viewed in the `output` folder. + +## Model detection effect + +![](../../images/oidv5_pred.jpg) diff --git a/static/docs/tutorials/QUICK_STARTED.md b/static/docs/tutorials/QUICK_STARTED.md index d8ca71377..b0390ae09 100644 --- a/static/docs/tutorials/QUICK_STARTED.md +++ b/static/docs/tutorials/QUICK_STARTED.md @@ -1,7 +1,7 @@ English | [简体中文](QUICK_STARTED_cn.md) # Quick Start -In order to enable users to quickly produce models in a short time and master the use of PaddleDetection, this tutorial uses a pre-trained detection model to finetune small data sets. A good model can be produced in a short period of time. In actual business, it is recommended that users select a suitable model configuration file for adaptation according to their needs. +In order to enable users to quickly produce models in a short time and master the use of PaddleDetection, this tutorial uses a pre-trained detection model to finetune small datasets. A good model can be produced in a short period of time. In actual business, it is recommended that users select a suitable model configuration file for adaptation according to their needs. - **Set GPU** ```bash export CUDA_VISIBLE_DEVICES=0 -- GitLab