Unverified · Commit 7eadb565 authored by Guanghua Yu, committed by GitHub

fix all documentation (#196)

* fix all documentation

* fix docs conf.py

* fix some error

* fix readthedocs
Parent 2cda4b28
......@@ -58,7 +58,9 @@ coverage.xml
*.json
*.tar
*.pyc
.idea/
dataset/coco/annotations
dataset/coco/train2017
......
......@@ -7,7 +7,7 @@ PaddleDetection的目的是为工业界和学术界提供丰富、易用的目
**目前检测库下模型均要求使用PaddlePaddle 1.6及以上版本或适当的develop版本。**
<div align="center">
<img src="demo/output/000000570688.jpg" />
<img src="docs/images/000000570688.jpg" />
</div>
......@@ -52,87 +52,49 @@ PaddleDetection的目的是为工业界和学术界提供丰富、易用的目
**注意:** Synchronized batch normalization 只能在多GPU环境下使用,不能在CPU环境或者单GPU环境下使用。
## 文档教程
## 使用教程
**最新动态:** 已发布文档教程:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
- [安装说明](docs/INSTALL_cn.md)
- [快速开始](docs/QUICK_STARTED_cn.md)
- [训练、评估流程](docs/GETTING_STARTED_cn.md)
- [数据预处理及自定义数据集](docs/DATA_cn.md)
- [配置模块设计和介绍](docs/CONFIG_cn.md)
- [详细的配置信息和参数说明示例](docs/config_example/)
### 入门教程
- [安装说明](docs/tutorials/INSTALL_cn.md)
- [快速开始](docs/tutorials/QUICK_STARTED_cn.md)
- [训练/评估/预测流程](docs/tutorials/GETTING_STARTED_cn.md)
### 进阶教程
- [数据预处理及自定义数据集](docs/advanced_tutorials/READER.md)
- [搭建模型步骤](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [配置模块设计和介绍](docs/advanced_tutorials/CONFIG_cn.md)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [迁移学习教程](docs/TRANSFER_LEARNING_cn.md)
- [迁移学习教程](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md)
- [模型压缩](slim)
- [量化训练压缩示例](slim/quantization)
- [剪枝压缩示例](slim/prune)
- [蒸馏压缩示例](slim/distillation)
- [神经网络搜索示例](slim/nas)
- [推理部署](inference)
- [模型导出教程](docs/advanced_tutorials/inference/EXPORT_MODEL.md)
- [模型预测](docs/advanced_tutorials/inference/INFERENCE.md)
- [C++推理部署](inference/README.md)
- [推理Benchmark](docs/advanced_tutorials/inference/BENCHMARK_INFER_cn.md)
## 模型库
- [模型库](docs/MODEL_ZOO_cn.md)
- [人脸检测模型](configs/face_detection/README.md)
- [行人检测和车辆检测预训练模型](contrib/README_cn.md) 针对不同场景的检测模型
- [YOLOv3增强模型](docs/YOLOv3_ENHANCEMENT.md) 改进原始YOLOv3,精度达到41.4%,原论文精度为33.0%,同时预测速度也得到提升
- [Objects365 2019 Challenge夺冠模型](docs/CACascadeRCNN.md) Objects365 Full Track任务中最好的单模型之一,精度达到31.7%
- [Open Images V5和Objects365数据集模型](docs/OIDV5_BASELINE_MODEL.md)
## 模型压缩
- [量化训练压缩示例](slim/quantization)
- [剪枝压缩示例](slim/prune)
## 推理部署
- [模型导出教程](docs/EXPORT_MODEL.md)
- [C++推理部署](inference/README.md)
## Benchmark
- [推理Benchmark](docs/BENCHMARK_INFER_cn.md)
- [YOLOv3增强模型](docs/featured_model/YOLOv3_ENHANCEMENT.md) 改进原始YOLOv3,精度达到41.4%,原论文精度为33.0%,同时预测速度也得到提升
- [Objects365 2019 Challenge夺冠模型](docs/featured_model/CACascadeRCNN.md) Objects365 Full Track任务中最好的单模型之一,精度达到31.7%
- [Open Images V5和Objects365数据集模型](docs/featured_model/OIDV5_BASELINE_MODEL.md)
## 许可证书
本项目的发布受[Apache 2.0 license](LICENSE)许可认证。
## 版本更新
### 12/2019
- 增加Res2Net模型。
- 增加HRNet模型。
- 增加GIOU loss和DIOU loss。
### 21/11/2019
- 增加CascadeClsAware RCNN模型。
- 增加CBNet,ResNet200和Non-local模型。
- 增加SoftNMS。
- 增加Open Image V5数据集和Objects365数据集模型。
### 10/2019
- 增加增强版YOLOv3模型,精度高达41.4%。
- 增加人脸检测模型BlazeFace、Faceboxes。
- 丰富基于COCO的模型,精度高达51.9%。
- 增加Objects365 2019 Challenge上夺冠的最佳单模型之一CACascade-RCNN。
- 增加行人检测和车辆检测预训练模型。
- 支持FP16训练。
- 增加跨平台的C++推理部署方案。
- 增加模型压缩示例。
### 2/9/2019
- 增加GroupNorm模型。
- 增加CascadeRCNN+Mask模型。
#### 5/8/2019
- 增加Modulated Deformable Convolution系列模型。
#### 29/7/2019
- 增加检测库中文文档
- 修复R-CNN系列模型训练同时进行评估的问题
- 新增ResNext101-vd + Mask R-CNN + FPN模型
- 新增基于VOC数据集的YOLOv3模型
#### 3/7/2019
- 首次发布PaddleDetection检测库和检测模型库
- 模型包括:Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, 和SSD.
v0.2.0版本已经在`01/2020`发布,增加多个模型,升级数据处理模块,拆分YOLOv3的loss,修复已知诸多bug等,
详细内容请参考[版本更新文档](docs/CHANGELOG.md)
## 如何贡献代码
......
......@@ -10,7 +10,7 @@ flexible, catering to research needs.
**Now all models in PaddleDetection require PaddlePaddle version 1.6 or higher, or suitable develop version.**
<div align="center">
<img src="demo/output/000000570688.jpg" />
<img src="docs/images/000000570688.jpg" />
</div>
......@@ -62,16 +62,33 @@ Advanced Features:
**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices, can not be used on CPU devices or single GPU device.
## Get Started
## Tutorials
- [Installation guide](docs/INSTALL.md)
- [Quick start on small dataset](docs/QUICK_STARTED.md)
- For detailed training and evaluation workflow, please refer to [GETTING_STARTED](docs/GETTING_STARTED.md)
- [Guide to preprocess pipeline and custom dataset](docs/DATA.md)
- [Introduction to the configuration workflow](docs/CONFIG.md)
- [Examples for detailed configuration explanation](docs/config_example/)
**News:** Documentation:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
### Get Started
- [Installation guide](docs/tutorials/INSTALL.md)
- [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md)
- For detailed training and evaluation workflow, please refer to [GETTING_STARTED](docs/tutorials/GETTING_STARTED.md)
### Advanced Tutorial
- [Guide to preprocess pipeline and custom dataset](docs/advanced_tutorials/READER.md)
- [Model building tutorial](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [Introduction to the configuration workflow](docs/advanced_tutorials/CONFIG.md)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Transfer learning document](docs/TRANSFER_LEARNING.md)
- [Transfer learning document](docs/advanced_tutorials/TRANSFER_LEARNING.md)
- [Model compression](slim)
- [Quantization-aware training example](slim/quantization)
- [Model pruning example](slim/prune)
- [Model distillation example](slim/distillation)
- [Neural Architecture Search example](slim/nas)
- [Deployment](inference)
- [Export model for inference](docs/advanced_tutorials/inference/EXPORT_MODEL.md)
- [Model inference](docs/advanced_tutorials/inference/INFERENCE.md)
- [C++ inference](inference/README.md)
- [Inference benchmark](docs/advanced_tutorials/inference/BENCHMARK_INFER_cn.md)
## Model Zoo
......@@ -82,68 +99,13 @@ Advanced Features:
- [Objects365 2019 Challenge champion model](docs/CACascadeRCNN.md) One of the best single models in the Objects365 Full Track, whose mAP reaches 31.7%.
- [Open Images Dataset V5 and Objects365 Dataset models](docs/OIDV5_BASELINE_MODEL.md)
## Model compression
- [Quantization-aware training example](slim/quantization)
- [Model pruning example](slim/prune)
## Deployment
- [Export model for inference](docs/EXPORT_MODEL.md)
- [C++ inference](inference/README.md)
## Benchmark
- [Inference benchmark](docs/BENCHMARK_INFER_cn.md)
## License
PaddleDetection is released under the [Apache 2.0 license](LICENSE).
## Updates
#### 12/2019
- Add Res2Net model.
- Add HRNet model.
- Add GIOU loss and DIOU loss.
#### 21/11/2019
- Add CascadeClsAware RCNN model.
- Add CBNet, ResNet200 and Non-local model.
- Add SoftNMS.
- Add models of Open Images Dataset V5 and Objects365 Dataset.
#### 10/2019
- Add enhanced YOLOv3 models, box mAP up to 41.4%.
- Face detection models included: BlazeFace, Faceboxes.
- Enrich COCO models, box mAP up to 51.9%.
- Add CACascade RCNN, one of the best single models from the Objects365 2019 Challenge Full Track champion solution.
- Add pretrained models for pedestrian and vehicle detection.
- Support mixed-precision training.
- Add C++ inference deployment.
- Add model compression examples.
#### 2/9/2019
- Add retrained models for GroupNorm.
- Add Cascade-Mask-RCNN+FPN.
#### 5/8/2019
- Add a series of models related to Modulated Deformable Convolution.
#### 29/7/2019
- Update Chinese docs for PaddleDetection
- Fix a bug in R-CNN models when training and evaluating at the same time
- Add ResNext101-vd + Mask R-CNN + FPN models
- Add YOLOv3 on VOC models
#### 3/7/2019
- Initial release of PaddleDetection and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
v0.2.0 was released in `01/2020`. It adds several models, upgrades the data processing module, splits YOLOv3's loss, fixes many known bugs, etc.
Please refer to the [change log](docs/CHANGELOG.md) for details.
## Contributing
......
[English](README_en.md) | 简体中文
# FaceDetection
## 内容
- [简介](#简介)
- [模型库与基线](#模型库与基线)
- [快速开始](#快速开始)
- [数据准备](#数据准备)
- [训练与推理](#训练与推理)
- [评估](#评估)
- [算法细节](#算法细节)
- [如何贡献代码](#如何贡献代码)
## 简介
FaceDetection的目标是提供高效、高速的人脸检测解决方案,包括最先进的模型和经典模型。
<div align="center">
<img src="../../demo/output/12_Group_Group_12_Group_Group_12_935.jpg" />
</div>
## 模型库与基线
下表中展示了PaddleDetection当前支持的网络结构,具体细节请参考[算法细节](#算法细节)
| | 原始版本 | Lite版本 <sup>[1](#lite)</sup> | NAS版本 <sup>[2](#nas)</sup> |
|:------------------------:|:--------:|:--------------------------:|:------------------------:|
| [BlazeFace](#BlazeFace) | ✓ | ✓ | ✓ |
| [FaceBoxes](#FaceBoxes) | ✓ | ✓ | x |
<a name="lite">[1]</a> `Lite版本`表示减少网络层数和通道数。
<a name="nas">[2]</a> `NAS版本`表示使用`神经网络搜索`方法来构建网络结构。
### 模型库
#### WIDER-FACE数据集上的mAP
| 网络结构 | 类型 | 输入尺寸 | 图片个数/GPU | 学习率策略 | Easy Set | Medium Set | Hard Set | 下载 |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
| BlazeFace | 原始版本 | 640 | 8 | 32w | **0.915** | **0.892** | **0.797** | [模型](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_original.tar) |
| BlazeFace | Lite版本 | 640 | 8 | 32w | 0.909 | 0.885 | 0.781 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace | NAS版本 | 640 | 8 | 32w | 0.837 | 0.807 | 0.658 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| FaceBoxes | 原始版本 | 640 | 8 | 32w | 0.878 | 0.851 | 0.576 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_original.tar) |
| FaceBoxes | Lite版本 | 640 | 8 | 32w | 0.901 | 0.875 | 0.760 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_lite.tar) |
**注意:**
- 我们使用`tools/face_eval.py`中多尺度评估策略得到`Easy/Medium/Hard Set`里的mAP。具体细节请参考[在WIDER-FACE数据集上评估](#在WIDER-FACE数据集上评估)
- BlazeFace-Lite的训练与测试使用 [blazeface.yml](../../configs/face_detection/blazeface.yml)配置文件并且设置:`lite_edition: true`
#### FDDB数据集上的mAP
| 网络结构 | 类型 | 输入尺寸 | DistROC | ContROC |
|:------------:|:--------:|:----:|:-------:|:-------:|
| BlazeFace | 原始版本 | 640 | **0.992** | **0.762** |
| BlazeFace | Lite版本 | 640 | 0.990 | 0.756 |
| BlazeFace | NAS版本 | 640 | 0.981 | 0.741 |
| FaceBoxes | 原始版本 | 640 | 0.987 | 0.736 |
| FaceBoxes | Lite版本 | 640 | 0.988 | 0.751 |
**注意:**
- 我们在FDDB数据集上使用多尺度测试的方法得到mAP,具体细节请参考[在FDDB数据集上评估](#在FDDB数据集上评估)
#### 推理时间和模型大小比较
| 网络结构 | 类型 | 输入尺寸 | P4(trt32) (ms) | CPU (ms) | 高通骁龙855(armv8) (ms) | 模型大小(MB) |
|:------------:|:--------:|:----:|:--------------:|:--------:|:-------------------------------------:|:---------------:|
| BlazeFace | 原始版本 | 128 | 1.387 | 23.461 | 6.036 | 0.777 |
| BlazeFace | Lite版本 | 128 | 1.323 | 12.802 | 6.193 | 0.68 |
| BlazeFace | NAS版本 | 128 | 1.03 | 6.714 | 2.7152 | 0.234 |
| FaceBoxes | 原始版本 | 128 | 3.144 | 14.972 | 19.2196 | 3.6 |
| FaceBoxes | Lite版本 | 128 | 2.295 | 11.276 | 8.5278 | 2 |
| BlazeFace | 原始版本 | 320 | 3.01 | 132.408 | 70.6916 | 0.777 |
| BlazeFace | Lite版本 | 320 | 2.535 | 69.964 | 69.9438 | 0.68 |
| BlazeFace | NAS版本 | 320 | 2.392 | 36.962 | 39.8086 | 0.234 |
| FaceBoxes | 原始版本 | 320 | 7.556 | 84.531 | 52.1022 | 3.6 |
| FaceBoxes | Lite版本 | 320 | 18.605 | 78.862 | 59.8996 | 2 |
| BlazeFace | 原始版本 | 640 | 8.885 | 519.364 | 149.896 | 0.777 |
| BlazeFace | Lite版本 | 640 | 6.988 | 284.13 | 149.902 | 0.68 |
| BlazeFace | NAS版本 | 640 | 7.448 | 142.91 | 69.8266 | 0.234 |
| FaceBoxes | 原始版本 | 640 | 78.201 | 394.043 | 169.877 | 3.6 |
| FaceBoxes | Lite版本 | 640 | 59.47 | 313.683 | 139.918 | 2 |
**注意:**
- CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz。
- P4(trt32)和CPU的推理时间测试基于PaddlePaddle-1.6.1版本。
- ARM测试环境:
- 高通骁龙855(armv8);
- 单线程;
- Paddle-Lite 2.0.0版本。
## 快速开始
### 数据准备
我们使用[WIDER-FACE数据集](http://shuoyang1213.me/WIDERFACE/)进行训练和模型测试,官方网站提供了详细的数据介绍。
- WIDER-Face数据源:
使用如下目录结构加载`wider_face`类型的数据集:
```
dataset/wider_face/
├── wider_face_split
│ ├── wider_face_train_bbx_gt.txt
│ ├── wider_face_val_bbx_gt.txt
├── WIDER_train
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_100.jpg
│ │ │ ├── 0_Parade_marchingband_1_381.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
├── WIDER_val
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_1004.jpg
│ │ │ ├── 0_Parade_marchingband_1_1045.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
```
- 手动下载数据集:
要下载WIDER-FACE数据集,请运行以下命令:
```
cd dataset/wider_face && ./download.sh
```
- 自动下载数据集:
如果已经开始训练,但是数据集路径设置不正确或找不到路径, PaddleDetection会从[WIDER-FACE数据集](http://shuoyang1213.me/WIDERFACE/)自动下载它们,
下载后解压的数据集将缓存在`~/.cache/paddle/dataset/`中,并且之后的训练测试会自动加载它们。
#### 数据增强方法
- **尺度变换(Data-anchor-sampling):**
具体操作是:根据随机选择的人脸高和宽,计算 $v=\sqrt{width \times height}$,之后判断`v`所处的范围,候选缩放基准为`[16,32,64,128]`。
假设`v=45`,则其位于`32<v<64`区间,此时以均匀分布的概率从`[16,32,64]`中任选一个值;若选中`64`,则该人脸的缩放目标在区间`[64 / 2, min(v * 2, 64 * 2)]`中选定(下方给出示意代码)。
- **其他方法:** 包括随机扰动、翻转、裁剪等。具体请参考[DATA_cn.md](../../docs/DATA_cn.md#APIs)
### 训练与推理
训练流程与推理流程方法与其他算法一致,请参考[GETTING_STARTED_cn.md](../../docs/GETTING_STARTED_cn.md)
**注意:**
- `BlazeFace`和`FaceBoxes`训练是以每卡`batch_size=8`在4卡GPU上进行训练(总`batch_size`是32),并且训练320000轮
(如果你的GPU数达不到4,请参考[学习率计算规则表](../../docs/GETTING_STARTED_cn.md#faq))。
- 人脸检测模型目前我们不支持边训练边评估。
### 评估
目前我们支持在`WIDER FACE`数据集和`FDDB`数据集上评估。首先运行`tools/face_eval.py`生成评估结果文件,其次使用matlab(WIDER FACE)
或OpenCV(FDDB)计算具体的评估指标。
其中,运行`tools/face_eval.py`的参数列表如下:
- `-f` 或者 `--output_eval`: 评估生成的结果文件保存路径,默认是: `output/pred`
- `-e` 或者 `--eval_mode`: 评估模式,包括 `widerface` 和 `fddb`,默认是`widerface`。
- `--multi_scale`: 如果在命令中添加此参数,将进行多尺度评估;默认值为`False`,即进行单尺度评估。
#### 在WIDER-FACE数据集上评估
评估并生成结果文件:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
-o weights=output/blazeface/model_final/ \
--eval_mode=widerface
```
评估完成后,将在`output/pred`中生成txt格式的测试结果。
- 下载官方评估脚本来评估AP指标:
```
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
```
-`eval_tools/wider_eval.m`中修改保存结果路径和绘制曲线的名称:
```
% Modify the folder name where the result is stored.
pred_dir = './pred';
% Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
```
- `wider_eval.m` 是评估模块的主要执行程序。运行命令如下:
```
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
```
#### 在FDDB数据集上评估
我们提供了一套FDDB数据集的评估流程(目前仅支持Linux系统),其他具体细节请参考[FDDB官网](http://vis-www.cs.umass.edu/fddb/)
- 1)下载安装opencv:
下载OpenCV: 进入[OpenCV library](https://opencv.org/releases/)手动下载
安装OpenCV:请参考[OpenCV官方安装教程](https://docs.opencv.org/master/d7/d9f/tutorial_linux_install.html)通过源码安装。
- 2)下载数据集、评估代码以及格式化数据:
```
./dataset/fddb/download.sh
```
- 3)编译FDDB评估代码:
进入`dataset/fddb/evaluation`目录下,修改Makefile文件中内容如下:
```
evaluate: $(OBJS)
$(CC) $(OBJS) -o $@ $(LIBS)
```
修改`common.hpp`中内容为如下形式:
```
#define __IMAGE_FORMAT__ ".jpg"
//#define __IMAGE_FORMAT__ ".ppm"
#define __CVLOADIMAGE_WORKING__
```
根据`grep -r "CV_RGB"`命令找到含有`CV_RGB`的代码段,将`CV_RGB`改成`Scalar`,并且在cpp中加入`using namespace cv;`
然后编译:
```
make clean && make
```
- 4)开始评估:
修改config文件中`dataset_dir`和`anno_path`字段内容:
```
EvalReader:
...
dataset:
dataset_dir: dataset/fddb
anno_path: FDDB-folds/fddb_annotFile.txt
...
```
评估并生成结果文件:
```
python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
-o weights=output/blazeface/model_final/ \
--eval_mode=fddb
```
评估完成后,将在`output/pred/pred_fddb_res.txt`中生成txt格式的测试结果。
生成ContROC与DiscROC数据:
```
cd dataset/fddb/evaluation
./evaluate -a ./FDDB-folds/fddb_annotFile.txt \
-f 0 -i ./ -l ./FDDB-folds/filePath.txt -z .jpg \
-d {RESULT_FILE} \
-r {OUTPUT_DIR}
```
**注意:**
(1)`RESULT_FILE`是`tools/face_eval.py`输出的FDDB预测结果文件;
(2)`OUTPUT_DIR`是FDDB评估输出结果文件前缀,会生成两个文件`{OUTPUT_DIR}ContROC.txt`与`{OUTPUT_DIR}DiscROC.txt`;
(3)参数用法及注释可通过执行`./evaluate --help`来获取。
## 算法细节
### BlazeFace
**简介:**
[BlazeFace](https://arxiv.org/abs/1907.05047) 是Google Research发布的人脸检测模型。它轻巧并且性能良好,
专为移动GPU推理量身定制。在旗舰设备上,速度可达到200-1000+FPS。
**特点:**
- 锚点策略在8×8(输入128x128)的特征图上停止,在该分辨率下每个像素点6个锚点;
- 5个单BlazeBlock和6个双BlazeBlock:5×5 depthwise卷积,可以保证在相同精度下网络层数更少;
- 用混合策略替换非极大值抑制算法,该策略将边界框的回归参数估计为重叠预测之间的加权平均值。
**版本信息:**
- 原始版本: 参考原始论文复现;
- Lite版本: 使用3x3卷积替换5x5卷积,更少的网络层数和通道数;
- NAS版本: 使用神经网络搜索算法构建网络结构,相比于`Lite`版本,NAS版本需要更少的网络层数和通道数。
### FaceBoxes
**简介:**
[FaceBoxes](https://arxiv.org/abs/1708.05234) 由Shifeng Zhang等人提出的高速和高准确率的人脸检测器,
被称为“高精度CPU实时人脸检测器”。 该论文收录于IJCB(2017)。
**特点:**
- 锚点策略分别在20x20、10x10、5x5(输入640x640)执行,每个像素点分别是3、1、1个锚点,对应密度系数是`1, 2, 4`(20x20)、4(10x10)、4(5x5);
- 在基础网络中个别block中使用CReLU和inception的结构;
- 使用密度先验盒(density_prior_box)可提高检测精度。
**版本信息:**
- 原始版本: 参考原始论文复现;
- Lite版本: 使用更少的网络层数和通道数,具体可参考[代码](../../ppdet/modeling/architectures/faceboxes.py)
## 如何贡献代码
我们非常欢迎您可以为PaddleDetection中的人脸检测模型提供代码,您可以提交PR供我们review;也十分感谢您的反馈,可以提交相应issue,我们会及时解答。
**文档教程请参考:** [FACE_DETECTION.md](../../docs/featured_model/FACE_DETECTION.md) <br/>
**English document please refer:** [FACE_DETECTION_en.md](../../docs/featured_model/FACE_DETECTION_en.md)
......@@ -37,7 +37,7 @@ EvalReader:
inputs_def:
fields: ['image', 'im_info', 'im_id', 'im_shape']
# for voc
#fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
#fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult']
dataset:
!COCODataSet
image_dir: val2017
......
......@@ -33,7 +33,7 @@ EvalReader:
inputs_def:
fields: ['image', 'im_info', 'im_id', 'im_shape']
# for voc
#fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_difficult']
#fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult']
dataset:
!COCODataSet
image_dir: val2017
......
# PaddleDetection applied for specific scenarios
We provide some models implemented by PaddlePaddle to detect objects in specific scenarios, users can download the models and use them in these scenarios.
| Task | Algorithm | Box AP | Download |
|:---------------------|:---------:|:------:| :-------------------------------------------------------------------------------------: |
| Vehicle Detection | YOLOv3 | 54.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar) |
| Pedestrian Detection | YOLOv3 | 51.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar) |
## Vehicle Detection
One of the major applications of vehicle detection is traffic monitoring. In this scenario, vehicles to be detected are mostly captured by cameras mounted on top of traffic light columns.
### 1. Network
The network for detecting vehicles is YOLOv3, the backbone of which is DarkNet53.
### 2. Configuration for training
PaddleDetection provides users with a configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with this file, we modify the following parameters to train the vehicle detection model:
* max_iters: 120000
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
* label_smooth: false
* nms/nms_top_k: 400
* nms/score_threshold: 0.005
* milestones: [60000, 80000]
* dataset_dir: dataset/vehicle
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is shown as follows:
AP at IoU=.50:.05:.95 is 0.545.
AP at IoU=.50 is 0.764.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar \
--infer_dir contrib/VehicleDetection/demo \
--draw_threshold 0.2 \
--output_dir contrib/VehicleDetection/demo/output
```
Some inference results are visualized below:
![](VehicleDetection/demo/output/001.jpeg)
![](VehicleDetection/demo/output/005.png)
## Pedestrian Detection
The main applications of pedestrian detection include intelligent monitoring. In this scenario, photos of pedestrians are taken by surveillance cameras in public areas, and pedestrian detection is then conducted on these photos.
### 1. Network
The network for detecting pedestrians is YOLOv3, the backbone of which is DarkNet53.
### 2. Configuration for training
PaddleDetection provides users with a configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with this file, we modify the following parameters to train the pedestrian detection model:
* max_iters: 200000
* num_classes: 1
* snapshot_iter: 5000
* milestones: [150000, 180000]
* dataset_dir: dataset/pedestrian
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is shown as follows:
AP at IoU=.50:.05:.95 is 0.518.
AP at IoU=.50 is 0.792.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar \
--infer_dir contrib/PedestrianDetection/demo \
--draw_threshold 0.3 \
--output_dir contrib/PedestrianDetection/demo/output
```
Some inference results are visualized below:
![](PedestrianDetection/demo/output/001.png)
![](PedestrianDetection/demo/output/004.png)
**文档教程请参考:** [CONTRIB_cn.md](../docs/featured_model/CONTRIB_cn.md) <br/>
**English document please refer:** [CONTRIB.md](../docs/featured_model/CONTRIB.md)
# 版本更新信息
### 12/2019
- 增加Res2Net模型。
- 增加HRNet模型。
- 增加GIOU loss和DIOU loss。
### 21/11/2019
- 增加CascadeClsAware RCNN模型。
- 增加CBNet,ResNet200和Non-local模型。
- 增加SoftNMS。
- 增加Open Image V5数据集和Objects365数据集模型。
### 10/2019
- 增加增强版YOLOv3模型,精度高达41.4%。
- 增加人脸检测模型BlazeFace、Faceboxes。
- 丰富基于COCO的模型,精度高达51.9%。
- 增加Objects365 2019 Challenge上夺冠的最佳单模型之一CACascade-RCNN。
- 增加行人检测和车辆检测预训练模型。
- 支持FP16训练。
- 增加跨平台的C++推理部署方案。
- 增加模型压缩示例。
### 2/9/2019
- 增加GroupNorm模型。
- 增加CascadeRCNN+Mask模型。
### 5/8/2019
- 增加Modulated Deformable Convolution系列模型。
### 29/7/2019
- 增加检测库中文文档
- 修复R-CNN系列模型训练同时进行评估的问题
- 新增ResNext101-vd + Mask R-CNN + FPN模型
- 新增基于VOC数据集的YOLOv3模型
### 3/7/2019
- 首次发布PaddleDetection检测库和检测模型库
- 模型包括:Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, 和SSD.
English | [简体中文](DATA_cn.md)
# Data Pipeline
## Introduction
The data pipeline is responsible for loading and converting data. Each
resulting data sample is a tuple of np.ndarrays.
For example, Faster R-CNN training uses samples of this format: `[(im,
im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation
The data pipeline consists of four sub-systems: data parsing, image
pre-processing, data conversion and data feeding APIs.
Data samples are collected to form `data.Dataset`s, usually 3 sets are
needed for training, validation, and testing respectively.
First, `data.source` loads the data files into memory, then
`data.transform` processes them, and lastly, the batched samples
are fetched by `data.Reader`.
Sub-systems details:
1. Data parsing
Parses various data sources and creates `data.Dataset` instances. Currently,
following data sources are supported:
- COCO data source
Loads `COCO` type datasets with directory structures like this:
```
dataset/coco/
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- Pascal VOC data source
Loads `Pascal VOC` like datasets with directory structure like this:
```
dataset/voc/
├── train.txt
├── val.txt
├── test.txt
├── label_list.txt (optional)
├── VOCdevkit/VOC2007
│ ├── Annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
**NOTE:** If you set `use_default_label=False` in yaml configs, the `label_list.txt`
of the Pascal VOC dataset will be read; otherwise, `label_list.txt` is unnecessary and
the default Pascal VOC label list defined in
[voc\_loader.py](../ppdet/data/source/voc_loader.py) will be used.
- Roidb data source
A generalized data source serialized as pickle files, which have the following
structure:
```python
(records, cname2id)
# `cname2id` is a `dict` which maps category name to class IDs
# and `records` is a list of dict of this structure:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image ID
'h': im_h, # height of image
'w': im_w, # width of image
'is_crowd': is_crowd, # crowd marker
'gt_class': gt_class, # ground truth class
'gt_bbox': gt_bbox, # ground truth bounding box
'gt_poly': gt_poly, # ground truth segmentation
}
```
We provide a tool to generate roidb data sources. To convert a `COCO`- or `VOC`-like
dataset, run this command:
```sh
# --type: the type of original data (xml or json)
# --annotation: the path of file, which contains the name of annotation files
# --save-dir: the save path
# --samples: the number of samples (default is -1, which means all data in the dataset)
python ./ppdet/data/tools/generate_data_for_training.py \
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
2. Image preprocessing
The `data.transform.operator` module provides operations such as image
decoding, expanding, cropping, etc. Multiple operators are combined to form
larger processing pipelines.
3. Data transformer
Transform a `data.Dataset` to achieve various desired effects. Notably, the
`data.transform.parallel_map` transformer accelerates image processing with
multi-threads or multi-processes. More transformers can be found in
`data.transform.transformer`.
4. Data feeding APIs
To facilitate data pipeline building, we combine multiple `data.Dataset` to
form a `data.Reader` which can provide data for training, validation and
testing respectively. Users can simply call `Reader.[train|eval|infer]` to get
the corresponding data stream. Many aspects of the `Reader`, such as storage
location, preprocessing pipeline, and acceleration mode, can be configured with yaml
files.
### APIs
The main APIs are as follows:
1. Data parsing
- `source/coco_loader.py`: COCO dataset parser. [source](../ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`: Pascal VOC dataset parser. [source](../ppdet/data/source/voc_loader.py)
[Note] To use a non-default label list for VOC datasets, a `label_list.txt`
file is needed. One can use the provided label list
(`data/pascalvoc/ImageSets/Main/label_list.txt`) or generate a custom one (with `tools/generate_data_for_training.py`). Also, the `use_default_label` option should
be set to `false` in the configuration file.
- `source/loader.py`: Roidb dataset parser. [source](../ppdet/data/source/loader.py)
2. Operator
`transform/operators.py`: Contains a variety of data augmentation methods, including:
- `DecodeImage`: Read images in RGB format.
- `RandomFlipImage`: Horizontal flip.
- `RandomDistort`: Distort brightness, contrast, saturation, and hue.
- `ResizeImage`: Resize image with interpolation.
- `RandomInterpImage`: Use a random interpolation method to resize the image.
- `CropImage`: Crop image with respect to different scale, aspect ratio, and overlap.
- `ExpandImage`: Pad image to a larger size, padding filled with mean image value.
- `NormalizeImage`: Normalize image pixel values.
- `NormalizeBox`: Normalize the bounding box.
- `Permute`: Arrange the channels of the image and optionally convert image to BGR format.
- `MixupImage`: Mixup two images with given fraction<sup>[1](#mix)</sup>.
<a name="mix">[1]</a> Please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf)
`transform/arrange_sample.py`: Assemble the data samples needed by different models.
3. Transformer
`transform/post_map.py`: Transformations that operate on whole batches, mainly for:
- Padding the whole batch to given stride values
- Resizing images to multiple scales
- Randomly adjusting the image size of the batch data
`transform/transformer.py`: Data filtering and batching.
`transform/parallel_map.py`: Accelerate data processing with multi-threads/multi-processes.
4. Reader
`reader.py`: Combine source and transforms, return batch data according to `max_iter`.
`data_feed.py`: Configure default parameters for `reader.py`.
### Usage
#### Canned Datasets
Presets for common datasets, e.g., `COCO` and `Pascal VOC`, are included. In
most cases, users can simply use these canned datasets as is. Moreover, the
whole data pipeline is fully customizable through the yaml configuration files.
#### Custom Datasets
- Option 1: Convert the dataset to COCO format.
```sh
# a small utility (`tools/x2coco.py`) is provided to convert
# Labelme-annotated dataset or cityscape dataset to COCO format.
python ./ppdet/data/tools/x2coco.py --dataset_type labelme \
                                    --json_input_dir ./labelme_annos/ \
                                    --image_input_dir ./labelme_imgs/ \
                                    --output_dir ./cocome/ \
                                    --train_proportion 0.8 \
                                    --val_proportion 0.2 \
                                    --test_proportion 0.0
# --dataset_type: the format of the data to be converted. Currently supported: 'labelme' and 'cityscape'
# --json_input_dir: the path of the json files annotated by Labelme
# --image_input_dir: the path of the images
# --output_dir: the path of the converted COCO dataset
# --train_proportion: the proportion of annotated data used for training
# --val_proportion: the proportion of annotated data used for validation
# --test_proportion: the proportion of annotated data used for inference
```
- Option 2:
1. Add `source/XX_loader.py` and implement the `load` function, following the
example of `source/coco_loader.py` and `source/voc_loader.py`.
2. Modify the `load` function in `source/loader.py` to make use of the newly
added data loader.
3. Modify `/source/__init__.py` accordingly.
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# Replace the above code with the following code:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. In the configure file, define the `type` of `dataset` as `XXSource`.
#### How to add data pre-processing?
- To add a pre-processing operation for a single image, refer to the classes in
`transform/operators.py` and implement the desired transformation as a new
class (a hypothetical sketch is given after this list).
- To add pre-processing for a batch, one needs to modify the `build_post_map`
function in `transform/post_map.py`.
# 数据模块
## 介绍
本模块是一个Python模块,用于加载数据并将其转换成适用于检测模型的训练、验证、测试所需要的格式——由多个np.ndarray组成的tuple数组,例如用于Faster R-CNN模型的训练数据格式为:`[(im, im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`
### 实现
该模块内部可分为4个子功能:数据解析、图片预处理、数据转换和数据获取接口。
我们采用`data.Dataset`表示一份数据,比如`COCO`数据包含3份数据,分别用于训练、验证和测试。原始数据存储于文件中,通过`data.source`加载到内存,然后使用`data.transform`对数据进行处理转换,最终通过`data.Reader`的接口可以获得用于训练、验证和测试的batch数据。
子功能介绍:
1. 数据解析
数据解析得到的是`data.Dataset`,实现逻辑位于`data.source`中。通过它可以实现解析不同格式的数据集,已支持的数据源包括:
- COCO数据源
该数据集目前分为COCO2014和COCO2017,主要由json文件和image文件组成,其组织结构如下所示:
```
dataset/coco/
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- Pascal VOC数据源
该数据集目前分为VOC2007和VOC2012,主要由xml文件和image文件组成,其组织结构如下所示:
```
dataset/voc/
├── train.txt
├── val.txt
├── test.txt
├── label_list.txt (optional)
├── VOCdevkit/VOC2007
│ ├── Annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
**说明:** 如果你在yaml配置文件中设置`use_default_label=False`, 将从`label_list.txt`
中读取类别列表,反之则可以没有`label_list.txt`文件,检测库会使用Pascal VOC数据集的默
认类别列表,默认类别列表定义在[voc\_loader.py](../ppdet/data/source/voc_loader.py)
- Roidb数据源
该数据集主要由COCO数据集和Pascal VOC数据集转换而成的pickle文件,包含一个dict,而dict中只包含一个命名为‘records’的list(可能还有一个命名为‘cname2cid’的字典),其内容如下所示:
```python
(records, catname2clsid)
'records'是一个list并且它的结构如下:
{
'im_file': im_fname, # 图像文件名
'im_id': im_id, # 图像id
'h': im_h, # 图像高度
'w': im_w, # 图像宽度
'is_crowd': is_crowd, # 是否重叠
'gt_class': gt_class, # 真实框类别
'gt_bbox': gt_bbox, # 真实框坐标
'gt_poly': gt_poly, # 多边形坐标
}
'cname2id'是一个dict保存了类别名到id的映射
```
我们在`ppdet/data/tools/`中提供了一个生成roidb数据集的代码,可以通过下面命令实现该功能。
```
# --type: 原始数据集的类别(只能是xml或者json)
# --annotation: 一个包含所需标注文件名的文件的路径
# --save-dir: 保存路径
# --samples: sample的个数(默认是-1,代表使用所有sample)
python ./ppdet/data/tools/generate_data_for_training.py \
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
2. 图片预处理
图片预处理包括图片解码、缩放、裁剪等操作,我们采用`data.transform.operator`算子的方式来统一实现,这样能方便扩展。此外,多个算子还可以组合形成复杂的处理流程,并被`data.transformer`中的转换器使用,比如多线程完成一个复杂的预处理流程。
3. 数据转换器
数据转换器的功能是完成对某个`data.Dataset`进行转换处理,从而得到一个新的`data.Dataset`。我们采用装饰器模式实现各种不同的`data.transform.transformer`,比如用于多进程预处理的`data.transform.parallel_map`转换器。
4. 数据获取接口
为方便训练时的数据获取,我们将多个`data.Dataset`组合在一起构成一个`data.Reader`为用户提供数据,用户只需要调用`Reader.[train|eval|infer]`即可获得对应的数据流。`Reader`支持yaml文件配置数据地址、预处理过程、加速方式等。
### APIs
主要的APIs如下:
1. 数据解析
- `source/coco_loader.py`:用于解析COCO数据集。[详见代码](../ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`:用于解析Pascal VOC数据集。[详见代码](../ppdet/data/source/voc_loader.py)
[注意]在使用VOC数据集时,若不使用默认的label列表,则需要先使用`tools/generate_data_for_training.py`生成`label_list.txt`(使用方式与数据解析中的roidb数据集获取过程一致),或提供`label_list.txt`并放置于`data/pascalvoc/ImageSets/Main`中;同时在配置文件中设置参数`use_default_label`为`false`。
- `source/loader.py`:用于解析Roidb数据集。[详见代码](../ppdet/data/source/loader.py)
2. 算子
`transform/operators.py`:包含多种数据增强方式,主要包括:
``` python
RandomFlipImage:水平翻转
RandomDistort:随机扰动图片亮度、对比度、饱和度和色相
ResizeImage:根据特定的插值方式调整图像大小
RandomInterpImage:使用随机的插值方式调整图像大小
CropImage:根据缩放比例、长宽比例两个参数生成若干候选框,再依据这些候选框和标注框的面积交并比(IoU)挑选出符合要求的裁剪结果
ExpandImage:将原始图片放进一张使用像素均值填充(随后会在减均值操作中减掉)的扩张图中,再对此图进行裁剪、缩放和翻转
DecodeImage:以RGB格式读取图像
Permute:对图像的通道进行排列并转为BGR格式
NormalizeImage:对图像像素值进行归一化
NormalizeBox:对bounding box进行归一化
MixupImage:按比例叠加两张图像
```
[注意]:Mixup的操作可参考[论文](https://arxiv.org/pdf/1710.09412.pdf)
`transform/arrange_sample.py`:实现对输入网络数据的排序。
3. 转换
`transform/post_map.py`:用于完成批数据的预处理操作,其主要包括:
``` python
随机调整批数据的图像大小
多尺度调整图像大小
padding操作
```
`transform/transformer.py`:用于过滤无用的数据,并返回批数据。
`transform/parallel_map.py`:用于实现加速。
4. 读取
`reader.py`:用于组合source和transformer操作,根据`max_iter`返回batch数据。
`data_feed.py`: 用于配置 `reader.py`中所需的默认参数.
### 使用
#### 常规使用
结合yaml文件中的配置信息,完成本模块的功能。yaml文件的使用可以参见配置文件部分。
- 读取用于训练的数据
``` python
ccfg = load_cfg('./config.yml')
coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
```
#### 如何使用自定义数据集?
- 选择1:将数据集转换为COCO格式。
```
# 在ppdet/data/tools/中提供了x2coco.py,用于将labelme标注的数据集或cityscape数据集转换为COCO数据集
python ./ppdet/data/tools/x2coco.py --dataset_type labelme \
                                    --json_input_dir ./labelme_annos/ \
                                    --image_input_dir ./labelme_imgs/ \
                                    --output_dir ./cocome/ \
                                    --train_proportion 0.8 \
                                    --val_proportion 0.2 \
                                    --test_proportion 0.0
# --dataset_type:需要转换的数据格式,目前支持:'labelme'和'cityscape'
# --json_input_dir:使用labelme标注的json文件所在文件夹
# --image_input_dir:图像文件所在文件夹
# --output_dir:转换后的COCO格式数据集存放位置
# --train_proportion:标注数据中用于train的比例
# --val_proportion:标注数据中用于validation的比例
# --test_proportion: 标注数据中用于infer的比例
```
- 选择2:
1. 仿照`./source/coco_loader.py``./source/voc_loader.py`,添加`./source/XX_loader.py`并实现`load`函数。
2.`./source/loader.py``load`函数中添加使用`./source/XX_loader.py`的入口。
3. 修改`./source/__init__.py`
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# 将上述代码替换为如下代码:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. 在配置文件中修改`dataset`下的`type``XXSource`
#### 如何增加数据预处理?
- 若增加单张图像的增强预处理,可在`transform/operators.py`中参考每个类的代码,新建一个类来实现新的数据增强;同时在配置文件中增加该预处理。
- 若增加单个batch的图像预处理,可在`transform/post_map.py`中参考`build_post_map`中每个函数的代码,新建一个内部函数来实现新的批数据预处理;同时在配置文件中增加该预处理。
......@@ -86,33 +86,36 @@ The backbone models pretrained on ImageNet are available. All backbone models ar
| CBResNet200-vd-FPN-Nonlocal | Cascade Faster | c3-c5 | 1 | 2.5x | - | 53.3%(softnms) | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
#### Notes:
**Notes:**
- Deformable ConvNets v2 (dcn_v2) is referenced from [Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
- `c3-c5` means adding `dcn` in resnet stage 3 to 5.
- Detailed configuration file in [configs/dcn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/dcn)
### HRNet
* See more details in [HRNet model zoo](../configs/hrnet/README.md).
* See more details in [HRNet model zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/hrnet/).
### Res2Net
* See more details in [Res2Net model zoo](../configs/res2net/README.md).
* See more details in [Res2Net model zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/res2net/).
### IOU loss
* GIOU loss and DIOU loss are included now. See more details in [IOU loss model zoo](../configs/iou_loss/README.md).
* GIOU loss and DIOU loss are included now. See more details in [IOU loss model zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/iou_loss/).
### GCNet
* See more details in [GCNet model zoo](../configs/gcnet/README.md).
* See more details in [GCNet model zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/gcnet/).
### Group Normalization
| Backbone | Type | Image/gpu | Lr schd | Box AP | Mask AP | Download |
| :------------------- | :------------- | :-----: | :-----: | :----: | :-----: | :----------------------------------------------------------: |
| ResNet50-FPN | Faster | 2 | 2x | 39.7 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_fpn_gn_2x.tar) |
| ResNet50-FPN | Mask | 1 | 2x | 40.1 | 35.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_fpn_gn_2x.tar) |
#### Notes:
**Notes:**
- Group Normalization is referenced from [Group Normalization](https://arxiv.org/abs/1803.08494).
- Detailed configuration file in [configs/gn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/gn)
......@@ -149,7 +152,8 @@ The backbone models pretrained on ImageNet are available. All backbone models ar
| ResNet34 | 416 | 8 | 270e | - | 81.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34_voc.tar) |
| ResNet34 | 320 | 8 | 270e | - | 80.1 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34_voc.tar) |
#### Notes:
**Notes:**
- The YOLOv3-DarkNet53 performance reported in the paper [YOLOv3](https://arxiv.org/abs/1804.02767) is also provided above; our implementation
improves performance mainly by using L1 loss for bounding box width and height regression, image mixup, and label smoothing.
- YOLOv3 is trained on 8 GPUs with a total batch size of 64 for 270 epochs. YOLOv3 training data augmentations: mixup,
......@@ -188,11 +192,11 @@ results of image size 608/416/320 above. Deformable conv is added on stage 5 of
randomly cropping, randomly expansion, randomly flipping.
## Face Detection
### Face Detection
Please refer [face detection models](../configs/face_detection) for details.
Please refer [face detection models](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/face_detection) for details.
## Object Detection in Open Images Dataset V5
### Object Detection in Open Images Dataset V5
Please refer [Open Images Dataset V5 Baseline model](OIDV5_BASELINE_MODEL.md) for details.
Please refer [Open Images Dataset V5 Baseline model](featured_model/OIDV5_BASELINE_MODEL.md) for details.
......@@ -83,35 +83,38 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
| ResNet200-vd-FPN-Nonlocal | CascadeClsAware Faster | c3-c5 | 1 | 2.5x | - | 51.7%(softnms) | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
| CBResNet200-vd-FPN-Nonlocal | Cascade Faster | c3-c5 | 1 | 2.5x | - | 53.3%(softnms) | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
#### 注意事项:
**注意事项:**
- Deformable卷积网络v2(dcn_v2)参考自论文[Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
- `c3-c5`意思是在resnet模块的3到5阶段增加`dcn`.
- 详细的配置文件在[configs/dcn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/dcn)
- 详细的配置文件在[configs/dcn](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/dcn)
### HRNet
* 详情见[HRNet模型库](../configs/hrnet/README.md)
* 详情见[HRNet模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/hrnet/)
### Res2Net
* 详情见[Res2Net模型库](../configs/res2net/README.md)
* 详情见[Res2Net模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/res2net/)
### IOU loss
* 目前模型库中包括GIOU loss和DIOU loss,详情加[IOU loss模型库](../configs/iou_loss/README.md).
* 目前模型库中包括GIOU loss和DIOU loss,详情见[IOU loss模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/iou_loss/)。
### GCNet
* 详情见[GCNet模型库](../configs/gcnet/README.md).
* 详情见[GCNet模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/gcnet/).
### Group Normalization
| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 | Box AP | Mask AP | 下载 |
| :------------------- | :------------- |:--------: | :-----: | :----: | :-----: | :----------------------------------------------------------: |
| ResNet50-FPN | Faster | 2 | 2x | 39.7 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_fpn_gn_2x.tar) |
| ResNet50-FPN | Mask | 1 | 2x | 40.1 | 35.8 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_fpn_gn_2x.tar) |
#### 注意事项:
**注意事项:**
- Group Normalization参考论文[Group Normalization](https://arxiv.org/abs/1803.08494).
- 详细的配置文件在[configs/gn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/gn)
- 详细的配置文件在[configs/gn](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/gn)
### YOLO v3
......@@ -146,7 +149,8 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
| ResNet34 | 416 | 8 | 270e | - | 81.9 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34_voc.tar) |
| ResNet34 | 320 | 8 | 270e | - | 80.1 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34_voc.tar) |
#### 注意事项:
**注意事项:**
- 上表中也提供了原论文[YOLOv3](https://arxiv.org/abs/1804.02767)中YOLOv3-DarkNet53的精度,我们的实现主要通过在bounding box的宽度和高度回归上使用L1损失,并结合图像mixup和label smooth等方法优化了精度(label smooth的一般形式见下式)。
- YOLO v3在8卡,总batch size为64下训练270轮。数据增强包括:mixup, 随机颜色失真,随机剪裁,随机扩张,随机插值法,随机翻转。YOLO v3在训练阶段对minibatch采用随机reshape,可以采用相同的模型测试不同尺寸图片,我们分别提供了尺寸为608/416/320大小的测试结果。deformable卷积作用在骨架网络5阶段。
......@@ -179,11 +183,11 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
**注意事项:** MobileNet-SSD在2卡,总batch size为64下训练120周期。VGG-SSD在总batch size为32下训练240周期。数据增强包括:随机颜色失真,随机剪裁,随机扩张,随机翻转。
## 人脸检测
### 人脸检测
详细请参考[人脸检测模型](../configs/face_detection)
详细请参考[人脸检测模型](featured_model/FACE_DETECTION.md)
## 基于Open Images V5数据集的物体检测
### 基于Open Images V5数据集的物体检测
详细请参考[Open Images V5数据集基线模型](OIDV5_BASELINE_MODEL.md)
详细请参考[Open Images V5数据集基线模型](featured_model/OIDV5_BASELINE_MODEL.md)
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = PaddleDetection
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
......@@ -159,7 +159,6 @@ LearningRate:
steps: 500
```
[Complete config files](config_example/) of multiple detection architectures are given, along with a brief description of each parameter.
## Requirements
......
# 配置模块
# 配置模块设计与介绍
## 简介
......@@ -149,7 +149,6 @@ LearningRate:
steps: 500
```
[示例配置文件](config_example/)中给出了多种检测结构的完整配置文件,以及其中各个超参的简要说明。
## 安装依赖
......
# 新增模型算法
为了让用户更好的使用PaddleDetection,本文档中,我们将介绍PaddleDetection的主要模型技术细节及应用,
包括:如何搭建模型,如何定义检测组件和模型配置与运行。
## 简介
PaddleDetection的网络模型模块所有代码逻辑在`ppdet/modeling/`中,所有网络模型是以组件的形式进行定义与组合,网络模型模块的主要构成如下架构所示:
```
ppdet/modeling/
├── architectures #
│ ├── faster_rcnn.py # Faster Rcnn模型
│ ├── ssd.py # SSD模型
│ ├── yolov3.py # YOLOv3模型
│ │ ...
├── anchor_heads # anchor生成检测头模块
│ ├── xxx_head.py # 定义各类anchor生成检测头
├── backbones # 基干网络模块
│ ├── resnet.py # ResNet网络
│ ├── mobilenet.py # MobileNet网络
│ │ ...
├── losses # 损失函数模块
│ ├── xxx_loss.py # 定义注册各类loss函数
├── roi_extractors # 检测感兴趣区域提取
│ ├── roi_extractor.py # FPNRoIAlign等实现
├── roi_heads # 两阶段检测器检测头
│ ├── bbox_head.py # Faster-Rcnn系列检测头
│ ├── cascade_head.py # cascade-Rcnn系列检测头
│ ├── mask_head.py # Mask-Rcnn系列检测头
├── tests # 单元测试模块
│ ├── test_architectures.py # 对网络结构进行单元测试
├── ops.py # 封装及注册各类PaddlePaddle物体检测相关公共检测组件/算子
├── target_assigners.py # 封装bbox/mask等最终结果的公共检测组件
```
![](../images/models_figure.png)
## 新增模型
我们以单阶段检测器YOLOv3为例,结合[yolov3_darknet.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet.yml)配置文件,对建立模型过程进行详细描述,
按照此思路您可以快速搭建新的模型。
搭建新模型的一般步骤是:Backbone编写、检测组件编写与模型组网这三个步骤,下面为您详细介绍:
### Backbone编写
1.代码编写:
PaddleDetection中现有所有Backbone网络代码都放置在`ppdet/modeling/backbones`目录下,所以我们在其中新建`darknet.py`如下:
```python
from ppdet.core.workspace import register
@register
class DarkNet(object):
__shared__ = ['norm_type', 'weight_prefix_name']
def __init__(self,
depth=53,
norm_type='bn',
norm_decay=0.,
weight_prefix_name=''):
# 省略内容
pass
def __call__(self, input):
# 省略处理逻辑
pass
```
然后在`backbones/__init__.py`中加入引用:
```python
from . import darknet
from .darknet import *
```
**几点说明:**
- 为了在yaml配置文件中灵活配置网络,所有Backbone、模型组件与architecture类需要利用`ppdet.core.workspace`里的`register`进行注册,形式请参考如上示例;
- 在Backbone中都需定义`__init__`函数与`__call__`函数,`__init__`函数负责初始化参数,在调用此Backbone时会执行`__call__`函数;
- `__shared__`是为了实现一些参数的配置全局共享,具体细节请参考[配置文件说明文档](CONFIG_cn.md#faq)。
2.配置编写:
在yaml文件中以注册好了的`DarkNet`类名为标题,可选择性的对`__init__`函数中的参数进行更新,不在配置文件中配置的参数会保持`__init__`函数中的初始化值:
```yaml
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
```
### 检测组件编写
1.代码编写:编写好Backbone后,我们开始编写生成anchor的检测头部分,anchor的检测头代码都在`ppdet/modeling/anchor_heads`目录下,所以我们在其中新建`yolo_head.py`如下:
```python
from ppdet.core.workspace import register
@register
class YOLOv3Head(object):
__inject__ = ['yolo_loss', 'nms']
__shared__ = ['num_classes', 'weight_prefix_name']
def __init__(self,
num_classes=80,
anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
[59, 119], [116, 90], [156, 198], [373, 326]],
yolo_loss="YOLOv3Loss",
nms=MultiClassNMS(
score_threshold=0.01,
nms_top_k=1000,
keep_top_k=100,
nms_threshold=0.45,
background_label=-1).__dict__):
# 省略部分内容
pass
```
然后在`anchor_heads/__init__.py`中加入引用:
```python
from . import yolo_head
from .yolo_head import *
```
**几点说明:**
- `__inject__`表示引入封装好了的检测组件/算子列表,此处`yolo_loss`与`nms`变量指向外部定义好的检测组件/算子;
- anchor的检测头实现中类函数需有输出loss接口`get_loss`与预测框或建议框输出接口`get_prediction`
- 两阶段检测器在anchor的检测头里定义的是候选框输出接口`get_proposals`,之后还会在`roi_extractors`和`roi_heads`中进行后续计算,定义方法与如下一致。
- YOLOv3算法的loss函数比较复杂,所以我们将loss函数进行拆分,具体实现在`losses/yolo_loss.py`中,也需要注册;
- nms算法是封装paddlepaddle中现有检测组件/算子,如何定义与注册详见[定义公共检测组件/算子](#定义公共检测组件/算子)部分。
2.配置编写:
在yaml文件中以注册好了的`YOLOv3Head`类名为标题,可选择性的对`__init__`函数中的参数进行更新,不在配置文件中配置的参数会保持`__init__`函数中的初始化值:
```yaml
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
yolo_loss: YOLOv3Loss
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
YOLOv3Loss:
batch_size: 8
ignore_thresh: 0.7
label_smooth: false
```
如上配置文件中的`YOLOv3Loss`是注册好检测组件接口,所以需要在配置文件中也对`YOLOv3Loss`进行参数设置。
### 模型组网
本步骤中,我们需要将编写好的Backbone、各个检测组件进行整合拼接,搭建一个完整的物体检测网络能够提供给训练、评估和测试程序去运行。
1.组建`architecture`
所有architecture网络代码都放置在`ppdet/modeling/architectures`目录下,所以我们在其中新建`yolov3.py`如下:
```python
from ppdet.core.workspace import register
@register
class YOLOv3(object):
__category__ = 'architecture'
__inject__ = ['backbone', 'yolo_head']
__shared__ = ['use_fine_grained_loss']
def __init__(self,
backbone,
yolo_head='YOLOv3Head',
use_fine_grained_loss=False):
super(YOLOv3, self).__init__()
# 省略内容
def build(self, feed_vars, mode='train'):
# 省略内容
pass
def build_inputs(self, ):
# 详解见【模型输入设置】章节
pass
def train(self, feed_vars):
return self.build(feed_vars, mode='train')
def eval(self, feed_vars):
return self.build(feed_vars, mode='test')
def test(self, feed_vars):
return self.build(feed_vars, mode='test')
```
**几点说明:**
- 在组建一个完整的网络时必须要设定`__category__ = 'architecture'`来表示一个完整的物体检测模型;
-`__init__`函数中传入我们上面定义好的`backbone``yolo_head`的名称即可,根据yaml配置文件里这些组件的参数初始化,`ppdet.core.workspace`会自动解析加载;
- 在architecture类里必须定义`build_inputs`函数,为了适配检测网络的输入与Reader模块,具体见[模型输入设置](#模型输入设置)模块;
- 在architecture类里必须定义`train`、`eval`、`test`函数,在训练、评估和测试程序中会分别调用这三个函数来在不同场景中加载网络模型。
2.配置编写:
首先定义网络模型名称:
```yaml
architecture: YOLOv3
```
接下来根据网络模型名称`YOLOv3`来初始化网络组件名称:
```yaml
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
```
之后`backbone`与`yolo_head`的配置步骤在上面已经介绍,完成如上配置就完成了物体检测模型组网的工作。
### 模型输入设置
在architecture定义的类里必须含有`build_inputs`函数,这个函数的作用是生成`feed_vars``loader`
1.`feed_vars`是由`key:fluid.data`构成的字典,key是由如下yaml文件中`fields`字段构成,在不同数据集、训练、评估和测试中字段不尽相同,
在使用中需要合理组合。
```yaml
TrainReader:
inputs_def:
fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
EvalReader:
inputs_def:
fields: ['image', 'im_size', 'im_id']
...
```
[数据源解析](READER.md#数据解析)中已经提到,数据源roidbs会解析为字典的形式,Reader会根据feed_vars所含字段进行解析适配。
2.`loader`是调用[fluid.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/DataLoader_cn.html#dataloader)
根据`feed_vars`来完成DataLoader的组建。
## 定义公共检测组件/算子
为了更好的复用一些公共检测组件/算子,以及可以在yaml配置文件中配置化,检测模型相关检测组件/算子都在`ppdet/modeling/ops.py`中定义并注册。这部分是选做部分,不是必需的。
(1)基于现有的PaddlePaddle物体检测相关OP进行二次封装注册:
例如[fluid.layers.multiclass_nms](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/layers_cn/multiclass_nms_cn.html)
在PaddlePaddle已经存在,我们想让它在yaml文件中灵活配置,我们只需要在`ops.py`中如下配置即可:
```python
from ppdet.core.workspace import register, serializable
@register
@serializable
class MultiClassNMS(object):
__op__ = fluid.layers.multiclass_nms
__append_doc__ = True
def __init__(self,
score_threshold=.05,
nms_top_k=-1,
keep_top_k=100,
nms_threshold=.5,
normalized=False,
nms_eta=1.0,
background_label=0):
super(MultiClassNMS, self).__init__()
# 省略
```
**注意:** 我们是对`fluid.layers.multiclass_nms`这个OP进行二次封装,在`__init__`方法中添加所需的可选参数即可,保持默认的参数可以不添加进来。
(2)从零开始定义检测组件/算子:
`ops.py`中定义`xxx`函数,然后在相应位置添加`from ppdet.modeling.ops import xxx`即可调用,无需注册与序列化。
## 配置及运行
PaddleDetection在`ppdet/optimizer.py`中注册实现了学习率配置接口类`LearningRate`、优化器接口类`OptimizerBuilder`
- 学习率配置
在yaml文件中可便捷配置学习率各个参数:
```yaml
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
```
**几点说明:**
- `PiecewiseDecay``LinearWarmup`策略在`ppdet/optimizer.py`中都已注册。
- 除了这两种学习率策略之外,您还可以使用PaddlePaddle中所有的优化器与学习率策略,详见[paddlepaddle官网文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_guides/low_level/optimizer.html)(本节末尾给出了按上述配置计算学习率的示意代码)。
- Optimizer优化器配置:
```yaml
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
```
- 其他配置:在训练、评估与测试阶段,定义了一些所需参数如下:
```yaml
use_gpu: true # 是否使用GPU运行程序
max_iters: 500200 # 最大迭代轮数
log_smooth_window: 20 # 日志打印队列长度
log_iter: 20 # 训练时日志每迭代x轮打印一次
save_dir: output # 模型保存路径
snapshot_iter: 2000 # 训练时第x轮保存/评估
metric: COCO # 数据集名称
pretrain_weights: xxx # 预训练模型地址(网址/路径)
weights: xxx/model_final # 评估或测试时模型权重的路径
num_classes: 80 # 类别数
```
> 看过此文档,您应该对PaddleDetection中模型搭建与配置有了一定经验,结合源码会理解的更加透彻。关于模型技术,如您有其他问题或建议,请给我们提issue,我们非常欢迎您的反馈。
This diff is collapsed.
......@@ -10,7 +10,7 @@ In transfer learning, if different dataset and the number of classes is used, th
In transfer learning, it's needed to load pretrained model selectively. The following two methods can be used:
1. Set `finetune_exclude_pretrained_params` in YAML configuration files. Please refer to [configure file](../configs/yolov3_mobilenet_v1_fruit.yml#L15)
1. Set `finetune_exclude_pretrained_params` in YAML configuration files. Please refer to [configure file](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_mobilenet_v1_fruit.yml#L15)
2. Set -o finetune_exclude_pretrained_params in command line. For example:
```python
......@@ -23,7 +23,7 @@ python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
* Note:
1. The path in pretrain\_weights is the open-source model link of faster RCNN from COCO dataset. For full models link, please refer to [MODEL_ZOO](MODEL_ZOO.md)
1. The path in pretrain\_weights is the open-source model link of faster RCNN from COCO dataset. For full models link, please refer to [MODEL_ZOO](../MODEL_ZOO.md)
2. The parameter fields are set in finetune\_exclude\_pretrained\_params. If the name of a parameter matches a field (wildcard matching), the parameter will be ignored during loading.
If users want to fine-tune on their own dataset while keeping the model architecture, they only need to ignore the parameters related to the number of classes. PaddleDetection lists the ignored parameter fields corresponding to different model types. The table is shown below: </br>
......
# 迁移学习
# 迁移学习教程
迁移学习为利用已有知识,对新知识进行学习。例如利用ImageNet分类预训练模型做初始化来训练检测模型,利用在COCO数据集上的检测模型做初始化来训练基于PascalVOC数据集的检测模型。
......@@ -8,7 +8,7 @@
在迁移学习中,对预训练模型进行选择性加载,可通过如下两种方式实现:
1. 在 YMAL 配置文件中通过设置`finetune_exclude_pretrained_params`字段。可参考[配置文件](../configs/yolov3_mobilenet_v1_fruit.yml#L15)
1. 在 YAML 配置文件中通过设置`finetune_exclude_pretrained_params`字段。可参考[配置文件](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_mobilenet_v1_fruit.yml#L15)
2. 在 train.py的启动参数中设置 -o finetune_exclude_pretrained_params。例如:
```python
......@@ -21,7 +21,7 @@ python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
* 说明:
1. pretrain\_weights的路径为COCO数据集上开源的faster RCNN模型链接,完整模型链接可参考[MODEL_ZOO](MODEL_ZOO_cn.md)
1. pretrain\_weights的路径为COCO数据集上开源的faster RCNN模型链接,完整模型链接可参考[MODEL_ZOO](../MODEL_ZOO_cn.md)
2. finetune\_exclude\_pretrained\_params中设置参数字段,如果参数名能够匹配以上参数字段(通配符匹配方式),则在模型加载时忽略该参数。
如果用户需要利用自己的数据进行finetune,模型结构不变,只需要忽略与类别数相关的参数。PaddleDetection给出了不同模型类型所对应的忽略参数字段。如下表所示:</br>
......
高级使用教程
===========================================
.. toctree::
:maxdepth: 2
READER.md
MODEL_TECHNICAL.md
CONFIG_cn.md
TRANSFER_LEARNING_cn.md
slim/index
inference/index
# 推理Benchmark
## 环境准备
- 测试环境:
- CUDA 9.0
- CUDNN 7.5
......@@ -17,10 +14,7 @@
- 采用Fluid C++预测引擎: 包含Fluid C++预测、Fluid-TensorRT预测,下面同时测试了Float32 (FP32) 和Float16 (FP16)的推理速度。
- 测试时开启了 FLAGS_cudnn_exhaustive_search=True,使用exhaustive方式搜索卷积计算算法。
### 推理速度
## 推理速度
| 模型 | Tesla V100 Fluid (ms/image) | Tesla V100 Fluid-TensorRT-FP32 (ms/image) | Tesla V100 Fluid-TensorRT-FP16 (ms/image) | Tesla P4 Fluid (ms/image) | Tesla P4 Fluid-TensorRT-FP32 (ms/image) |
......@@ -84,6 +78,5 @@
2. YOLO v3系列模型,Fluid-TensorRT相比Fluid预测加速5% - 10%不等。
3. SSD和YOLOv3系列模型 TensorRT-FP16预测速度有一定的优势,加速约20% - 40%不等。具体如下图。
<div align="center">
<img src="images/bench_ssd_yolo_infer.png" />
</div>
![](../../images/bench_ssd_yolo_infer.png)
# PaddleDetection C++预测部署方案
## 本文档结构
[1.说明](#1说明)
[2.主要目录和文件](#2主要目录和文件)
[3.编译](#3编译)
[4.预测并可视化结果](#4预测并可视化结果)
## 1.说明
本目录提供一个跨平台的图像检测模型的C++预测部署方案,用户通过一定的配置,加上少量的代码,即可把模型集成到自己的服务中,完成相应的图像检测任务。
主要设计的目标包括以下四点:
- 跨平台,支持在 Windows 和 Linux 完成编译、开发和部署
- 可扩展性,支持用户针对新模型开发自己特殊的数据预处理等逻辑
- 高性能,除了`PaddlePaddle`自身带来的性能优势,我们还针对图像检测的特点对关键步骤进行了性能优化
- 支持多种常见的图像检测模型,如YOLOv3, Faster-RCNN, Faster-RCNN+FPN,用户通过少量配置即可加载模型完成常见检测任务
## 2.主要目录和文件
```bash
deploy
├── detection_demo.cpp # 完成图像检测预测任务C++代码
├── conf
│ ├── detection_rcnn.yaml #示例faster rcnn 目标检测配置
│ └── detection_rcnn_fpn.yaml #示例faster rcnn + fpn目标检测配置
├── images
│ └── detection_rcnn # 示例faster rcnn + fpn目标检测测试图片目录
├── tools
│ └── vis.py # 示例图像检测结果可视化脚本
├── docs
│ ├── linux_build.md # Linux 编译指南
│ ├── windows_vs2015_build.md # windows VS2015编译指南
│ └── windows_vs2019_build.md # Windows VS2019编译指南
├── utils # 一些基础公共函数
├── preprocess # 数据预处理相关代码
├── predictor # 模型加载和预测相关代码
├── CMakeList.txt # cmake编译入口文件
└── external-cmake # 依赖的外部项目cmake(目前仅有yaml-cpp)
```
## 3.编译
支持在`Windows`和`Linux`平台编译和使用:
- [Linux 编译指南](./docs/linux_build.md)
- [Windows 使用 Visual Studio 2019 Community 编译指南](./docs/windows_vs2019_build.md)
- [Windows 使用 Visual Studio 2015 编译指南](./docs/windows_vs2015_build.md)
`Windows`上推荐使用最新的`Visual Studio 2019 Community`直接编译`CMake`项目。
## 4.预测并可视化结果
完成编译后,便生成了需要的可执行文件和链接库。这里以我们基于`faster rcnn`检测模型为例,介绍部署图像检测模型的通用流程。
### 4.1. 下载模型文件
我们提供faster rcnn,faster rcnn+fpn模型用于预测coco17数据集,可在以下链接下载:[faster rcnn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50.zip)
[faster rcnn + fpn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50_fpn.zip)
下载并解压,解压后目录结构如下:
```
faster_rcnn_pp50/
├── __model__ # 模型文件
└── __params__ # 参数文件
```
解压后把上述目录拷贝到合适的路径:
**假设**`Windows`系统上,我们模型和参数文件所在路径为`D:\projects\models\faster_rcnn_pp50`
**假设**`Linux`上对应的路径则为`/root/projects/models/faster_rcnn_pp50/`
### 4.2. 修改配置
`inference`源代码(即本目录)的`conf`目录下提供了示例基于faster rcnn的配置文件`detection_rcnn.yaml`, 相关的字段含义和说明如下:
```yaml
DEPLOY:
# 是否使用GPU预测
USE_GPU: 1
# 模型和参数文件所在目录路径
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50"
# 模型文件名
MODEL_FILENAME: "__model__"
# 参数文件名
PARAMS_FILENAME: "__params__"
# 预测图片的标准输入,尺寸不一致会resize
EVAL_CROP_SIZE: (608, 608)
# resize方式,支持 UNPADDING和RANGE_SCALING
RESIZE_TYPE: "RANGE_SCALING"
# 短边对齐的长度,仅在RANGE_SCALING下有效
TARGET_SHORT_SIZE : 800
# 均值
MEAN: [0.4647, 0.4647, 0.4647]
# 方差
STD: [0.0834, 0.0834, 0.0834]
# 图片类型, rgb或者rgba
IMAGE_TYPE: "rgb"
# 像素分类数
NUM_CLASSES: 1
# 通道数
CHANNELS : 3
# 预处理器, 目前提供图像检测的通用处理类DetectionPreProcessor
PRE_PROCESSOR: "DetectionPreProcessor"
# 预测模式,支持 NATIVE 和 ANALYSIS
PREDICTOR_MODE: "ANALYSIS"
# 每次预测的 batch_size
BATCH_SIZE : 3
# 长边伸缩的最大长度,-1代表无限制。
RESIZE_MAX_SIZE: 1333
# 输入的tensor数量。
FEEDS_SIZE: 3
```
修改字段`MODEL_PATH`的值为你在**上一步**下载并解压的模型文件所放置的目录即可。更多配置文件字段介绍,请参考文档[预测部署方案配置文件说明](./docs/configuration.md)
**注意**在使用CPU版本预测库时,`USE_GPU`的值必须设为0,否则无法正常预测。
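若希望用脚本批量修改该配置,可参考下面的 Python 示例草稿(假设已安装 PyYAML,字段名以上文 `detection_rcnn.yaml` 为准,路径仅为示意):

```python
import yaml

conf_path = "conf/detection_rcnn.yaml"

with open(conf_path) as f:
    conf = yaml.safe_load(f)

# 修改模型目录,并按需切换 CPU/GPU(使用CPU版本预测库时 USE_GPU 必须为 0)
conf["DEPLOY"]["MODEL_PATH"] = "/root/projects/models/faster_rcnn_pp50"
conf["DEPLOY"]["USE_GPU"] = 1

with open(conf_path, "w") as f:
    yaml.safe_dump(conf, f, default_flow_style=False)
```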
### 4.3. 执行预测
在终端中将生成的可执行文件所在目录切换为当前目录(Windows系统下使用`cmd`)。
`Linux` 系统中执行以下命令:
```shell
./detection_demo --conf=conf/detection_rcnn.yaml --input_dir=images/detection_rcnn
```
`Windows` 中执行以下命令:
```shell
.\detection_demo.exe --conf=conf\detection_rcnn.yaml --input_dir=images\detection_rcnn\
```
预测使用的两个命令参数说明如下:
| 参数 | 含义 |
|-------|----------|
| conf | 模型配置的Yaml文件路径 |
| input_dir | 需要预测的图片目录 |
配置文件说明请参考上一步。样例程序会扫描`input_dir`目录下的所有图片,并为每一张图片生成对应的预测结果,输出到屏幕,同时在与原图相同的目录下保存为`X.pb`文件(X为对应图片的文件名)。可使用工具脚本vis.py将检测结果可视化。
**检测结果可视化**
运行可视化脚本时,只需输入命令行参数图片路径、检测结果pb文件路径、目标框阈值以及类别-标签映射文件路径即可得到可视化的图片`X.png` (tools目录下提供coco17的类别标签映射文件coco17.json)。
```bash
python vis.py --img_path=../build/images/detection_rcnn/000000087038.jpg --img_result_path=../build/images/detection_rcnn/000000087038.jpg.pb --threshold=0.1 --c2l_path=coco17.json
```
检测结果(每个图片的结果用空行隔开)
```原图:```
![](../../../inference/images/detection_rcnn/000000087038.jpg)
```检测结果图:```
![](../../images/000000087038_res.jpg)
......@@ -11,14 +11,13 @@
## 使用示例
使用[训练/评估/推断](GETTING_STARTED_cn.md)中训练得到的模型进行试用,脚本如下
使用[训练/评估/推断](../../tutorials/GETTING_STARTED_cn.md)中训练得到的模型进行试用,脚本如下
```bash
# 导出FasterRCNN模型, 模型中data层默认的shape为3x800x1333
python tools/export_model.py -c configs/faster_rcnn_r50_1x.yml \
--output_dir=./inference_model \
-o weights=output/faster_rcnn_r50_1x/model_final \
```
预测模型会导出到`inference_model/faster_rcnn_r50_1x`目录下,模型名和参数名分别为`__model__`和`__params__`
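导出后的 `__model__` 和 `__params__` 可以用 Paddle 的 Python 预测接口直接加载,下面是一个最小示例草稿(基于 Paddle 1.x 的 `fluid.io.load_inference_model`,仅演示加载,FasterRCNN 实际预测还需要构造 image/im_info/im_shape 等输入):

```python
import paddle.fluid as fluid

model_dir = "inference_model/faster_rcnn_r50_1x"

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# 加载导出的预测模型(模型文件 __model__,参数文件 __params__)
program, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname=model_dir,
    executor=exe,
    model_filename="__model__",
    params_filename="__params__")

# 打印模型需要的输入变量名,实际预测时需按这些名字喂入数据
print("feed vars:", feed_names)
```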
......
......@@ -22,13 +22,14 @@ python tools/cpp_infer.py --model_path=inference_model/faster_rcnn_r50_1x/ --con
**注意**
1. 设置shape时必须保持与模型导出时shape大小一致
2. ```min_subgraph_size```的设置与模型arch相关,对部分arch需要调大该参数,一般设置为40适用于所有模型。适当的调小```min_subgraph_size```会对预测有小幅加速效果。例如YOLO中该参数可设置为3
1. 设置shape时必须保持与模型导出时shape大小一致;
2. `min_subgraph_size`的设置与模型arch相关,对部分arch需要调大该参数,一般设置为40适用于所有模型。适当的调小`min_subgraph_size`会对预测有小幅加速效果,例如YOLO中该参数可设置为3。
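下面给出一个通过 Python 预测接口设置 `min_subgraph_size` 的示例草稿(基于 `AnalysisConfig.enable_tensorrt_engine`,模型路径与各参数取值仅为示意,请以实际使用的 Paddle 版本接口为准):

```python
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor

model_dir = "inference_model/faster_rcnn_r50_1x"

config = AnalysisConfig(model_dir + "/__model__", model_dir + "/__params__")
config.enable_use_gpu(1000, 0)  # 初始显存(MB)、GPU 卡号

# 开启 TensorRT 子图优化;min_subgraph_size 需与模型结构匹配(如 YOLO 可设为 3)
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=40,
    precision_mode=AnalysisConfig.Precision.Float32,
    use_static=False)

predictor = create_paddle_predictor(config)
```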
## Paddle环境搭建
需要基于develop分支编译TensorRT版本Paddle, 在编译命令中指定TensorRT路径:
```
```bash
cmake .. -DWITH_MKL=ON \
-DWITH_GPU=ON \
-DWITH_TESTING=ON \
......
......@@ -16,7 +16,7 @@
```yaml
# 预测部署时所有配置字段需在DEPLOY字段下
DEPLOY:
DEPLOY:
# 类型:required int
# 含义:是否使用GPU预测。 0:不使用 1:使用
USE_GPU: 1
......@@ -71,5 +71,5 @@ DEPLOY:
FEEDS_SIZE: 2
# 类型: optional int
# 含义: 将图像的边变为该字段的值的整数倍。在使用fpn模型时需要设为32。默认值为1。
COARSEST_STRIDE: 32
```
\ No newline at end of file
COARSEST_STRIDE: 32
```
......@@ -43,9 +43,9 @@ fluid_inference
1. 在OpenCV官网下载适用于Windows平台的3.4.6版本, [下载地址](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. 运行下载的可执行文件,将OpenCV解压至指定目录,如`D:\projects\opencv`
3. 配置环境变量,如下流程所示
- 我的电脑->属性->高级系统设置->环境变量
- 我的电脑->属性->高级系统设置->环境变量
- 在系统变量中找到Path(如没有,自行创建),并双击编辑
- 新建,将opencv路径填入并保存,如`D:\projects\opencv\build\x64\vc14\bin`
- 新建,将opencv路径填入并保存,如`D:\projects\opencv\build\x64\vc14\bin`
### Step4: 以VS2015为例编译代码
......@@ -57,7 +57,7 @@ fluid_inference
```
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64
```
三个编译参数的含义说明如下(带*表示仅在使用**GPU版本**预测库时指定, 其中CUDA库版本尽量对齐,**使用9.0、10.0版本,不使用9.2、10.1等版本CUDA库**):
| 参数名 | 含义 |
......@@ -111,4 +111,4 @@ cd /d D:\projects\PaddleDetection\inference\build\release
detection_demo.exe --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
更详细说明请参考ReadMe文档: [预测和可视化部分](../README.md)
\ No newline at end of file
更详细说明请参考ReadMe文档: [预测和可视化部分](../README.md)
......@@ -46,9 +46,9 @@ fluid_inference
1. 在OpenCV官网下载适用于Windows平台的3.4.6版本, [下载地址](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. 运行下载的可执行文件,将OpenCV解压至指定目录,如`D:\projects\opencv`
3. 配置环境变量,如下流程所示
- 我的电脑->属性->高级系统设置->环境变量
- 我的电脑->属性->高级系统设置->环境变量
- 在系统变量中找到Path(如没有,自行创建),并双击编辑
- 新建,将opencv路径填入并保存,如`D:\projects\opencv\build\x64\vc14\bin`
- 新建,将opencv路径填入并保存,如`D:\projects\opencv\build\x64\vc14\bin`
### Step4: 使用Visual Studio 2019直接编译CMake
......@@ -99,4 +99,4 @@ cd D:\projects\PaddleDetection\inference\out\build\x64-Release
detection_demo.exe --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
更详细说明请参考ReadMe文档: [预测和可视化部分](../README.md)
\ No newline at end of file
更详细说明请参考ReadMe文档: [预测和可视化部分](../README.md)
推理部署
===========================================
.. toctree::
:maxdepth: 2
EXPORT_MODEL.md
INFERENCE.md
BENCHMARK_INFER_cn.md
DEPLOYMENT.md
../../../slim/distillation/README.md
\ No newline at end of file
../../../slim/nas/README.md
\ No newline at end of file
../../../slim/quantization/README.md
\ No newline at end of file
模型压缩教程
===========================================
.. toctree::
:maxdepth: 2
DISTILLATION.md
QUANTIZATION.md
NAS.md
prune/index
../../../../slim/prune/README.md
\ No newline at end of file
../../../../slim/sensitive/README.md
\ No newline at end of file
模型剪枝教程
===========================================
.. toctree::
:maxdepth: 2
PRUNE.md
SENSITIVE.md
# -*- coding: utf-8 -*-
#
# PaddleDetection documentation build configuration file, created by
# sphinx-quickstart on Thu Jan 16 11:54:53 2020.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
# sys.path.insert(0, os.path.abspath('.'))
import sphinx_rtd_theme
from recommonmark.parser import CommonMarkParser
from recommonmark.transform import AutoStructify
sys.path.insert(0, os.path.abspath(".."))
sys.path.insert(0, os.path.abspath("../../"))
DEPLOY = os.environ.get("READTHEDOCS") == "True"
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
#extensions = []
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'recommonmark',
'sphinx_markdown_tables',
# 'sphinx.ext.autosectionlabel',
]
# autosectionlabel_prefix_document = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
source_suffix = ['.rst', '.md']
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'PaddleDetection'
copyright = u'2020, paddlepaddle'
author = u'paddlepaddle'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u'latest'
# The full version, including alpha/beta/rc tags.
release = u'0.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
# language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
# pygments_style = 'sphinx'
# If true, `todo` and `todoList` produce output, else they produce nothing.
# todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
# html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'PaddleDetectiondoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'PaddleDetection.tex', u'PaddleDetection Documentation',
u'paddlepaddle', 'manual'),
]
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, 'paddledetection', u'PaddleDetection Documentation',
[author], 1)]
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'PaddleDetection', u'PaddleDetection Documentation', author,
'PaddleDetection', 'One line description of project.', 'Miscellaneous'),
]
def url_resolver(url):
    # Resolve relative markdown links: non-html links point to the GitHub repo,
    # html links point to the built documentation site.
    if ".html" not in url:
        url = url.replace("../", "")
        return "https://github.com/PaddlePaddle/PaddleDetection/tree/master/" + url
    else:
        if DEPLOY:
            return "http://paddledetection.readthedocs.io/" + url
        else:
            return "/" + url
#def setup(app):
# app.add_config_value(
# "recommonmark_config",
# {
# "url_resolver": url_resolver,
# "auto_toc_tree_section": "Contents",
# "enable_math": True,
# "enable_inline_math": True,
# "enable_eval_rst": True,
# 'enable_auto_doc_ref': True,
# },
# True, )
# app.add_transform(AutoStructify)
# Architecture of detection, which is also the prefix of data feed module
architecture: MaskRCNN
# Data feed module
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
# Use GPU or CPU, true by default
use_gpu: true
# Maximum number of iteration.
# In rcnn models, max_iters is 180000 if lr schedule is 1x and batch_size is 1.
max_iters: 180000
# Snapshot period. If training and test at same time, evaluate model at each snapshot_iter. 10000 by default.
snapshot_iter: 10000
# Smooth the log output in specified iterations, 20 by default.
log_smooth_window: 20
# The number of iteration interval to display in training log.
log_iter: 20
# The directory to save models.
save_dir: output
# The path of pretrained weights. If a URL is provided, the pretrain_weights will be downloaded and decompressed automatically.
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
# Evaluation method, COCO and VOC are available.
metric: COCO
# The path of final model for evaluation and test.
weights: output/mask_rcnn_r50_fpn_1x/model_final/
# Number of classes, 81 for COCO and 21 for VOC
num_classes: 81
# Mask RCNN architecture, see https://arxiv.org/abs/1703.06870
MaskRCNN:
backbone: ResNet
fpn: FPN
roi_extractor: FPNRoIAlign
rpn_head: FPNRPNHead
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
rpn_only: false
# Backbone module
ResNet:
# Index of stages using deformable conv v2, [] by default
dcn_v2_stages: []
# ResNet depth, 50 by default
depth: 50
# Stage index of returned feature map, [2,3,4,5] by default
feature_maps:
- 2
- 3
- 4
- 5
# Stage Index of backbone to freeze, 2 by default
freeze_at: 2
# Whether freeze normalization layers, true by default
freeze_norm: true
# Weight decay for normalization layer weights, 0. by default
norm_decay: 0.0
# Normalization type, bn/sync_bn/affine_channel, affine_channel by default
norm_type: affine_channel
# ResNet variant, supports 'a', 'b', 'c', 'd' currently, b by default
variant: b
# FPN module
FPN:
# Whether has extra conv in higher levels, false by default
has_extra_convs: false
# Highest level of the backbone feature map to use, 6 by default
max_level: 6
# Lowest level of the backbone feature map to use, 6 by default
min_level: 2
# FPN normalization type, bn/sync_bn/affine_channel, null by default
norm_type: null
# Number of feature channels, 256 by default
num_chan: 256
# Feature map scaling factors, [0.03125, 0.0625, 0.125, 0.25] by default
spatial_scale:
- 0.03125
- 0.0625
- 0.125
- 0.25
# RPN module, if use non-FPN architecture, use RPNHead instead
# Extract proposals according to anchors and assign box targets and
# score targets to selected proposals to compute RPN loss. For FPN
# architecture, RPN is computed at each level and the proposals are collected
# together.
FPNRPNHead:
# fluid.layers.anchor_generator
# Generate anchors for RCNN models. Each position of input produces
# N anchors. N = anchor_sizes * aspect_ratios. In FPNRPNHead, aspect_ratios
# is provided and anchor_sizes depends on FPN levels and anchor_start_size.
anchor_generator:
aspect_ratios:
- 0.5
- 1.0
- 2.0
variance:
- 1.0
- 1.0
- 1.0
- 1.0
# fluid.layers.rpn_target_assign
# Assign classification and regression targets to each anchor according
# to Intersection-over-Union(IoU) overlap between anchors and ground
# truth boxes. The classification target is a binary class label. The
# positive labels are assigned to two kinds of anchors: anchors with the highest
# IoU overlap with a ground-truth box, or an anchor that has an IoU overlap
# higher than rpn_positive_overlap with any ground-truth box.
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
# fluid.layers.generate_proposals in training
# Generate RoIs according to each box with probability to be a foreground
# object. The operation performs following steps: Transposes and resizes
# scores and bbox_deltas; Calculate box locations as proposal candidates;
# Clip boxes to image; Remove predicted boxes with small area; Apply NMS to
# get final proposals as output.
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
# fluid.layers.generate_proposals in test
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
# Size of anchor at the first scale, 32 by default
anchor_start_size: 32
# highest level of FPN output, 6 by default
max_level: 6
# Lowest level of FPN output, 2 by default
min_level: 2
# Number of FPN output channels, 256 by default
num_chan: 256
# Number of classes in RPN output, 1 by default
num_classes: 1
# RoI extractor module, if use non-FPN architecture, use RoIAlign instead
# For FPN architecture, proposals are distributed to different levels, RoI align
# is applied at each level, and the outputs are then concatenated.
FPNRoIAlign:
# The canonical FPN feature map level, 4 by default
canconical_level: 4
# The canonical FPN feature map size, 224 by default
canonical_size: 224
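# Note: for an RoI of size w x h, its FPN level is roughly
#   level = floor(canconical_level + log2(sqrt(w * h) / canonical_size)),
# clipped into [min_level, max_level] (see the FPN paper, https://arxiv.org/abs/1612.03144).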
# The highest level of FPN layer, 5 by default
max_level: 5
# The lowest level of FPN layer, 2 by default
min_level: 2
# Number of sampling points, 0 by default
sampling_ratio: 2
# Box resolution, 7 by default
box_resolution: 7
# Mask RoI resolution, 14 by default
mask_resolution: 14
# Mask head module
# Generate mask output and compute the mask loss.
MaskHead:
# Number of convolutions, 4 for FPN, 0 otherwise. 0 by default
num_convs: 4
# size of the output mask, 14 by default
resolution: 28
# Dilation rate, 1 by default
dilation: 1
# Number of channels after first conv, 256 by default
num_chan_reduced: 256
# Number of output classes, 81 by default
num_classes: 81
# fluid.layers.generate_proposal_labels
# Combine boxes and gt_boxes, and sample foreground proposals and background
# proposals. Then assign classification and regression targets to selected RoIs.
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights:
- 0.1
- 0.1
- 0.2
- 0.2
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
shuffle_before_sample: true
# fluid.layers.generate_mask_labels
# For given the RoIs and corresponding labels, sample foreground RoIs.
# Assign mask targets to selected RoIs which are encoded to K binary masks
# of resolution M x M.
MaskAssigner:
resolution: 28
num_classes: 81
# BBox head module
# Faster R-CNN bbox head following the RoI extractor, applying post-processing such as
# NMS and box decoding.
BBoxHead:
# Head after RoI extractor, ResNetC5/TwoFCHead
head: TwoFCHead
# fluid.layers.multiclass_nms
# Select a subset of detection bounding boxes that have high scores larger
# than score_threshold. Then prune away boxes that have high IoU overlap
# with already selected boxes by nms_threshold.
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
# fluid.layers.box_coder
box_coder:
axis: 1
box_normalized: false
code_type: decode_center_size
prior_box_var:
- 0.1
- 0.1
- 0.2
- 0.2
num_classes: 81
# RCNN head with two Fully Connected layers
TwoFCHead:
# The number of output channels, 1024 by default
num_chan: 1024
# Learning rate configuration
LearningRate:
# Base learning rate, 0.01 by default
base_lr: 0.01
# Learning rate schedulers, PiecewiseDecay and LinearWarmup by default
schedulers:
# fluid.layers.piecewise_decay
# The values field has higher priority; if values is null, the learning rate is multiplied by gamma at each milestone
- !PiecewiseDecay
gamma: 0.1
milestones:
- 120000
- 160000
values: null
# fluid.layers.linear_lr_warmup
# Start learning rate equals to base_lr * start_factor
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
# Optimizer module
OptimizerBuilder:
# fluid.optimizer
optimizer:
momentum: 0.9
type: Momentum
# fluid.regularizer
regularizer:
factor: 0.0001
type: L2
# Data feed module for training
MaskRCNNTrainFeed:
# Batch size per device, 1 by default
batch_size: 1
# Dataset module
dataset:
# Annotation file path
annotation: annotations/instances_train2017.json
# Dataset directory
dataset_dir: dataset/coco
# Directory where image files are stored
image_dir: train2017
# List of data fields needed
fields:
- image
- im_info
- im_id
- gt_box
- gt_label
- is_crowd
- gt_mask
# list of image dims
image_shape:
- 3
- 800
- 1333
# List of sample transformations to use
sample_transforms:
# Transform the image data to numpy format.
- !DecodeImage
to_rgb: true # default: true
with_mixup: false # default: false
# Flip images randomly
# Transform the x coordinates of bboxes and segmentations
- !RandomFlipImage
is_mask_flip: true # default: false
# Whether bbox is normalized
is_normalized: false # default: false
prob: 0.5 # default: 0.5
# Normalize the image
- !NormalizeImage
# The format of image, [H, W, C]/[C, H, W], true by default
is_channel_first: false
# Whether divide by 255, true by default
is_scale: true
# default: [0.485, 0.456, 0.406]
mean:
- 0.485
- 0.456
- 0.406
# default: [1, 1, 1]
std:
- 0.229
- 0.224
- 0.225
# Rescale image to the specified target size, and capped at max_size
- !ResizeImage
# Resize method, cv2.INTER_LINEAR(1) by default
interp: 1
max_size: 1333
target_size: 800
use_cv2: true # default: true
# Change the channel
- !Permute
# The format of image, [H, W, C]/[C, H, W], true by default
channel_first: true
to_bgr: false # default: true
# List of batch transformations to use
batch_transforms:
# Pad a batch of samples to same dimensions
- !PadBatch
pad_to_stride: 32 # default: 32
# Drop last batch if size is uneven, false by default
drop_last: false
# Number of workers processes(or threads), 2 by default
num_workers: 2
# Number of samples, -1 represents all samples. -1 by default
samples: -1
# If samples should be shuffled, true by default
shuffle: true
# If update im_info after padding, false by default
use_padded_im_info: false
# If use multi-process, false by default
use_process: false
# Data feed module for test
MaskRCNNEvalFeed:
# Batch size per device, 1 by default
batch_size: 1
# Dataset module
dataset:
# Annotation file path
annotation: annotations/instances_val2017.json
# Dataset directory
dataset_dir: dataset/coco
# Directory where image files are stored
image_dir: val2017
# List of data fields needed
fields:
- image
- im_info
- im_id
- im_shape
# list of image dims
image_shape:
- 3
- 800
- 1333
# List of sample transformations to use
sample_transforms:
# Transform the image data to numpy format.
- !DecodeImage
to_rgb: true # default: true
with_mixup: false # default: false
# Normalize the image
- !NormalizeImage
# The format of image, [H, W, C]/[C, H, W], true by default
is_channel_first: false
# Whether divide by 255, true by default
is_scale: true
# default: [0.485, 0.456, 0.406]
mean:
- 0.485
- 0.456
- 0.406
# default: [1, 1, 1]
std:
- 0.229
- 0.224
- 0.225
# Rescale image to the specified target size, and capped at max_size
- !ResizeImage
# Resize method, cv2.INTER_LINEAR(1) by default
interp: 1
max_size: 1333
target_size: 800
use_cv2: true # default: true
# Change the channel
- !Permute
# The format of image, [H, W, C]/[C, H, W], true by default
channel_first: true
to_bgr: false # default: true
# List of batch transformations to use
batch_transforms:
# Pad a batch of samples to same dimensions
- !PadBatch
pad_to_stride: 32 # default: 32
# Drop last batch if size is uneven, false by default
drop_last: false
# Number of workers processes(or threads), 2 by default
num_workers: 2
# Number of samples, -1 represents all samples. -1 by default
samples: -1
# If samples should be shuffled, true by default
shuffle: false
# If update im_info after padding, false by default
use_padded_im_info: true
# If use multi-process, false by default
use_process: false
# Data feed module for test
MaskRCNNTestFeed:
# Batch size per device, 1 by default
batch_size: 1
# Dataset module
dataset:
# Annotation file path
annotation: dataset/coco/annotations/instances_val2017.json
# List of data fields needed
fields:
- image
- im_info
- im_id
- im_shape
# list of image dims
image_shape:
- 3
- 800
- 1333
# List of sample transformations to use
sample_transforms:
# Transform the image data to numpy format.
- !DecodeImage
to_rgb: true # default: true
with_mixup: false # default: false
# Normalize the image
- !NormalizeImage
# The format of image, [H, W, C]/[C, H, W], true by default
is_channel_first: false
# Whether divide by 255, true by default
is_scale: true
# default: [0.485, 0.456, 0.406]
mean:
- 0.485
- 0.456
- 0.406
# default: [1, 1, 1]
std:
- 0.229
- 0.224
- 0.225
# Change the channel
- !Permute
# The format of image, [H, W, C]/[C, H, W], true by default
channel_first: true
to_bgr: false # default: true
# List of batch transformations to use
batch_transforms:
# Pad a batch of samples to same dimensions
- !PadBatch
pad_to_stride: 32 # default: 32
# Drop last batch if size is uneven, false by default
drop_last: false
# Number of workers processes(or threads), 2 by default
num_workers: 2
# Number of samples, -1 represents all samples. -1 by default
samples: -1
# If samples should be shuffled, true by default
shuffle: false
# If update im_info after padding, false by default
use_padded_im_info: true
# If use multi-process, false by default
use_process: false
# Architecture of detection, which is also the prefix of data feed module.
architecture: SSD
# Data feed module.
# Data feed in training.
train_feed: SSDTrainFeed
# Data feed in Evaluation.
eval_feed: SSDEvalFeed
# Data feed in infer.
test_feed: SSDTestFeed
# Use GPU or CPU, true by default.
use_gpu: true
# Maximum number of iteration.
max_iters: 400000
# Snapshot period. If training and test at same time, evaluate model at each snapshot_iter. 10000 by default.
snapshot_iter: 10000
# Smooth the log output in specified iterations, 20 by default.
log_smooth_window: 20
# The log in training is displayed once every period.
log_iter: 20
# Evaluation method, COCO and VOC are available.
metric: COCO
# Evaluation mAP calculation method in VOC metric, 11point and integral are available.
map_type: 11point
# The path of pretrained weights. If a URL is provided, it will be downloaded and decompressed automatically.
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_caffe_pretrained.tar
# The directory to save models.
save_dir: output
# The path of final model for evaluation and test.
weights: output/ssd_vgg16_300/model_final
# Number of classes, 81 for COCO and 21 for VOC.
num_classes: 81
# SSD architecture, see https://arxiv.org/abs/1512.02325
SSD:
# backbone instance, defined below.
backbone: VGG
# `MultiBoxHead` instance, defined below.
multi_box_head: MultiBoxHead
# fluid.layers.detection_output, Detection Output Layer for SSD.
# This operation is to get the detection results by performing following two steps:
# 1. Decode input bounding box predictions according to the prior boxes.
# 2. Get the final detection results by applying multi-class non maximum suppression (NMS).
# this operation doesn’t clip the final output bounding boxes to the image window.
output_decoder:
# The index of background label, the background label will be ignored.
# If set to -1, then all categories will be considered.
background_label: 0
# Number of total bboxes to be kept per image after NMS.
keep_top_k: 200
# The parameter for adaptive NMS.
nms_eta: 1.0
# The threshold to be used in NMS.
nms_threshold: 0.45
# Maximum number of detections to be kept according to the confidences
# after filtering detections based on score_threshold.
nms_top_k: 400
# Threshold to filter out bounding boxes with low confidence score.
# If not provided, consider all boxes.
score_threshold: 0.01
# VGG backbone, see https://arxiv.org/abs/1409.1556
VGG:
# the VGG net depth (16 or 19)
depth: 16
# whether or not extra blocks should be added
with_extra_blocks: true
# in each extra block, params:
# [in_channel, out_channel, padding_size, stride_size, filter_size]
extra_block_filters:
- [256, 512, 1, 2, 3]
- [128, 256, 1, 2, 3]
- [128, 256, 0, 1, 3]
- [128, 256, 0, 1, 3]
# params list of init scale in l2 norm, skip init scale if param is -1.
normalizations: [20., -1, -1, -1, -1, -1]
# fluid.layers.multi_box_head, Generate prior boxes for SSD algorithm.
# Generate `prior_box` according to the inputs list and other parameters
# Each position of the input produce N prior boxes, N is determined by
# the count of min_sizes, max_sizes and aspect_ratios, The size of the box
# is in range(min_size, max_size) interval, which is generated in sequence
# according to the aspect_ratios.
MultiBoxHead:
# the base_size is used to get min_size and max_size according to min_ratio and max_ratio.
base_size: 300
# the aspect ratios of generated prior boxes. The length of input and aspect_ratios must be equal.
aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2.], [2.]]
# the min ratio of generated prior boxes.
min_ratio: 15
# the max ratio of generated prior boxes.
max_ratio: 90
# If len(inputs) <=2, min_sizes must be set up, and the length of min_sizes
# should equal to the length of inputs. Default: None.
min_sizes: [30.0, 60.0, 111.0, 162.0, 213.0, 264.0]
# If len(inputs) <=2, max_sizes must be set up, and the length of max_sizes
# should equal to the length of inputs. Default: None.
max_sizes: [60.0, 111.0, 162.0, 213.0, 264.0, 315.0]
# If step_w and step_h are the same, step_w and step_h can be replaced by steps.
steps: [8, 16, 32, 64, 100, 300]
# Prior boxes center offset. Default: 0.5
offset: 0.5
# Whether to flip aspect ratios. Default:False.
flip: true
# The kernel size of conv2d. Default: 1.
kernel_size: 3
# The padding of conv2d. Default:0.
pad: 1
# Learning rate configuration
LearningRate:
# Base learning rate, 0.01 by default
base_lr: 0.001
# Learning rate schedulers, PiecewiseDecay and LinearWarmup by default
schedulers:
# fluid.layers.piecewise_decay
# The values field has higher priority; if values is null, the learning rate is multiplied by gamma at each milestone
- !PiecewiseDecay
gamma: 0.1
milestones: [280000, 360000]
# fluid.layers.linear_lr_warmup
# Start learning rate equals to base_lr * start_factor
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
# Optimizer module
OptimizerBuilder:
# fluid.optimizer. Training a neural network is in essence an optimization problem:
# with forward computation and back propagation, the optimizer uses the
# back-propagated gradients to optimize the parameters of the network.
optimizer:
# The Momentum optimizer adds momentum on top of SGD,
# reducing the noise problem in stochastic gradient descent.
momentum: 0.9
type: Momentum
# fluid.regularizer
regularizer:
# implements the L2 Weight Decay Regularization
# Small L2 values can help prevent overfitting the training data.
factor: 0.0005
type: L2
# Data feed module for training
SSDTrainFeed:
# Batch size per device
batch_size: 16
# list of batch transformations to use
batch_transforms: []
# The data buffer size
bufsize: 10
# Dataset module
dataset:
# Dataset directory
dataset_dir: dataset/coco
# Annotation file path
annotation: annotations/instances_train2017.json
# Directory where image files are stored
image_dir: train2017
# Drop last batch if size is uneven, false by default
drop_last: true
# List of data fields needed
fields: [image, gt_box, gt_label]
# list of image dims
image_shape: [3, 300, 300]
# number of workers processes (or threads)
num_workers: 8
# List of sample transformations to use
sample_transforms:
# Transform the image data to numpy format.
- !DecodeImage
# whether to convert BGR to RGB
to_rgb: true # default: true
# whether or not to mixup image and gt_bbox/gt_score
with_mixup: false # default: false
# Transform the bounding box's coordinates to [0,1].
- !NormalizeBox {}
# Randomly modify image brightness, contrast, saturation, hue, channel order, etc.
- !RandomDistort
# brightness_lower/ brightness_upper (float): the brightness
# between brightness_lower and brightness_upper
brightness_lower: 0.875
brightness_upper: 1.125
# brightness_prob (float): the probability of changing brightness
brightness_prob: 0.5
# contrast_lower/ contrast_upper (float): the contrast between
# contrast_lower and contrast_upper
contrast_lower: 0.5
contrast_upper: 1.5
# contrast_prob (float): the probability of changing contrast
contrast_prob: 0.5
# count (int): the number of distortion kinds to apply
count: 4
# hue_lower/ hue_upper (float): the hue between hue_lower and hue_upper
hue_lower: -18
hue_upper: 18
# hue_prob (float): the probability of changing hue
hue_prob: 0.5
# is_order (bool): whether to apply the distortions in a fixed order
is_order: true
# saturation_lower/ saturation_upper (float): the saturation
# between saturation_lower and saturation_upper
saturation_lower: 0.5
saturation_upper: 1.5
# saturation_prob (float): the probability of changing saturation
saturation_prob: 0.5
# Expand the image and modify bounding boxes.
# Operators:
# 1. Scale the image width and height.
# 2. Construct new images with new height and width.
# 3. Fill the new image with the mean.
# 4. Put the original image into the new image.
# 5. Rescale the bounding boxes.
# 6. Determine if the new bboxes are satisfied in the new image.
- !ExpandImage
# max_ratio (float): the ratio of expanding
max_ratio: 4
# mean (list): the pixel mean
mean: [104, 117, 123]
# prob (float): the probability of expanding image
prob: 0.5
# Crop the image and modify bounding box.
# Operators:
# 1. Scale the image width and height.
# 2. Crop the image according to a random sample.
# 3. Rescale the bounding box.
# 4. Determine if the new bbox is satisfied in the new image.
- !CropImage
# avoid_no_bbox (bool): whether to avoid the
# situation where the box does not appear.
avoid_no_bbox: false
# batch_sampler (list): Multiple sets of different parameters for cropping.
batch_sampler:
- [1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
- [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 0.0]
- [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 0.0]
- [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0]
- [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 0.0]
- [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 0.0]
- [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]
# satisfy_all (bool): whether all boxes must satisfy.
satisfy_all: false
# Rescale image to the specified target size, and capped at max_size if max_size != 0.
# If target_size is list, selected a scale randomly as the specified target size.
- !ResizeImage
# Resize method, cv2.INTER_LINEAR(1) by default
interp: 1
# max_size (int): the max size of image
max_size: 0
# target_size (int|list): the target size of image's short side,
# multi-scale training is adopted when type is list.
target_size: 300
# use_cv2 (bool): use the cv2 interpolation method or use PIL interpolation method
use_cv2: false
# Flip the image and bounding box.
# Operators:
# 1. Flip the image numpy.
# 2. Transform the bboxes' x coordinates. (Must judge whether the coordinates are normalized!)
# 3. Transform the segmentations' x coordinates. (Must judge whether the coordinates are normalized!)
- !RandomFlipImage
# is_mask_flip (bool): whether flip the segmentation
is_mask_flip: false
# is_normalized (bool): whether the bbox scale to [0,1]
is_normalized: true
# prob (float): the probability of flipping image
prob: 0.5
# Change the channel
- !Permute
# The format of image, [H, W, C]/[C, H, W], true by default
channel_first: true
# to_bgr (bool): confirm whether to convert RGB to BGR
to_bgr: true
# Normalize the image.
# Operators:
# 1.(optional) Scale the image to [0,1]
# 2. Each pixel minus mean and is divided by std
- !NormalizeImage
# The format of image, [H, W, C]/[C, H, W], true by default
is_channel_first: true
# Whether divide by 255, true by default
is_scale: false
# mean (list): the pixel mean
mean: [104, 117, 123]
# std (list): the pixel variance
std: [1, 1, 1]
# Number of samples, -1 represents all samples. -1 by default
samples: -1
# If samples should be shuffled, true by default
shuffle: true
# If use multi-process, false by default
use_process: true
# Data feed module for Eval
SSDEvalFeed:
# Batch size per device
batch_size: 32
# list of batch transformations to use
batch_transforms: []
# The data buffer size
bufsize: 10
# Dataset module
dataset:
# Dataset directory
dataset_dir: dataset/coco
# Annotation file path
annotation: annotations/instances_val2017.json
# Directory where image files are stored
image_dir: val2017
# Drop last batch if size is uneven, false by default
drop_last: true
# List of data fields needed
fields: [image, im_shape, im_id, gt_box, gt_label, is_difficult]
# list of image dims
image_shape: [3, 300, 300]
# number of workers processes (or threads)
num_workers: 8
# List of sample transformations to use
sample_transforms:
# Transform the image data to numpy format.
- !DecodeImage
# whether to convert BGR to RGB
to_rgb: true # default: true
# whether or not to mixup image and gt_bbox/gt_score
with_mixup: false # default: false
# Transform the bounding box's coordinates to [0,1].
- !NormalizeBox {}
# Rescale image to the specified target size, and capped at max_size if max_size != 0.
# If target_size is list, selected a scale randomly as the specified target size.
- !ResizeImage
# Resize method, cv2.INTER_LINEAR(1) by default
interp: 1
# max_size (int): the max size of image
max_size: 0
# target_size (int|list): the target size of image's short side,
# multi-scale training is adopted when type is list.
target_size: 300
# use_cv2 (bool): use the cv2 interpolation method or use PIL interpolation method
use_cv2: false
- !Permute
# The format of image, [H, W, C]/[C, H, W], true by default
channel_first: true
# to_bgr (bool): confirm whether to convert RGB to BGR
to_bgr: true
# Normalize the image.
# Operators:
# 1.(optional) Scale the image to [0,1]
# 2. Each pixel minus mean and is divided by std
- !NormalizeImage
# The format of image, [H, W, C]/[C, H, W], true by default
is_channel_first: true
# Whether divide by 255, true by default
is_scale: false
# mean (list): the pixel mean
mean: [104, 117, 123]
# std (list): the pixel variance
std: [1, 1, 1]
# Number of samples, -1 represents all samples. -1 by default
samples: -1
# If samples should be shuffled, true by default
shuffle: false
# If use multi-process, false by default
use_process: false
# Data feed module for test
SSDTestFeed:
# Batch size per device
batch_size: 1
# list of batch transformations to use
batch_transforms: []
# The data buffer size
bufsize: 10
# Dataset module
dataset:
# Annotation file path
annotation: dataset/coco/annotations/instances_val2017.json
# Drop last batch if size is uneven, false by default
drop_last: false
# List of data fields needed
fields: [image, im_id]
# list of image dims
image_shape: [3, 300, 300]
# number of workers processes (or threads)
num_workers: 8
# List of sample transformations to use
sample_transforms:
# Transform the image data to numpy format.
- !DecodeImage
# whether to convert BGR to RGB
to_rgb: true # default: true
# whether or not to mixup image and gt_bbox/gt_score
with_mixup: false # default: false
# Rescale image to the specified target size, and capped at max_size if max_size != 0.
# If target_size is list, selected a scale randomly as the specified target size.
- !ResizeImage
# Resize method, cv2.INTER_LINEAR(1) by default
interp: 1
# max_size (int): the max size of image
max_size: 0
# target_size (int|list): the target size of image's short side,
# multi-scale training is adopted when type is list.
target_size: 300
# use_cv2 (bool): use the cv2 interpolation method or use PIL interpolation method
use_cv2: false
- !Permute
# The format of image, [H, W, C]/[C, H, W], true by default
channel_first: true
# to_bgr (bool): confirm whether to convert RGB to BGR
to_bgr: true
# Normalize the image.
# Operators:
# 1.(optional) Scale the image to [0,1]
# 2. Each pixel minus mean and is divided by std
- !NormalizeImage
# The format of image, [H, W, C]/[C, H, W], true by default
is_channel_first: true
# Whether divide by 255, true by default
is_scale: false
# mean (list): the pixel mean
mean: [104, 117, 123]
# std (list): the pixel variance
std: [1, 1, 1]
# Number of samples, -1 represents all samples. -1 by default
samples: -1
# If samples should be shuffled, true by default
shuffle: false
# If use multi-process, false by default
use_process: false
# Architecture of detection, which is also the prefix of data feed module
architecture: YOLOv3
# Data feed module.
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
# Use GPU or CPU, true by default.
use_gpu: true
# Maximum number of iteration.
# In YOLOv3 model, the default number of iterations corresponds to training for 270 epochs.
max_iters: 500200
# Smooth the log output in specified iterations, 20 by default.
log_smooth_window: 20
# The number of iteration interval to display in training log.
log_iter: 20
# The directory to save models.
save_dir: output
# Snapshot period. If training and test at same time, evaluate model at each snapshot_iter. 2000 by default.
snapshot_iter: 2000
# Evaluation method, COCO and VOC are available.
metric: COCO
# The path of pretrained weights. If a URL is provided, it will be downloaded and decompressed automatically.
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar
# The path of final model for evaluation and test.
weights: output/yolov3_darknet/model_final
# Number of classes, 80 for COCO and 20 for VOC.
num_classes: 80
# Whether to use fine-grained YOLOv3 loss. If true, build the YOLOv3 loss in Python with common OPs;
# if false, use the fluid.layers.yolov3_loss OP to calculate the YOLOv3 loss. The former is better
# for redesigning the YOLOv3 loss, the latter is better for training with the original YOLOv3 loss
use_fine_grained_loss: false
# YOLOv3 architecture, see https://arxiv.org/abs/1804.02767
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
# Backbone module
DarkNet:
# Batch normalization type in training, sync_bn for synchronized batch normalization
norm_type: sync_bn
# L2 weight decay factor of batch normalization layer
norm_decay: 0.
# Darknet convolution layer number, only support 53 currently
depth: 53
# YOLOv3 head module
# Generate bbox output in evaluation and calculate loss in training
# fluid.layers.yolov3_loss / fluid.layers.yolo_box
YOLOv3Head:
# anchor mask of 3 yolo_loss/yolo_box layers, each yolo_loss/yolo_box layer has 3 anchors
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
# 9 anchors for 3 yolo_loss/yolo_box layers, generated by performing k-means on COCO gt boxes
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
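# By convention the first mask group [6, 7, 8] (the largest anchors) is used by the
# coarsest output feature map (stride 32), [3, 4, 5] by stride 16 and [0, 1, 2] by stride 8.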
# L2 weight decay factor of batch normalization layer
norm_decay: 0.
# use YOLOv3Loss, which will be defined in following YOLOv3Loss segmentation.
yolo_loss: YOLOv3Loss
# fluid.layers.multiclass_nms
# Non-max suppress for output prediction boxes, see multiclass_nms for following parameters.
# 1. Select detection bounding boxes with high scores larger than score_threshold.
# 2. Select detection bounding boxes with the largest nms_top_k scores.
# 3. Suppress detection bounding boxes which have high IoU overlap with already selected boxes.
# 4. Keep the top keep_top_k detection bounding boxes as output.
nms:
# Which label is regarded as background and will be ignored, -1 for no background label.
background_label: -1
# Number of total bboxes to be kept per image after NMS step.
keep_top_k: 100
# IoU threshold for NMS, bbox with IoU over nms_threshold will be suppressed.
nms_threshold: 0.45
# Maximum number of detections to be kept according to the confidences after filtering detections based on score_threshold.
nms_top_k: 1000
# Whether detections are normalized.
normalized: false
# Threshold to filter out bounding boxes with low confidence score.
score_threshold: 0.01
YOLOv3Loss:
# training batch size, this will be used when use_fine_grained_loss is set as True.
# ATTENTION: this should be the same as the batch size defined in YoloTrainFeed in
# fine-grained loss mode.
batch_size: 8
# Ignore threshold for yolo_loss layer, 0.7 by default.
# Objectness loss will be ignored if a prediction bbox overlaps a gt box by more than ignore_thresh.
ignore_thresh: 0.7
# Whether to use label smoothing in the yolo_loss layer.
# It is recommended to set it to true only when num_classes is very large
label_smooth: false
# Learning rate configuration
LearningRate:
# Base learning rate for training, 1e-3 by default.
base_lr: 0.001
# Learning rate schedulers, PiecewiseDecay and LinearWarmup by default
schedulers:
# fluid.layers.piecewise_decay
# each milestone stage decay gamma
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
# fluid.layers.linear_lr_warmup
# Start learning rate equals to base_lr * start_factor
- !LinearWarmup
start_factor: 0.
steps: 4000
# Optimizer module
OptimizerBuilder:
# fluid.optimizer
optimizer:
momentum: 0.9
type: Momentum
# fluid.regularizer
regularizer:
factor: 0.0005
type: L2
# Data feed module for training
YoloTrainFeed:
# Batch size per device, 8 by default
batch_size: 8
# Dataset module
dataset:
# Dataset directory.
dataset_dir: dataset/coco
# Annotation file path.
annotation: annotations/instances_train2017.json
# Directory where image files are stored.
image_dir: train2017
# List of data fields needed.
fields: [image, gt_box, gt_label, gt_score]
# List of image dims
image_shape: [3, 608, 608]
# List of sample transformations to use.
sample_transforms:
# read image data and decode to numpy.
- !DecodeImage
to_rgb: true
# YOLOv3 use image mixup in training.
with_mixup: true
# Mixup two images in training, a trick to improve performance.
- !MixupImage
alpha: 1.5 # default: 1.5
beta: 1.5 # default: 1.5
# Normalize gtbox to range [0, 1]
- !NormalizeBox {}
# Random color distort: brightness, contrast, hue, saturation.
- !RandomDistort
brightness_lower: 0.5
brightness_prob: 0.5
brightness_upper: 1.5
contrast_lower: 0.5
contrast_prob: 0.5
contrast_upper: 1.5
count: 4
hue_lower: -18
hue_prob: 0.5
hue_upper: 18
is_order: false
saturation_lower: 0.5
saturation_prob: 0.5
saturation_upper: 1.5
# Random Expand the image and modify bounding box.
# Operators:
# 1. Scale the image width and height.
# 2. Construct new images with new height and width.
# 3. Fill the new image with the mean.
# 4. Put the original image into the new image.
# 5. Rescale the bounding box.
# 6. Determine if the new bbox is satisfied in the new image.
- !ExpandImage
# max expand ratio, default 4.0.
max_ratio: 4.0
mean: [123.675, 116.28, 103.53]
prob: 0.5
# Random Crop the image and modify bounding box.
# Operators:
# 1. Scale the image width and height.
# 2. Crop the image according to a random sample.
# 3. Rescale the bounding box.
# 4. Determine if the new bbox is satisfied in the new image.
- !CropImage
# Recrop image if there are no bbox in output cropped image.
avoid_no_bbox: true
batch_sampler: [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
# Whether all bboxes should satisfy the IoU constraints.
satisfy_all: false
# Interpolate image to target_size with random interpolate method:
# cv2.INTER_NEAREST,
# cv2.INTER_LINEAR,
# cv2.INTER_AREA,
# cv2.INTER_CUBIC,
# cv2.INTER_LANCZOS4,
- !RandomInterpImage
max_size: 0
target_size: 608
# Flip the image and bounding box.
# Operators:
# 1. Flip the image numpy.
# 2. Transform the bboxes' x coordinates. (Must judge whether the coordinates are normalized!)
# 3. Transform the segmentations' x coordinates. (Must judge whether the coordinates are normalized!)
- !RandomFlipImage
is_mask_flip: false
is_normalized: true
prob: 0.5
# Normalize the image.
# Operators:
# 1.(optional) Scale the image to [0,1]
# 2. Each pixel minus mean and is divided by std
- !NormalizeImage
is_channel_first: false
is_scale: true
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
# Change data layout to [C, H, W].
- !Permute
channel_first: true
to_bgr: false
# List of batch transformations to use.
batch_transforms:
# Random reshape images in each mini-batch to different shapes.
- !RandomShape
sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
# YOLOv3 reads gt boxes into a zero-padded tensor with a max box number of 50.
num_max_boxes: 50
# YOLOv3 reads gt labels without regarding background as label 0.
with_background: false
# Number of samples, -1 represents all samples. -1 by default.
samples: -1
# Whether samples should be shuffled, true by default.
shuffle: true
# Whether to drop the last batch if it contains fewer images than a full batch.
drop_last: true
# Whether use multi-process reader in training.
use_process: true
# Use multi-process reader number.
num_workers: 8
# Buffer size for reader.
bufsize: 128
# Mixup image epoch number.
mixup_epoch: 250
# Data feed module for evaluation
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms: []
fields: [image, im_size, im_id, gt_box, gt_label, is_difficult]
image_shape: [3, 608, 608]
sample_transforms:
- !DecodeImage
to_rgb: true
with_mixup: false
# Rescale image to the specified target size, and capped at max_size if max_size != 0.
# If target_size is list, selected a scale randomly as the specified target size.
- !ResizeImage
interp: 2 # 2 for cv2.INTER_CUBIC
max_size: 0
target_size: 608
use_cv2: true
- !NormalizeImage
is_channel_first: false
is_scale: true
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
- !Permute
channel_first: true
to_bgr: false
num_max_boxes: 50
samples: -1
shuffle: false
drop_last: false
# Use multi-thread reader in evaluation mode.
use_process: false
# Thread number for multi-thread reader.
num_workers: 8
with_background: false
# Data feed module for test
YoloTestFeed:
batch_size: 1
dataset:
annotation: dataset/coco/annotations/instances_val2017.json
batch_transforms: []
fields: [image, im_size, im_id]
sample_transforms:
- !DecodeImage
to_rgb: true
with_mixup: false
- !ResizeImage
interp: 2
max_size: 0
target_size: 608
use_cv2: true
- !NormalizeImage
is_channel_first: false
is_scale: true
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
- !Permute
channel_first: true
to_bgr: false
num_max_boxes: 50
samples: -1
shuffle: false
drop_last: false
# Use multi-thread reader in test mode.
use_process: false
num_workers: 8
with_background: false
......@@ -2,17 +2,13 @@
## 简介
CACascade RCNN是百度视觉技术部在Objects365 2019 Challenge上夺冠的最佳单模型之一。Objects365是通用物体检测领域的一个全新数据集,旨在促进对自然场景中不同对象的检测研究。Objects365在63万张图像上标注了365个对象类,训练集中共有超过1000万个边界框。这里放出的是Full Track任务中最好的单模型之一。
<div align="center">
<img src="../demo/obj365_gt.png"/>
</div>
![](../images/obj365_gt.png)
## 方法描述
针对大规模物体检测算法的特点,我们提出了一种基于图片包含物体类别的数量的采样方式(Class Aware Sampling)。基于这种方式进行训练模型可以在更短的时间使模型收敛到更好的效果。
<div align="center">
<img src="../demo/cas.png"/>
</div>
![](../images/cas.png)
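下面给出一个 Class Aware Sampling 思路的概念性 Python 示例草稿(采样权重的具体计算方式为假设,并非该冠军模型的官方实现,数据仅为示意):

```python
import random
from collections import Counter

# 假设每张图片标注包含的类别列表(仅作示意)
image_labels = {
    "img_0.jpg": [1, 1, 3],
    "img_1.jpg": [2],
    "img_2.jpg": [3, 5, 5, 7],
}

# 统计每个类别在整个数据集中出现的频次
class_freq = Counter(c for labels in image_labels.values() for c in labels)

def image_weight(labels):
    """包含稀有类别越多的图片,被采样的概率越高(一种常见做法)。"""
    return sum(1.0 / class_freq[c] for c in set(labels))

images = list(image_labels)
weights = [image_weight(image_labels[k]) for k in images]

# 按权重有放回地采样若干图片,构成一个 epoch 的训练列表
sampled = random.choices(images, weights=weights, k=8)
print(sampled)
```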
本次公布的最好单模型是一个基于Cascade RCNN的两阶段检测模型,在此基础上将Backbone替换为更加强大的SENet154模型,并引入了Deformable Conv模块以及更复杂的二阶段网络结构,针对BatchSize比较小的情况增加了Group Normalization操作,同时使用了多尺度训练,最终达到了非常理想的效果。预训练模型先后分别在ImageNet和COCO数据集上进行了训练,其中在COCO数据集上训练时增加了Mask分支,其余结构与CACascade RCNN相同,会在启动训练时自动下载。
......@@ -46,6 +42,4 @@ python tools/train.py -c configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn.yml
## 模型效果
<div align="center">
<img src="../demo/obj365_pred.png"/>
</div>
![](../images/obj365_pred.png)
English | [简体中文](CONTRIB_cn.md)
# PaddleDetection applied for specific scenarios
We provide some models implemented by PaddlePaddle to detect objects in specific scenarios, users can download the models and use them in these scenarios.
| Task | Algorithm | Box AP | Download |
|:---------------------|:---------:|:------:| :-------------------------------------------------------------------------------------: |
| Vehicle Detection | YOLOv3 | 54.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar) |
| Pedestrian Detection | YOLOv3 | 51.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar) |
## Vehicle Detection
One of the major applications of vehicle detection is traffic monitoring. In this scenario, vehicles to be detected are mostly captured by cameras mounted on top of traffic light columns.
### 1. Network
The network for detecting vehicles is YOLOv3, whose backbone is Darknet53.
### 2. Configuration for training
PaddleDetection provides users with a configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with this file, we modify the following parameters to train the vehicle detection model:
* max_iters: 120000
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
* label_smooth: false
* nms/nms_top_k: 400
* nms/score_threshold: 0.005
* milestones: [60000, 80000]
* dataset_dir: dataset/vehicle
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is as follows:
AP at IoU=.50:.05:.95 is 0.545.
AP at IoU=.50 is 0.764.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar \
--infer_dir contrib/VehicleDetection/demo \
--draw_threshold 0.2 \
--output_dir contrib/VehicleDetection/demo/output
```
Some inference results are visualized below:
![](../images/VehicleDetection_001.jpeg)
![](../images/VehicleDetection_005.png)
## Pedestrian Detection
The main applications of pedestrian detection include intelligent video surveillance. In this scenario, photos of pedestrians are taken by surveillance cameras in public areas, and pedestrian detection is then conducted on these photos.
### 1. Network
The network for detecting pedestrians is YOLOv3, whose backbone is Darknet53.
### 2. Configuration for training
PaddleDetection provides users with a configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with this file, we modify the following parameters to train the pedestrian detection model:
* max_iters: 200000
* num_classes: 1
* snapshot_iter: 5000
* milestones: [150000, 180000]
* dataset_dir: dataset/pedestrian
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is as follows:
AP at IoU=.50:.05:.95 is 0.518.
AP at IoU=.50 is 0.792.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar \
--infer_dir contrib/PedestrianDetection/demo \
--draw_threshold 0.3 \
--output_dir contrib/PedestrianDetection/demo/output
```
Some inference results are visualized below:
![](../images/PedestrianDetection_001.png)
![](../images/PedestrianDetection_004.png)
# PaddleDetection 特色垂类检测模型
[English](CONTRIB.md) | 简体中文
# 特色垂类检测模型
我们提供了针对不同场景的基于PaddlePaddle的检测模型,用户可以下载模型进行使用。
......@@ -54,9 +55,9 @@ python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml
预测结果示例:
![](VehicleDetection/demo/output/001.jpeg)
![](../images/VehicleDetection_001.jpeg)
![](VehicleDetection/demo/output/005.png)
![](../images/VehicleDetection_005.png)
## 行人检测(Pedestrian Detection)
......@@ -101,6 +102,6 @@ python -u tools/infer.py -c contrib/PedestrianDetection/pedestrian_yolov3_darkne
预测结果示例:
![](PedestrianDetection/demo/output/001.png)
![](../images/PedestrianDetection_001.png)
![](PedestrianDetection/demo/output/004.png)
![](../images/PedestrianDetection_004.png)
[English](FACE_DETECTION_en.md) | 简体中文
# FaceDetection
## 内容
- [简介](#简介)
- [模型库与基线](#模型库与基线)
- [快速开始](#快速开始)
- [数据准备](#数据准备)
- [训练与推理](#训练与推理)
- [评估](#评估)
- [算法细节](#算法细节)
- [如何贡献代码](#如何贡献代码)
## 简介
FaceDetection的目标是提供高效、高速的人脸检测解决方案,包括最先进的模型和经典模型。
![](../images/12_Group_Group_12_Group_Group_12_935.jpg)
## 模型库与基线
下表中展示了PaddleDetection当前支持的网络结构,具体细节请参考[算法细节](#算法细节)
| | 原始版本 | Lite版本 <sup>[1](#lite)</sup> | NAS版本 <sup>[2](#nas)</sup> |
|:------------------------:|:--------:|:--------------------------:|:------------------------:|
| [BlazeFace](#BlazeFace) | ✓ | ✓ | ✓ |
| [FaceBoxes](#FaceBoxes) | ✓ | ✓ | x |
<a name="lite">[1]</a> `Lite版本`表示减少网络层数和通道数。
<a name="nas">[2]</a> `NAS版本`表示使用 `神经网络搜索`方法来构建网络结构。
### 模型库
#### WIDER-FACE数据集上的mAP
| 网络结构 | 类型 | 输入尺寸 | 图片个数/GPU | 学习率策略 | Easy Set | Medium Set | Hard Set | 下载 |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
| BlazeFace | 原始版本 | 640 | 8 | 32w | **0.915** | **0.892** | **0.797** | [模型](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_original.tar) |
| BlazeFace | Lite版本 | 640 | 8 | 32w | 0.909 | 0.885 | 0.781 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace | NAS版本 | 640 | 8 | 32w | 0.837 | 0.807 | 0.658 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| FaceBoxes | 原始版本 | 640 | 8 | 32w | 0.878 | 0.851 | 0.576 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_original.tar) |
| FaceBoxes | Lite版本 | 640 | 8 | 32w | 0.901 | 0.875 | 0.760 | [模型](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_lite.tar) |
**注意:**
- 我们使用`tools/face_eval.py`中多尺度评估策略得到`Easy/Medium/Hard Set`里的mAP。具体细节请参考[在WIDER-FACE数据集上评估](#在WIDER-FACE数据集上评估)
- BlazeFace-Lite的训练与测试使用 [blazeface.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/face_detection/blazeface.yml)配置文件并且设置:`lite_edition: true`
#### FDDB数据集上的mAP
| 网络结构 | Type | Size | DistROC | ContROC |
|:------------:|:--------:|:----:|:-------:|:-------:|
| BlazeFace | 原始版本 | 640 | **0.992** | **0.762** |
| BlazeFace | Lite版本 | 640 | 0.990 | 0.756 |
| BlazeFace | NAS版本 | 640 | 0.981 | 0.741 |
| FaceBoxes | 原始版本 | 640 | 0.987 | 0.736 |
| FaceBoxes | Lite版本 | 640 | 0.988 | 0.751 |
**注意:**
- 我们在FDDB数据集上使用多尺度测试的方法得到mAP,具体细节请参考[在FDDB数据集上评估](#在FDDB数据集上评估)
#### 推理时间和模型大小比较
| 网络结构 | 类型 | 输入尺寸 | P4(trt32) (ms) | CPU (ms) | 高通骁龙855(armv8) (ms) | 模型大小(MB) |
|:------------:|:--------:|:----:|:--------------:|:--------:|:-------------------------------------:|:---------------:|
| BlazeFace | 原始版本 | 128 | 1.387 | 23.461 | 6.036 | 0.777 |
| BlazeFace | Lite版本 | 128 | 1.323 | 12.802 | 6.193 | 0.68 |
| BlazeFace | NAS版本 | 128 | 1.03 | 6.714 | 2.7152 | 0.234 |
| FaceBoxes | 原始版本 | 128 | 3.144 | 14.972 | 19.2196 | 3.6 |
| FaceBoxes | Lite版本 | 128 | 2.295 | 11.276 | 8.5278 | 2 |
| BlazeFace | 原始版本 | 320 | 3.01 | 132.408 | 70.6916 | 0.777 |
| BlazeFace | Lite版本 | 320 | 2.535 | 69.964 | 69.9438 | 0.68 |
| BlazeFace | NAS版本 | 320 | 2.392 | 36.962 | 39.8086 | 0.234 |
| FaceBoxes | 原始版本 | 320 | 7.556 | 84.531 | 52.1022 | 3.6 |
| FaceBoxes | Lite版本 | 320 | 18.605 | 78.862 | 59.8996 | 2 |
| BlazeFace | 原始版本 | 640 | 8.885 | 519.364 | 149.896 | 0.777 |
| BlazeFace | Lite版本 | 640 | 6.988 | 284.13 | 149.902 | 0.68 |
| BlazeFace | NAS版本 | 640 | 7.448 | 142.91 | 69.8266 | 0.234 |
| FaceBoxes | 原始版本 | 640 | 78.201 | 394.043 | 169.877 | 3.6 |
| FaceBoxes | Lite版本 | 640 | 59.47 | 313.683 | 139.918 | 2 |
**注意:**
- CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz。
- P4(trt32)和CPU的推理时间测试基于PaddlePaddle-1.6.1版本。
- ARM测试环境:
- 高通骁龙855(armv8);
- 单线程;
- Paddle-Lite 2.0.0版本。
## 快速开始
### 数据准备
我们使用[WIDER-FACE数据集](http://shuoyang1213.me/WIDERFACE/)进行训练和模型测试,官方网站提供了详细的数据介绍。
- WIDER-Face数据源:
使用如下目录结构加载`wider_face`类型的数据集:
```
dataset/wider_face/
├── wider_face_split
│ ├── wider_face_train_bbx_gt.txt
│ ├── wider_face_val_bbx_gt.txt
├── WIDER_train
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_100.jpg
│ │ │ ├── 0_Parade_marchingband_1_381.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
├── WIDER_val
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_1004.jpg
│ │ │ ├── 0_Parade_marchingband_1_1045.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
```
- 手动下载数据集:
要下载WIDER-FACE数据集,请运行以下命令:
```
cd dataset/wider_face && ./download.sh
```
- 自动下载数据集:
如果已经开始训练,但是数据集路径设置不正确或找不到路径, PaddleDetection会从[WIDER-FACE数据集](http://shuoyang1213.me/WIDERFACE/)自动下载它们,
下载后解压的数据集将缓存在`~/.cache/paddle/dataset/`中,并且之后的训练测试会自动加载它们。
#### 数据增强方法
- **尺度变换(Data-anchor-sampling):**
具体操作是:根据随机选择的人脸高和宽计算 $v=\sqrt{width \times height}$,再判断`v`落在锚点集合`[16,32,64,128]`的哪个区间内。
假设`v=45`,则其位于`32<v<64`区间,此时以均匀分布的概率从`[16,32,64]`中任取一个值;若选中`64`,则该人脸的缩放尺度在`[64 / 2, min(v * 2, 64 * 2)]`区间内均匀选取(尺度选择步骤的示意代码见本节末尾)。
- **其他方法:** 包括随机扰动、翻转、裁剪等。具体请参考[READER.md](../advanced_tutorials/READER.md)
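下面给出尺度选择这一步的示意代码(仅为示意实现,锚点集合与区间规则按上文描述假设,并非PaddleDetection中的源码):

```python
import math
import random

def sample_face_scale(face_w, face_h, anchors=(16, 32, 64, 128)):
    # 根据人脸宽高计算 v = sqrt(width * height)
    v = math.sqrt(face_w * face_h)
    # 找到第一个大于 v 的锚点(若均不大于 v,则取最后一个)
    idx = next((i for i, a in enumerate(anchors) if v < a), len(anchors) - 1)
    # 在不大于该锚点的所有候选中按均匀分布随机选取一个
    target = random.choice(anchors[:idx + 1])
    # 缩放后的人脸尺度在 [target / 2, min(v * 2, target * 2)] 中均匀采样
    return random.uniform(target / 2, min(v * 2, target * 2))

# 例:face_w = face_h = 45 时 v = 45,位于 32 与 64 之间,
# 先从 [16, 32, 64] 中均匀选取;若选中 64,则返回值位于 [32, 90]。
```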
### 训练与推理
训练流程与推理流程方法与其他算法一致,请参考[GETTING_STARTED_cn.md](../tutorials/GETTING_STARTED_cn.md)
**注意:**
- `BlazeFace`和`FaceBoxes`均以每卡`batch_size=8`在4卡GPU上训练(总`batch_size`为32),共训练320000次迭代
(如果你的GPU数达不到4,请参考[学习率计算规则表](../tutorials/GETTING_STARTED_cn.html#faq))。
- 人脸检测模型目前我们不支持边训练边评估。
### 评估
目前我们支持在`WIDER FACE`数据集和`FDDB`数据集上评估。首先运行`tools/face_eval.py`生成评估结果文件,其次使用matlab(WIDER FACE)
或OpenCV(FDDB)计算具体的评估指标。
其中,运行`tools/face_eval.py`的参数列表如下:
- `-f` 或者 `--output_eval`: 评估生成的结果文件保存路径,默认是: `output/pred`
- `-e` 或者 `--eval_mode`: 评估模式,包括 `widerface` 和 `fddb`,默认是`widerface`。
- `--multi_scale`: 在命令中添加该参数后将进行多尺度评估;默认值为`False`,即进行单尺度评估。
#### 在WIDER-FACE数据集上评估
评估并生成结果文件:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
-o weights=output/blazeface/model_final/ \
--eval_mode=widerface
```
评估完成后,将在`output/pred`中生成txt格式的测试结果。
- 下载官方评估脚本来评估AP指标:
```
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
```
-`eval_tools/wider_eval.m`中修改保存结果路径和绘制曲线的名称:
```
% Modify the folder name where the result is stored.
pred_dir = './pred';
% Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
```
- `wider_eval.m` 是评估模块的主要执行程序。运行命令如下:
```
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
```
#### 在FDDB数据集上评估
我们提供了一套FDDB数据集的评估流程(目前仅支持Linux系统),其他具体细节请参考[FDDB官网](http://vis-www.cs.umass.edu/fddb/)
- 1)下载安装opencv:
下载OpenCV: 进入[OpenCV library](https://opencv.org/releases/)手动下载
安装OpenCV:请参考[OpenCV官方安装教程](https://docs.opencv.org/master/d7/d9f/tutorial_linux_install.html)通过源码安装。
- 2)下载数据集、评估代码以及格式化数据:
```
./dataset/fddb/download.sh
```
- 3)编译FDDB评估代码:
进入`dataset/fddb/evaluation`目录下,修改Makefile文件中内容如下:
```
evaluate: $(OBJS)
$(CC) $(OBJS) -o $@ $(LIBS)
```
修改`common.hpp`中内容为如下形式:
```
#define __IMAGE_FORMAT__ ".jpg"
//#define __IMAGE_FORMAT__ ".ppm"
#define __CVLOADIMAGE_WORKING__
```
根据`grep -r "CV_RGB"`命令找到含有`CV_RGB`的代码段,将`CV_RGB`改成`Scalar`,并且在cpp中加入`using namespace cv;`
然后编译:
```
make clean && make
```
- 4)开始评估:
修改config文件中`dataset_dir`和`anno_path`字段内容:
```
EvalReader:
...
dataset:
dataset_dir: dataset/fddb
anno_path: FDDB-folds/fddb_annotFile.txt
...
```
评估并生成结果文件:
```
python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
-o weights=output/blazeface/model_final/ \
--eval_mode=fddb
```
评估完成后,将在`output/pred/pred_fddb_res.txt`中生成txt格式的测试结果。
生成ContROC与DiscROC数据:
```
cd dataset/fddb/evaluation
./evaluate -a ./FDDB-folds/fddb_annotFile.txt \
-f 0 -i ./ -l ./FDDB-folds/filePath.txt -z .jpg \
-d {RESULT_FILE} \
-r {OUTPUT_DIR}
```
**注意:**
(1)`RESULT_FILE``tools/face_eval.py`输出的FDDB预测结果文件;
(2)`OUTPUT_DIR`是FDDB评估输出结果文件前缀,会生成两个文件`{OUTPUT_DIR}ContROC.txt``{OUTPUT_DIR}DiscROC.txt`
(3)参数用法及注释可通过执行`./evaluate --help`来获取。
## 算法细节
### BlazeFace
**简介:**
[BlazeFace](https://arxiv.org/abs/1907.05047) 是Google Research发布的人脸检测模型。它轻巧并且性能良好,
专为移动GPU推理量身定制。在旗舰设备上,速度可达到200-1000+FPS。
**特点:**
- 锚点策略在8×8(输入128x128)的特征图上停止,在该分辨率下每个像素点6个锚点;
- 5个单BlazeBlock和6个双BlazeBlock:5×5 depthwise卷积,可以保证在相同精度下网络层数更少;
- 用混合(blending)策略替换非极大值抑制算法,该策略将边界框的回归参数估计为重叠预测之间的加权平均值(示意实现见下方代码)。
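下面给出该混合策略的一个numpy示意实现(按上文描述假设:同一组重叠预测按得分加权平均,并非PaddleDetection源码):

```python
import numpy as np

def iou_one_to_many(box, boxes):
    # box: [x1, y1, x2, y2];boxes: [N, 4]
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def blend_boxes(boxes, scores, iou_thresh=0.3):
    # 按得分从高到低处理,每次将与当前最高分框重叠的一组预测按得分加权平均后输出
    order = np.argsort(scores)[::-1]
    out_boxes, out_scores = [], []
    while order.size > 0:
        cur = order[0]
        ious = iou_one_to_many(boxes[cur], boxes[order])
        group = order[ious >= iou_thresh]          # 与当前框重叠的一组预测
        weight = scores[group][:, None]
        out_boxes.append((boxes[group] * weight).sum(axis=0) / weight.sum())
        out_scores.append(scores[cur])
        order = order[ious < iou_thresh]           # 剩余未被合并的框
    return np.stack(out_boxes), np.array(out_scores)
```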
**版本信息:**
- 原始版本: 参考原始论文复现;
- Lite版本: 使用3x3卷积替换5x5卷积,更少的网络层数和通道数;
- NAS版本: 使用神经网络搜索算法构建网络结构,相比于`Lite`版本,NAS版本需要更少的网络层数和通道数。
### FaceBoxes
**简介:**
[FaceBoxes](https://arxiv.org/abs/1708.05234) 由Shifeng Zhang等人提出的高速和高准确率的人脸检测器,
被称为“高精度CPU实时人脸检测器”。 该论文收录于IJCB(2017)。
**特点:**
- 锚点策略分别在20x20、10x10、5x5(输入640x640)执行,每个像素点分别是3、1、1个锚点,对应密度系数是`1, 2, 4`(20x20)、4(10x10)、4(5x5);
- 在基础网络中个别block中使用CReLU和inception的结构;
- 使用密度先验盒(density_prior_box)可提高检测精度(锚点加密思路的示意代码见下方)。
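锚点加密(anchor densification)的思路可用如下示意代码说明(按FaceBoxes论文的描述假设实现,平移步长的具体取法以实际实现为准,并非PaddleDetection源码):

```python
def densify_anchors(cx, cy, size, step, density):
    # 将中心为 (cx, cy)、边长为 size 的锚点按密度系数 density 加密:
    # 在一个 step x step 的网格内均匀平移出 density x density 个锚点副本
    unit = step / density
    start = -(density - 1) / 2.0
    return [(cx + (start + i) * unit, cy + (start + j) * unit, size, size)
            for j in range(density) for i in range(density)]

# 例:density=4 时,一个锚点会被加密为 16 个平移后的锚点。
```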
**版本信息:**
- 原始版本: 参考原始论文复现;
- Lite版本: 使用更少的网络层数和通道数,具体可参考[代码](https://github.com/PaddlePaddle/PaddleDetection/blob/master/ppdet/modeling/architectures/faceboxes.py)
## 如何贡献代码
我们非常欢迎您可以为PaddleDetection中的人脸检测模型提供代码,您可以提交PR供我们review;也十分感谢您的反馈,可以提交相应issue,我们会及时解答。
English | [简体中文](README.md)
English | [简体中文](FACE_DETECTION.md)
# FaceDetection
## Table of Contents
......@@ -15,9 +15,7 @@ English | [简体中文](README.md)
The goal of FaceDetection is to provide efficient and high-speed face detection solutions,
including cutting-edge and classic models.
<div align="center">
<img src="../../demo/output/12_Group_Group_12_Group_Group_12_935.jpg" />
</div>
![](../images/12_Group_Group_12_Group_Group_12_935.jpg)
## Benchmark and Model Zoo
PaddleDetection Supported architectures is shown in the below table, please refer to
......@@ -48,7 +46,7 @@ optimized network structure.
**NOTES:**
- Get mAP in `Easy/Medium/Hard Set` by multi-scale evaluation in `tools/face_eval.py`.
For details can refer to [Evaluation](#Evaluate-on-the-WIDER-FACE).
- BlazeFace-Lite Training and Testing uses [blazeface.yml](../../configs/face_detection/blazeface.yml)
- BlazeFace-Lite Training and Testing uses [blazeface.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/face_detection/blazeface.yml)
config file and set `lite_edition: true`.
#### mAP in FDDB
......@@ -147,14 +145,14 @@ according to the randomly selected face height and width, and judge the value of
of uniform distribution. If `64` is selected, the face's interval is selected in `[64 / 2, min(v * 2, 64 * 2)]`.
- **Other methods:** Including `RandomDistort`,`ExpandImage`,`RandomInterpImage`,`RandomFlipImage` etc.
Please refer to [DATA.md](../../docs/DATA.md#APIs) for details.
Please refer to [READER.md](../advanced_tutorials/READER.md) for details.
### Training and Inference
`Training` and `Inference` please refer to [GETTING_STARTED.md](../../docs/GETTING_STARTED.md)
`Training` and `Inference` please refer to [GETTING_STARTED.md](../tutorials/GETTING_STARTED.md)
**NOTES:**
- `BlazeFace` and `FaceBoxes` is trained in 4 GPU with `batch_size=8` per gpu (total batch size as 32)
and trained 320000 iters.(If your GPU count is not 4, please refer to the rule of training parameters
in the table of [calculation rules](../../docs/GETTING_STARTED.md#faq)).
in the table of [calculation rules](../tutorials/GETTING_STARTED.html#faq)).
- Currently we do not support evaluation in training.
### Evaluation
......
......@@ -2,16 +2,12 @@
## 简介
CascadeCA RCNN是百度视觉技术部在Google AI Open Images 2019-Object Detection比赛中的最佳单模型,该单模型助力团队在500多支参赛队伍中取得第二名。Open Images Dataset V5(OIDV5)包含500个类别、173W训练图像和超过1400W个标注边框,是目前已知规模最大的目标检测公开数据集,数据集地址:[https://storage.googleapis.com/openimages/web/index.html](https://storage.googleapis.com/openimages/web/index.html)。团队在比赛中的技术方案报告地址:[https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf)
<div align="center">
<img src="../demo/oidv5_gt.png"/>
</div>
![](../images/oidv5_gt.png)
## 方法描述
该模型结合了当前较优的检测方法。具体地,它将ResNet200-vd作为检测模型的骨干网络,其imagenet分类预训练模型可以在[这里](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/README_en.md)下载;结合了CascadeCA RCNN、Feature Pyramid Networks、Non-local、Deformable V2等方法。在这里需要注意的是,标准的CascadeRCNN是只预测2个框(前景和背景,使用得分信息去判断最终前景所属的类别),而该模型对每个类别都单独预测了一个框(Cascade Class Aware)。最终模型框图如下图所示。
<div align="center">
<img src="../demo/oidv5_model_framework.png"/>
</div>
![](../images/oidv5_model_framework.png)
由于OIDV5的类别不均衡现象比较严重,在训练时采用了动态采样的策略去选择样本并进行训练;多尺度训练被用于解决边框面积范围太大的情况;此外,团队使用Libra loss替代Smooth L1 loss,来计算预测框的loss;在预测时,使用SoftNMS方法进行后处理,保证更多的框可以被召回。
......@@ -37,13 +33,13 @@ COCO和Objects365 Dataset数据格式相同,目前只支持预测和评估。
## 使用方法
OIDV5数据集格式与COCO不同,目前仅支持单张图片的预测。OIDV5的模型评估方法可以参考[这里](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/challenge_evaluation.md)
OIDV5数据集格式与COCO不同,目前仅支持单张图片的预测。OIDV5的模型评估方法可以参考[文档](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/challenge_evaluation.md)。
1. 下载模型并解压。
2. 运行预测程序。
```
```bash
python -u tools/infer.py -c configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml -o weights=./oidv5_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/ --infer_img=demo/000000570688.jpg
```
......@@ -53,6 +49,4 @@ python -u tools/infer.py -c configs/oidv5/cascade_rcnn_cls_aware_r200_vd_fpn_dcn
## 模型检测效果
<div align="center">
<img src="../demo/oidv5_pred.jpg"/>
</div>
![](../images/oidv5_pred.jpg)
......@@ -24,7 +24,7 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python tools/train.py -c configs/dcn/yolov3_r50vd_dcn.yml
```
更多模型参数请使用``python tools/train.py --help``查看,或参考[训练、评估及参数说明](docs/GETTING_STARTED_cn.md)文档
更多模型参数请使用``python tools/train.py --help``查看,或参考[训练、评估及参数说明](../tutorials/GETTING_STARTED_cn.md)文档
### 模型效果
......
特色模型
===========================================
.. toctree::
:maxdepth: 2
FACE_DETECTION.md
YOLOv3_ENHANCEMENT.md
CACascadeRCNN.md
OIDV5_BASELINE_MODEL.md
CONTRIB_cn.md
欢迎使用 PaddleDetection!
===========================================
.. toctree::
:maxdepth: 2
tutorials/index
advanced_tutorials/index
featured_model/index
MODEL_ZOO_cn.md
CHANGELOG.md
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=PaddleDetection
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd
sphinx
recommonmark
sphinx_markdown_tables
sphinx_rtd_theme
......@@ -123,7 +123,7 @@ Also, please note mixed precision training currently requires changing `norm_typ
-d dataset/coco
```
The path of the model to be evaluated can be either a local path or a link in [MODEL_ZOO](MODEL_ZOO_cn.md).
The path of the model to be evaluated can be either a local path or a link in [MODEL_ZOO](../MODEL_ZOO_cn.md).
- Evaluate with json
......@@ -195,5 +195,4 @@ batch size could reach 4 per GPU (Tesla V100 16GB).
**Q:** How to change data preprocessing? </br>
**A:** Set `sample_transform` in the configuration. Note that **the complete set of transforms** needs to be added in the configuration.
For example, `DecodeImage`, `NormalizeImage` and `Permute` in RCNN models. For detail description, please refer
to [config_example](config_example).
For example, `DecodeImage`, `NormalizeImage` and `Permute` in RCNN models.
# 开始
# 入门使用
关于配置运行环境,请参考[安装指南](INSTALL_cn.md)
......@@ -72,7 +72,7 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/0000005
finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```
详细说明请参考[Transfer Learning](TRANSFER_LEARNING_cn.md)
详细说明请参考[迁移学习文档](../advanced_tutorials/TRANSFER_LEARNING_cn.md)
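下面给出一个使用该参数进行迁移学习训练的示意命令(仅为示意,具体配置文件与预训练权重请以迁移学习文档为准):

```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -o pretrain_weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
                            finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```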
- 使用Paddle OP组建的YOLOv3损失函数训练YOLOv3
......@@ -87,7 +87,7 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/0000005
Paddle OP组建YOLOv3损失函数代码位于`ppdet/modeling/losses/yolo_loss.py`
#### 提示
**提示:**
- `CUDA_VISIBLE_DEVICES` 参数可以指定不同的GPU。例如: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU计算规则可以参考 [FAQ](#faq)
- 若本地未找到数据集,将自动下载数据集并保存在`~/.cache/paddle/dataset`中。
......@@ -121,7 +121,7 @@ python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train.
-d dataset/coco
```
评估模型可以为本地路径,例如`output/faster_rcnn_r50_1x/model_final/`, 也可以是[MODEL_ZOO](MODEL_ZOO_cn.md)中给出的模型链接。
评估模型可以为本地路径,例如`output/faster_rcnn_r50_1x/model_final/`, 也可以是[MODEL_ZOO](../MODEL_ZOO_cn.md)中给出的模型链接。
- 通过json文件评估
......@@ -134,7 +134,7 @@ python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train.
json文件必须命名为bbox.json或者mask.json,放在`evaluation/`目录下。
#### 提示
**提示:**
- R-CNN和SSD模型目前暂不支持多GPU评估,将在后续版本支持
......@@ -178,7 +178,7 @@ batch size可以达到每GPU 4 (Tesla V100 16GB)。
**Q:** 如何修改数据预处理? </br>
**A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理**
例如RCNN模型中`DecodeImage`, `NormalizeImage` and `Permute`更多详细描述请参考[配置案例](config_example)
例如RCNN模型中的`DecodeImage`、`NormalizeImage`和`Permute`。
**Q:** affine_channel和batch norm是什么关系?
......
......@@ -17,7 +17,7 @@ English | [简体中文](INSTALL_cn.md)
This document covers how to install PaddleDetection, its dependencies
(including PaddlePaddle), together with the COCO and Pascal VOC datasets.
For general information about PaddleDetection, please see [README.md](../README.md).
For general information about PaddleDetection, please see [README.md](https://github.com/PaddlePaddle/PaddleDetection).
## PaddlePaddle
......@@ -80,7 +80,7 @@ git clone https://github.com/PaddlePaddle/PaddleDetection.git
**Install Python dependencies:**
Required python packages are specified in [requirements.txt](../requirements.txt), and can be installed with:
Required python packages are specified in [requirements.txt](https://github.com/PaddlePaddle/PaddleDetection/blob/master/requirements.txt), and can be installed with:
```
pip install -r requirements.txt
......@@ -188,7 +188,7 @@ python dataset/voc/create_list.py
**NOTE:** If you set `use_default_label=False` in yaml configs, the `label_list.txt`
of Pascal VOC dataset will be read, otherwise, `label_list.txt` is unnecessary and
the default Pascal VOC label list which defined in
[voc\_loader.py](../ppdet/data/source/voc_loader.py) will be used.
[voc\_loader.py](https://github.com/PaddlePaddle/PaddleDetection/blob/master/ppdet/data/source/voc.py) will be used.
**Download datasets automatically:**
......@@ -200,4 +200,4 @@ will be cached in `~/.cache/paddle/dataset/` and can be discovered automatically
subsequently.
**NOTE:** For further information on the datasets, please see [DATA.md](DATA.md)
**NOTE:** For further information on the datasets, please see [READER.md](../advanced_tutorials/READER.md)
# 安装文档
# 安装说明
---
## 目录
- [简介](#introduction)
- [简介](#简介)
- [PaddlePaddle](#paddlepaddle)
- [其他依赖安装](#other-dependencies)
- [PaddleDetection](#paddle-detection)
- [数据集](#datasets)
- [其他依赖安装](#其他依赖安装)
- [PaddleDetection](#PaddleDetection)
- [数据集](#数据集)
## 简介
这份文档介绍了如何安装PaddleDetection及其依赖项(包括PaddlePaddle),以及COCO和Pascal VOC数据集。
PaddleDetection的相关信息,请参考[README.md](../README.md).
PaddleDetection的相关信息,请参考[README.md](https://github.com/PaddlePaddle/PaddleDetection).
## PaddlePaddle
......@@ -33,7 +33,7 @@ PaddleDetection的相关信息,请参考[README.md](../README.md).
python -c "import paddle; print(paddle.__version__)"
```
### 环境需求:
**环境需求:**
- Python2 or Python3 (windows系统仅支持Python3)
- CUDA >= 8.0
......@@ -76,7 +76,7 @@ git clone https://github.com/PaddlePaddle/PaddleDetection.git
**安装Python依赖库:**
Python依赖库在[requirements.txt](../requirements.txt)中给出,可通过如下命令安装:
Python依赖库在[requirements.txt](https://github.com/PaddlePaddle/PaddleDetection/blob/master/requirements.txt)中给出,可通过如下命令安装:
```
pip install -r requirements.txt
......@@ -185,7 +185,7 @@ python dataset/voc/create_list.py
**说明:** 如果你在yaml配置文件中设置`use_default_label=False`, 将从`label_list.txt`
中读取类别列表,反之则可以没有`label_list.txt`文件,检测库会使用Pascal VOC数据集的默
认类别列表,默认类别列表定义在[voc\_loader.py](../ppdet/data/source/voc_loader.py)
认类别列表,默认类别列表定义在[voc.py](https://github.com/PaddlePaddle/PaddleDetection/blob/master/ppdet/data/source/voc.py)
**自动下载数据集:**
......@@ -195,4 +195,4 @@ PaddleDetection将自动从[COCO-2017](http://images.cocodataset.org)或
`〜/.cache/paddle/dataset/`目录下,下次运行时,也可自动从该目录发现数据集。
**说明:** 更多有关数据集的介绍,请参考[DATA.md](DATA_cn.md)
**说明:** 更多有关数据集的介绍,请参考[数据处理文档](../advanced_tutorials/READER.md)
......@@ -6,7 +6,7 @@ This tutorial fine-tunes a tiny dataset by pretrained detection model for users
## Data Preparation
Dataset refers to [Kaggle](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection), which contains 240 images in train dataset and 60 images in test dataset. Data categories are apple, orange and banana. Download [here](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar) and uncompress the dataset after download, script for data preparation is located at [download_fruit.py](../dataset/fruit/download_fruit.py). Command is as follows:
Dataset refers to [Kaggle](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection), which contains 240 images in train dataset and 60 images in test dataset. Data categories are apple, orange and banana. Download [here](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar) and uncompress the dataset after download, script for data preparation is located at [download_fruit.py](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dataset/fruit/download_fruit.py). Command is as follows:
```bash
export PYTHONPATH=$PYTHONPATH:.
......@@ -26,7 +26,7 @@ Training:
python -u tools/train.py -c configs/yolov3_mobilenet_v1_fruit.yml \
--use_tb=True \
--tb_log_dir=tb_fruit_dir/scalar \
--eval
--eval
```
Use `yolov3_mobilenet_v1` to fine-tune the model from COCO dataset. Meanwhile, loss and mAP can be observed on tensorboard.
......@@ -37,9 +37,7 @@ tensorboard --logdir tb_fruit_dir/scalar/ --host <host_IP> --port <port_num>
Result on tensorboard is shown below:
<div align="center">
<img src="../demo/tensorboard_fruit.jpg" />
</div>
![tensorboard_fruit.jpg](../images/tensorboard_fruit.jpg)
Model can be downloaded [here](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_fruit.tar)
......@@ -59,9 +57,8 @@ python -u tools/infer.py -c configs/yolov3_mobilenet_v1_fruit.yml \
Inference images are shown below:
<p align="center">
<img src="../demo/orange_71.jpg" height=400 width=400 hspace='10'/>
<img src="../demo/orange_71_detection.jpg" height=400 width=400 hspace='10'/>
</p>
![orange_71.jpg](../../demo/orange_71.jpg)
![orange_71_detection.jpg](../images/orange_71_detection.jpg)
For detailed information on training and evaluation, please refer to [GETTING_STARTED.md](GETTING_STARTED.md).
......@@ -6,13 +6,15 @@
## 数据准备
数据集参考[Kaggle数据集](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection),其中训练数据集240张图片,测试数据集60张图片,数据类别为3类:苹果,橘子,香蕉。[下载链接](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar)。数据下载后分别解压即可, 数据准备脚本位于[download_fruit.py](../dataset/fruit/download_fruit.py)。下载数据方式如下:
数据集参考[Kaggle数据集](https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection),其中训练数据集240张图片,测试数据集60张图片,数据类别为3类:苹果,橘子,香蕉。[下载链接](https://dataset.bj.bcebos.com/PaddleDetection_demo/fruit-detection.tar)。数据下载后分别解压即可, 数据准备脚本位于[download_fruit.py](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dataset/fruit/download_fruit.py)。下载数据方式如下:
```bash
export PYTHONPATH=$PYTHONPATH:.
python dataset/fruit/download_fruit.py
```
## 开始训练
- **注:在开始前,运行如下命令并指定GPU**
```bash
......@@ -37,12 +39,13 @@ tensorboard --logdir tb_fruit_dir/scalar/ --host <host_IP> --port <port_num>
tensorboard结果显示如下:
<div align="center">
<img src="../demo/tensorboard_fruit.jpg" />
</div>
![](../images/tensorboard_fruit.jpg)
训练模型[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_fruit.tar)
## 评估预测
评估命令如下:
```bash
......@@ -59,9 +62,8 @@ python -u tools/infer.py -c configs/yolov3_mobilenet_v1_fruit.yml \
预测图片如下:
<p align="center">
<img src="../demo/orange_71.jpg" height=400 width=400 hspace='10'/>
<img src="../demo/orange_71_detection.jpg" height=400 width=400 hspace='10'/>
</p>
更多训练及评估流程,请参考[GETTING_STARTED_cn.md](GETTING_STARTED_cn.md).
![](../../demo/orange_71.jpg)
![](../images/orange_71_detection.jpg)
更多训练及评估流程,请参考[入门使用文档](GETTING_STARTED_cn.md)
初级使用教程
===========================================
.. toctree::
:maxdepth: 2
INSTALL_cn.md
QUICK_STARTED_cn.md
GETTING_STARTED_cn.md
# PaddleDetection C++预测部署方案
## 本文档结构
[1.说明](#1说明)
[2.主要目录和文件](#2主要目录和文件)
[3.编译](#3编译)
[4.预测并可视化结果](#4预测并可视化结果)
## 1.说明
本目录提供一个跨平台的图像检测模型的C++预测部署方案,用户通过一定的配置,加上少量的代码,即可把模型集成到自己的服务中,完成相应的图像检测任务。
主要设计的目标包括以下四点:
- 跨平台,支持在 Windows 和 Linux 完成编译、开发和部署
- 可扩展性,支持用户针对新模型开发自己特殊的数据预处理等逻辑
- 高性能,除了`PaddlePaddle`自身带来的性能优势,我们还针对图像检测的特点对关键步骤进行了性能优化
- 支持多种常见的图像检测模型,如YOLOv3, Faster-RCNN, Faster-RCNN+FPN,用户通过少量配置即可加载模型完成常见检测任务
## 2.主要目录和文件
```bash
deploy
├── detection_demo.cpp # 完成图像检测预测任务C++代码
├── conf
│ ├── detection_rcnn.yaml #示例faster rcnn 目标检测配置
│ └── detection_rcnn_fpn.yaml #示例faster rcnn + fpn目标检测配置
├── images
│ └── detection_rcnn # 示例faster rcnn + fpn目标检测测试图片目录
├── tools
│ └── vis.py # 示例图像检测结果可视化脚本
├── docs
│ ├── linux_build.md # Linux 编译指南
│ ├── windows_vs2015_build.md # windows VS2015编译指南
│ └── windows_vs2019_build.md # Windows VS2019编译指南
├── utils # 一些基础公共函数
├── preprocess # 数据预处理相关代码
├── predictor # 模型加载和预测相关代码
├── CMakeList.txt # cmake编译入口文件
└── external-cmake # 依赖的外部项目cmake(目前仅有yaml-cpp)
```
## 3.编译
支持在`Windows``Linux`平台编译和使用:
- [Linux 编译指南](./docs/linux_build.md)
- [Windows 使用 Visual Studio 2019 Community 编译指南](./docs/windows_vs2019_build.md)
- [Windows 使用 Visual Studio 2015 编译指南](./docs/windows_vs2015_build.md)
`Windows`上推荐使用最新的`Visual Studio 2019 Community`直接编译`CMake`项目。
## 4.预测并可视化结果
完成编译后,便生成了需要的可执行文件和链接库。这里以基于`faster rcnn`的检测模型为例,介绍部署图像检测模型的通用流程。
### 4.1. 下载模型文件
我们提供faster rcnn,faster rcnn+fpn模型用于预测coco17数据集,可在以下链接下载:[faster rcnn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50.zip)
[faster rcnn + fpn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50_fpn.zip)
下载并解压,解压后目录结构如下:
```
faster_rcnn_pp50/
├── __model__ # 模型文件
└── __params__ # 参数文件
```
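下载与解压可参考如下命令(以Linux为例的示意命令,假设本机已安装wget与unzip,解压目录可按需调整):

```bash
# 下载 faster rcnn 示例模型并解压到约定的模型目录
wget https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50.zip
unzip faster_rcnn_pp50.zip -d /root/projects/models/
```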
解压后把上述目录拷贝到合适的路径:
**假设**`Windows`系统上,我们模型和参数文件所在路径为`D:\projects\models\faster_rcnn_pp50`
**假设**`Linux`上对应的路径则为`/root/projects/models/faster_rcnn_pp50/`
### 4.2. 修改配置
`inference`源代码(即本目录)的`conf`目录下提供了示例基于faster rcnn的配置文件`detection_rcnn.yaml`, 相关的字段含义和说明如下:
```yaml
DEPLOY:
# 是否使用GPU预测
USE_GPU: 1
# 模型和参数文件所在目录路径
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50"
# 模型文件名
MODEL_FILENAME: "__model__"
# 参数文件名
PARAMS_FILENAME: "__params__"
# 预测图片的标准输入,尺寸不一致会resize
EVAL_CROP_SIZE: (608, 608)
# resize方式,支持 UNPADDING和RANGE_SCALING
RESIZE_TYPE: "RANGE_SCALING"
# 短边对齐的长度,仅在RANGE_SCALING下有效
TARGET_SHORT_SIZE : 800
# 均值
MEAN: [0.4647, 0.4647, 0.4647]
# 方差
STD: [0.0834, 0.0834, 0.0834]
# 图片类型, rgb或者rgba
IMAGE_TYPE: "rgb"
# 像素分类数
NUM_CLASSES: 1
# 通道数
CHANNELS : 3
# 预处理器, 目前提供图像检测的通用处理类DetectionPreProcessor
PRE_PROCESSOR: "DetectionPreProcessor"
# 预测模式,支持 NATIVE 和 ANALYSIS
PREDICTOR_MODE: "ANALYSIS"
# 每次预测的 batch_size
BATCH_SIZE : 3
# 长边伸缩的最大长度,-1代表无限制。
RESIZE_MAX_SIZE: 1333
# 输入的tensor数量。
FEEDS_SIZE: 3
```
修改字段`MODEL_PATH`的值为你在**上一步**下载并解压的模型文件所放置的目录即可。更多配置文件字段介绍,请参考文档[预测部署方案配置文件说明](./docs/configuration.md)
**注意**在使用CPU版本预测库时,`USE_GPU`的值必须设为0,否则无法正常预测。
### 4.3. 执行预测
在终端中将当前目录切换到生成的可执行文件所在目录(Windows系统下使用`cmd`)。
`Linux` 系统中执行以下命令:
```shell
./detection_demo --conf=conf/detection_rcnn.yaml --input_dir=images/detection_rcnn
```
`Windows` 中执行以下命令:
```shell
.\detection_demo.exe --conf=conf\detection_rcnn.yaml --input_dir=images\detection_rcnn\
```
预测使用的两个命令参数说明如下:
| 参数 | 含义 |
|-------|----------|
| conf | 模型配置的Yaml文件路径 |
| input_dir | 需要预测的图片目录 |
配置文件说明请参考上一步,样例程序会扫描input_dir目录下的所有图片,并为每一张图片生成对应的预测结果,输出到屏幕,同时在与图片`X`相同的目录下保存为`X.pb`文件(X为对应图片的文件名)。可使用工具脚本vis.py将检测结果可视化。
**检测结果可视化**
运行可视化脚本时,只需输入命令行参数图片路径、检测结果pb文件路径、目标框阈值以及类别-标签映射文件路径即可得到可视化的图片`X.png` (tools目录下提供coco17的类别标签映射文件coco17.json)。
```bash
python vis.py --img_path=../build/images/detection_rcnn/000000087038.jpg --img_result_path=../build/images/detection_rcnn/000000087038.jpg.pb --threshold=0.1 --c2l_path=coco17.json
```
检测结果(每个图片的结果用空行隔开)
```原图:```
![原图](./demo_images/000000087038.jpg)
```检测结果图:```
![检测结果](./demo_images/000000087038.jpg.png)
**文档教程请参考:** [PaddleDetection C++预测部署方案](../docs/advanced_tutorials/inference/DEPLOYMENT.md) <br/>
**English document please refer:** [PaddleDetection C++ deployment](../docs/advanced_tutorials/inference/DEPLOYMENT.md)
......@@ -250,7 +250,6 @@ class Reader(object):
self._pos = -1
self._epoch = -1
self._drained = False
# multi-process
self._worker_num = worker_num
......@@ -294,7 +293,6 @@ class Reader(object):
self._epoch += 1
self._pos = 0
self._drained = False
def __next__(self):
return self.next()
......
......@@ -145,7 +145,6 @@ class COCODataSet(DataSet):
'gt_bbox': gt_bbox,
'gt_score': gt_score,
'gt_poly': gt_poly,
'difficult': difficult
}
logger.debug('Load file: {}, im_id: {}, h: {}, w: {}.'.format(
......
......@@ -82,9 +82,6 @@ class DataSet(object):
def get_imid2path(self):
return self._imid2path
def get_cname2cid(self):
return self.cname2cid
def _is_valid_file(f, extensions=('.jpg', '.jpeg', '.png', '.bmp')):
return f.lower().endswith(extensions)
......
......@@ -20,6 +20,8 @@ import xml.etree.ElementTree as ET
from ppdet.core.workspace import register, serializable
from .dataset import DataSet
import logging
logger = logging.getLogger(__name__)
@register
......@@ -66,8 +68,9 @@ class VOCDataSet(DataSet):
# 'w': im_w, # width
# 'is_crowd': is_crowd,
# 'gt_class': gt_class,
# 'gt_score': gt_score,
# 'gt_bbox': gt_bbox,
# 'gt_poly': gt_poly,
# 'difficult': difficult
# }
self.roidbs = None
# 'cname2id' is a dict to map category name to class id
......@@ -147,7 +150,6 @@ class VOCDataSet(DataSet):
'gt_class': gt_class,
'gt_score': gt_score,
'gt_bbox': gt_bbox,
'gt_poly': [],
'difficult': difficult
}
if len(objs) != 0:
......@@ -158,6 +160,7 @@ class VOCDataSet(DataSet):
break
assert len(records) > 0, 'not found any voc record in %s' % (
self.anno_path)
logger.info('{} samples in file {}'.format(ct, anno_path))
self.roidbs, self.cname2cid = records, cname2cid
......
>运行该示例前请安装PaddleSlim和Paddle 1.6或更高版本
# 检测模型蒸馏示例
# 模型蒸馏教程
## 概述
......@@ -89,7 +89,7 @@ yolo_output_names = [
## 训练
根据[PaddleDetection/tools/train.py](../../tools/train.py)编写压缩脚本`distill.py`
根据[PaddleDetection/tools/train.py](https://github.com/PaddlePaddle/PaddleDetection/blob/master/tools/train.py)编写压缩脚本`distill.py`
在该脚本中定义了teacher_model和student_model,用teacher_model的输出指导student_model的训练。蒸馏损失的基本思路可参考下方示意代码。
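下面的代码仅为示意:对teacher与student对应的YOLO头输出逐一计算L2距离并求和,用来说明蒸馏损失的构成,并非distill.py的实际实现。

```python
import numpy as np

def l2_distill_loss(teacher_outputs, student_outputs):
    # teacher_outputs / student_outputs:对应的多尺度输出特征列表,形状需一一对应
    losses = []
    for t, s in zip(teacher_outputs, student_outputs):
        assert t.shape == s.shape, "teacher与student对应输出的形状必须一致"
        losses.append(np.mean((t - s) ** 2))
    # 各尺度损失求和,作为附加到student训练目标上的蒸馏项
    return float(np.sum(losses))
```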
### 执行示例
......@@ -178,7 +178,7 @@ python -u slim/distillation/distill.py \
## 评估
每隔`snap_shot_iter`步会保存一个checkpoint模型,可以用于评估。使用PaddleDetection目录下[tools/eval.py](../../tools/eval.py)评估脚本,并指定`weights`为训练得到的模型路径
每隔`snap_shot_iter`步会保存一个checkpoint模型,可以用于评估。使用PaddleDetection目录下[tools/eval.py](https://github.com/PaddlePaddle/PaddleDetection/blob/master/tools/eval.py)评估脚本,并指定`weights`为训练得到的模型路径
运行命令为:
```bash
......@@ -197,7 +197,7 @@ python -u tools/eval.py -c configs/yolov3_mobilenet_v1.yml \
## 预测
每隔`snap_shot_iter`步保存的checkpoint模型也可以用于预测。使用PaddleDetection目录下[tools/infer.py](../../tools/infer.py)预测脚本,并指定`weights`为训练得到的模型路径
每隔`snap_shot_iter`步保存的checkpoint模型也可以用于预测。使用PaddleDetection目录下[tools/infer.py](https://github.com/PaddlePaddle/PaddleDetection/blob/master/tools/infer.py)预测脚本,并指定`weights`为训练得到的模型路径
### Python预测
......
此差异已折叠。
# 卷积层通道剪裁教程
请确保已正确[安装PaddleDetection](../../docs/INSTALL_cn.md)及其依赖。
请确保已正确[安装PaddleDetection](../../docs/tutorials/INSTALL_cn.md)及其依赖。
该文档介绍如何使用[PaddleSlim](https://paddlepaddle.github.io/PaddleSlim)的卷积通道剪裁接口对检测库中的模型的卷积层的通道数进行剪裁。
......@@ -12,7 +12,7 @@
## 1. 数据准备
请参考检测库[数据下载](../../docs/INSTALL_cn.md)文档准备数据。
请参考检测库[数据下载](../../docs/tutorials/INSTALL_cn.md)文档准备数据。
## 2. 模型选择
......
此差异已折叠。
此差异已折叠。