Unverified commit 9d31d6e3 authored by: W wangguanzhong, committed by: GitHub

Update PaddleDetection (#3787)

* update release/1.6 from develop

* update api

* fix config for obj365 model
Parent 5e353a87
......@@ -56,3 +56,9 @@ coverage.xml
/docs/_build/
*.json
dataset/coco/annotations
dataset/coco/train2017
dataset/coco/val2017
dataset/voc/VOCdevkit
......@@ -32,97 +32,102 @@ changes.
- Performance Optimized:
With the help of the underlying PaddlePaddle framework, faster training and
reduced GPU memory footprint is achieved. Notably, YOLOv3 training is
much faster compared to other frameworks. Another example is Mask-RCNN
(ResNet50): we managed to fit up to 4 images per GPU (Tesla V100 16GB) during
multi-GPU training.
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: | :-----: | :--: |
| Faster R-CNN        | ✓      |                             ✓ |     ✗      |   ✓   |     ✗     |    ✗    |  ✗   |
| Faster R-CNN + FPN  | ✓      |                             ✓ |     ✓      |   ✓   |     ✗     |    ✗    |  ✗   |
| Mask R-CNN          | ✓      |                             ✓ |     ✗      |   ✓   |     ✗     |    ✗    |  ✗   |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
Advanced Features:
- [x] **Synchronized Batch Norm**: currently used by YOLOv3.
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices; it cannot be used on CPU devices or a single GPU device.
## Get Started

- [Installation guide](docs/INSTALL.md)
- [Quick start on small dataset](docs/QUICK_STARTED.md)
- [Guide to training, evaluation and arguments description](docs/GETTING_STARTED.md)
- [Guide to preprocess pipeline and custom dataset](docs/DATA.md)
- [Introduction to the configuration workflow](docs/CONFIG.md)
- [Examples for detailed configuration explanation](docs/config_example/)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Transfer learning document](docs/TRANSFER_LEARNING.md)

For a quick try at inference, run the following command; the visualized result will
be saved in `output`.

```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
python tools/infer.py -c configs/mask_rcnn_r50_1x.yml \
    -o weights=https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar \
    --infer_img=demo/000000570688.jpg
```

## Model Zoo

- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
- [Face detection models](configs/face_detection/README.md)
- [Pretrained models for pedestrian and vehicle detection](contrib/README.md)

## Model compression

- [Quantization-aware training example](slim/quantization)
- [Pruning compression example](slim/prune)

## Deploy

- [Export model for inference deployment](docs/EXPORT_MODEL.md)
- [C++ inference deployment](inference/README.md)

## Benchmark

- [Inference benchmark](docs/BENCHMARK_INFER_cn.md)

## Updates
#### 10/2019
- Face detection models included: BlazeFace, FaceBoxes.
- Enrich COCO models, box mAP up to 51.9%.
- Add CACascade RCNN, one of the best single models from the Objects365 2019 Challenge Full Track champion.
- Add pretrained models for pedestrian and vehicle detection.
- Support mixed-precision training.
- Add C++ inference deployment.
- Add model compression examples.
#### 2/9/2019
- Add pretrained models for GroupNorm.
- Add Cascade-Mask-RCNN+FPN.

#### 5/8/2019
- Add a series of models related to Modulated Deformable Convolution.
#### 7/29/2019
- Update Chinese docs for PaddleDetection
- Fix bug in R-CNN models when training and evaluating at the same time
- Add ResNeXt101-vd + Mask R-CNN + FPN models
- Add YOLOv3 on VOC models
#### 7/3/2019
- Initial release of PaddleDetection and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
## Contributing
......
......@@ -2,7 +2,7 @@
# PaddleDetection
PaddleDetection aims to provide industry and academia with a rich set of easy-to-use object detection models. It not only delivers strong performance and easy deployment, but is also flexible enough to meet the needs of algorithm research.
**All models in this repository currently require PaddlePaddle 1.6 or above (or an appropriate develop build).**
......@@ -17,15 +17,15 @@ PaddleDetection的目的是为工业界和学术界提供大量易使用的目
- Easy to deploy:
The core operators used in PaddleDetection models are implemented in C++ or CUDA, and together with PaddlePaddle's high-performance inference engine the models can be conveniently deployed on a variety of hardware platforms.
- Highly flexible:
PaddleDetection decouples its components through a modular design, so a wide range of detection models can be assembled easily via configuration files.
- High performance:
Thanks to PaddlePaddle's efficient underlying kernels, training speed and GPU memory footprint are highly competitive. For example, YOLOv3 trains faster than other frameworks, and for Mask-RCNN (ResNet50) a per-GPU batch size of 4 (even 5) fits on a Tesla V100 with 16GB memory.
Supported architectures:
......@@ -35,75 +35,89 @@ PaddleDetection的目的是为工业界和学术界提供大量易使用的目
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN   | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet           | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3              | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer a significant accuracy gain at negligible extra cost.
Advanced features:
- [x] **Synchronized Batch Norm**: currently used by YOLOv3.
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
**NOTE:** Synchronized batch normalization can only be used with multiple GPUs; it cannot be used on CPU or with a single GPU.
## Get Started

- [Installation guide](docs/INSTALL_cn.md)
- [Quick start](docs/QUICK_STARTED_cn.md)
- [Guide to training, evaluation and arguments](docs/GETTING_STARTED_cn.md)
- [Data preprocessing and custom datasets](docs/DATA_cn.md)
- [Introduction to the configuration module](docs/CONFIG_cn.md)
- [Examples with detailed configuration and argument description](docs/config_example/)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Transfer learning tutorial](docs/TRANSFER_LEARNING_cn.md)

## Model Zoo

- [Model zoo](docs/MODEL_ZOO_cn.md)
- [Face detection models](configs/face_detection/README.md)
- [Pretrained models for pedestrian and vehicle detection](contrib/README_cn.md)

## Model compression

- [Quantization-aware training example](slim/quantization)
- [Pruning example](slim/prune)

## Inference deployment

- [Model export tutorial](docs/EXPORT_MODEL.md)
- [C++ inference deployment](inference/README.md)

## Benchmark

- [Inference benchmark](docs/BENCHMARK_INFER_cn.md)

## Updates

#### 10/2019
- Added face detection models BlazeFace and FaceBoxes.
- Enriched COCO models, with box mAP up to 51.9%.
- Added CACascade-RCNN, one of the best single models from the winning entry of the Objects365 2019 Challenge Full Track.
- Added pretrained models for pedestrian and vehicle detection.
- Added FP16 (mixed-precision) training support.
- Added a cross-platform C++ inference deployment solution.
- Added model compression examples.

#### 2/9/2019
- Added GroupNorm models.
- Added CascadeRCNN+Mask models.
#### 5/8/2019
- Added a series of models based on Modulated Deformable Convolution.
#### 7/22/2019
- Added Chinese documentation for PaddleDetection
- Fixed a bug when R-CNN models are trained and evaluated at the same time
- Added ResNeXt101-vd + Mask R-CNN + FPN models
- Added YOLOv3 models trained on the VOC dataset
#### 7/3/2019
- First release of the PaddleDetection library and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
  R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
## Contributing
......
architecture: CascadeRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/cascade_rcnn_r50_fpn_1x/model_final
metric: COCO
num_classes: 81
CascadeRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
variant: b
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_lo: [0.0, 0.0, 0.0]
bg_thresh_hi: [0.5, 0.6, 0.7]
fg_thresh: [0.5, 0.6, 0.7]
fg_fraction: 0.25
CascadeBBoxHead:
head: CascadeTwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeTwoFCHead:
mlp_dim: 1024
MultiScaleTEST:
score_thresh: 0.05
nms_thresh: 0.5
detections_per_im: 100
enable_voting: true
vote_thresh: 0.9
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
sample_transforms:
- !DecodeImage
to_rgb: true
- !NormalizeImage
is_channel_first: false
is_scale: true
mean:
- 0.485
- 0.456
- 0.406
std:
- 0.229
- 0.224
- 0.225
- !MultiscaleTestResize
origin_target_size: 800
origin_max_size: 1333
target_size:
- 400
- 500
- 600
- 700
- 900
- 1000
- 1100
- 1200
max_size: 2000
use_flip: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadMSTest
pad_to_stride: 32
num_scale: 18
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/coco/annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: CascadeMaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
max_iters: 300000
snapshot_iter: 10
use_gpu: true
log_iter: 20
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_caffe_pretrained.tar
weights: output/cascade_mask_rcnn_dcn_se154_vd_fpn_gn_s1x/model_final/
metric: COCO
num_classes: 81
CascadeMaskRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
mask_assigner: MaskAssigner
mask_head: MaskHead
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: bn
freeze_norm: True
variant: d
dcn_v2_stages: [3, 4, 5]
std_senet: True
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
freeze_norm: False
norm_type: gn
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
norm_type: gn
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_hi: [0.5, 0.6, 0.7]
bg_thresh_lo: [0.0, 0.0, 0.0]
fg_fraction: 0.25
fg_thresh: [0.5, 0.6, 0.7]
MaskAssigner:
resolution: 28
CascadeBBoxHead:
head: CascadeXConvNormHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeXConvNormHead:
norm_type: gn
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 280000]
- !LinearWarmup
start_factor: 0.01
steps: 2000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !RandomFlipImage
is_mask_flip: true
is_normalized: false
prob: 0.5
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
- 672
- 704
- 736
- 768
- 800
- 832
- 864
- 896
- 928
- 960
- 992
- 1024
- 1056
- 1088
- 1120
- 1152
- 1184
- 1216
- 1248
- 1280
- 1312
- 1344
- 1376
- 1408
max_size: 1600
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 8
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 800
max_size: 1333
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/coco/annotations/instances_val2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: CascadeMaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
max_iters: 300000
snapshot_iter: 10000
use_gpu: true
log_iter: 20
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_caffe_pretrained.tar
weights: output/cascade_mask_rcnn_dcn_se154_vd_fpn_gn_s1x/model_final/
metric: COCO
num_classes: 81
CascadeMaskRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
mask_assigner: MaskAssigner
mask_head: MaskHead
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: bn
freeze_norm: True
variant: d
dcn_v2_stages: [3, 4, 5]
std_senet: True
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
freeze_norm: False
norm_type: gn
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
norm_type: gn
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_hi: [0.5, 0.6, 0.7]
bg_thresh_lo: [0.0, 0.0, 0.0]
fg_fraction: 0.25
fg_thresh: [0.5, 0.6, 0.7]
MaskAssigner:
resolution: 28
CascadeBBoxHead:
head: CascadeXConvNormHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeXConvNormHead:
norm_type: gn
MultiScaleTEST:
score_thresh: 0.05
nms_thresh: 0.5
detections_per_im: 100
enable_voting: true
vote_thresh: 0.9
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 280000]
- !LinearWarmup
start_factor: 0.01
steps: 2000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !RandomFlipImage
is_mask_flip: true
is_normalized: false
prob: 0.5
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
- 672
- 704
- 736
- 768
- 800
- 832
- 864
- 896
- 928
- 960
- 992
- 1024
- 1056
- 1088
- 1120
- 1152
- 1184
- 1216
- 1248
- 1280
- 1312
- 1344
- 1376
- 1408
max_size: 1600
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 8
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017_debug_139.json
image_dir: val2017
sample_transforms:
- !DecodeImage
to_rgb: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !MultiscaleTestResize
origin_target_size: 800
origin_max_size: 1333
target_size:
- 400
- 500
- 600
- 700
- 900
- 1000
- 1100
- 1200
max_size: 2000
use_flip: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadMSTest
pad_to_stride: 32
# num_scale = (len(target_size) + 1) * (1 + use_flip)
num_scale: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/coco/annotations/instances_val2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
English | [简体中文](README_cn.md)
# FaceDetection
The goal of FaceDetection is to provide efficient and high-speed face detection solutions,
including cutting-edge and classic models.
<div align="center">
<img src="../../demo/output/12_Group_Group_12_Group_Group_12_935.jpg" />
</div>
## Data Pipline
We use the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/) to train and test the models;
a detailed description of the data is available on the official website.
- WIDER Face data source:
Loads a `wider_face`-type dataset with the following directory structure:
```
dataset/wider_face/
├── wider_face_split
│ ├── wider_face_train_bbx_gt.txt
│ ├── wider_face_val_bbx_gt.txt
├── WIDER_train
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_100.jpg
│ │ │ ├── 0_Parade_marchingband_1_381.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
├── WIDER_val
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_1004.jpg
│ │ │ ├── 0_Parade_marchingband_1_1045.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
```
- Download dataset manually:
To download the WIDER FACE dataset, run the following commands:
```
cd dataset/wider_face && ./download.sh
```
- Download dataset automatically:
If a training session is started but the dataset is not set up properly
(e.g. not found in `dataset/wider_face`), PaddleDetection will automatically
download it from the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/);
the decompressed dataset is cached in `~/.cache/paddle/dataset/` and will be
found automatically in subsequent runs.
### Data Augmentation
- **Data-anchor-sampling:** Randomly rescales the image within a range of scales,
greatly augmenting the scale variation of faces. Concretely, for a randomly selected face compute
$v=\sqrt{width \times height}$ and determine which interval of `[16,32,64,128]` the value `v` falls into.
Suppose `v=45`, i.e. `32<v<64`; then one of `[16,32,64]` is selected with uniform probability.
If `64` is chosen, the target face scale is sampled from `[64 / 2, min(v * 2, 64 * 2)]` (see the sketch after this list).
- **Other methods:** Including `RandomDistort`, `ExpandImage`, `RandomInterpImage`, `RandomFlipImage`, etc.
Please refer to [DATA.md](../../docs/DATA.md#APIs) for details.
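The sampling rule above can be made concrete with a small sketch. This is an illustrative simplification, not the operator shipped in PaddleDetection; the anchor list and interval logic follow the description above.

```python
# Hypothetical sketch of data-anchor-sampling (not the PaddleDetection implementation).
import math
import random

def data_anchor_sample_ratio(face_w, face_h, anchor_sizes=(16, 32, 64, 128)):
    """Return an image resize ratio chosen by the data-anchor-sampling rule."""
    v = math.sqrt(face_w * face_h)                       # current face scale
    # index of the smallest anchor that is >= v (clamped to the last anchor)
    idx = next((i for i, a in enumerate(anchor_sizes) if v <= a), len(anchor_sizes) - 1)
    # choose uniformly among anchors up to that index, e.g. v=45 -> one of {16, 32, 64}
    chosen = random.choice(anchor_sizes[: idx + 1])
    # resample the target face scale inside [chosen / 2, min(v * 2, chosen * 2)]
    target = random.uniform(chosen / 2.0, min(v * 2.0, chosen * 2.0))
    return target / v                                     # ratio to resize the whole image

# Example: a 40x50 face may be rescaled so that it lands near 16, 32 or 64 pixels.
print(data_anchor_sample_ratio(40, 50))
```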
## Benchmark and Model Zoo
The supported architectures are shown in the table below; please refer to
[Algorithm Description](#Algorithm-Description) for details of each algorithm.
| | Original | Lite <sup>[1](#lite)</sup> | NAS <sup>[2](#nas)</sup> |
|:------------------------:|:--------:|:--------------------------:|:------------------------:|
| [BlazeFace](#BlazeFace) | ✓ | ✓ | ✓ |
| [FaceBoxes](#FaceBoxes) | ✓ | ✓ | x |
<a name="lite">[1]</a> The `Lite` edition reduces the number of network layers and channels.
<a name="nas">[2]</a> The `NAS` edition uses a `Neural Architecture Search` algorithm to
optimize the network structure.
**Todo List:**
- [ ] HamBox
- [ ] Pyramidbox
### Model Zoo
#### mAP in WIDER FACE
| Architecture | Type | Size | Img/gpu | Lr schd | Easy Set | Medium Set | Hard Set | Download |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
| BlazeFace | Original | 640 | 8 | 32w | **0.915** | **0.892** | **0.797** | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_original.tar) |
| BlazeFace | Lite | 640 | 8 | 32w | 0.909 | 0.885 | 0.781 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace | NAS | 640 | 8 | 32w | 0.837 | 0.807 | 0.658 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| FaceBoxes | Original | 640 | 8 | 32w | 0.875 | 0.848 | 0.568 | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_original.tar) |
| FaceBoxes | Lite | 640 | 8 | 32w | 0.898 | 0.872 | 0.752 | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_lite.tar) |
**NOTES:**
- mAP on the `Easy/Medium/Hard Set` is obtained by multi-scale evaluation with `tools/face_eval.py`.
For details, refer to [Evaluation](#Evaluate-on-the-WIDER-FACE).
- BlazeFace-Lite training and testing use the [blazeface.yml](../../configs/face_detection/blazeface.yml)
config file with `lite_edition: true`.
#### mAP in FDDB
| Architecture | Type | Size | DistROC | ContROC |
|:------------:|:--------:|:----:|:-------:|:-------:|
| BlazeFace | Original | 640 | **0.992** | **0.762** |
| BlazeFace | Lite | 640 | 0.990 | 0.756 |
| BlazeFace | NAS | 640 | 0.981 | 0.741 |
| FaceBoxes | Original | 640 | 0.985 | 0.731 |
| FaceBoxes | Lite | 640 | 0.987 | 0.741 |
**NOTES:**
- mAP is obtained by multi-scale evaluation on the FDDB dataset.
For details, refer to [Evaluation](#Evaluate-on-the-FDDB).
#### Infer Time and Model Size comparison
| Architecture | Type | Size | P4 (ms) | CPU (ms) | ARM (ms) | File size (MB) | Flops |
|:------------:|:--------:|:----:|:---------:|:--------:|:----------:|:--------------:|:---------:|
| BlazeFace | Original | 128 | - | - | - | - | - |
| BlazeFace | Lite | 128 | - | - | - | - | - |
| BlazeFace | NAS | 128 | - | - | - | - | - |
| FaceBoxes | Original | 128 | - | - | - | - | - |
| FaceBoxes | Lite | 128 | - | - | - | - | - |
| BlazeFace | Original | 320 | - | - | - | - | - |
| BlazeFace | Lite | 320 | - | - | - | - | - |
| BlazeFace | NAS | 320 | - | - | - | - | - |
| FaceBoxes | Original | 320 | - | - | - | - | - |
| FaceBoxes | Lite | 320 | - | - | - | - | - |
| BlazeFace | Original | 640 | - | - | - | - | - |
| BlazeFace | Lite | 640 | - | - | - | - | - |
| BlazeFace | NAS | 640 | - | - | - | - | - |
| FaceBoxes | Original | 640 | - | - | - | - | - |
| FaceBoxes | Lite | 640 | - | - | - | - | - |
**NOTES:**
- CPU: i5-7360U @ 2.30GHz. Single core and single thread.
## Get Started
For `Training` and `Inference`, please refer to [GETTING_STARTED.md](../../docs/GETTING_STARTED.md)
- **NOTES:**
    - `BlazeFace` and `FaceBoxes` are trained on 4 GPUs with `batch_size=8` per GPU (total batch size 32)
    for 320,000 iterations. (If your GPU count is not 4, please refer to the parameter adjustment rule in the
    [calculation rules](../../docs/GETTING_STARTED.md#faq) table.)
    - Evaluation during training is currently not supported.
### Evaluation
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/face_eval.py -c configs/face_detection/blazeface.yml
```
- Optional arguments
- `-d` or `--dataset_dir`: Dataset path, same as dataset_dir of configs. Such as: `-d dataset/wider_face`.
- `-f` or `--output_eval`: Evaluation file directory, default is `output/pred`.
- `-e` or `--eval_mode`: Evaluation mode, include `widerface` and `fddb`, default is `widerface`.
  - `--multi_scale`: If this flag is added to the command, multi-scale evaluation is used;
    by default it is `False` and single-scale evaluation is used.
After the evaluation completes, test results in txt format are written to `output/pred`,
and mAP is then computed for the selected dataset. With `--eval_mode=widerface` the results are
evaluated as described in [Evaluate on the WIDER FACE](#Evaluate-on-the-WIDER-FACE); with `--eval_mode=fddb`,
as described in [Evaluate on the FDDB](#Evaluate-on-the-FDDB).
#### Evaluate on the WIDER FACE
- Download the official evaluation script to evaluate the AP metrics:
```
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
```
- Modify the result path and the name of the curve to be drawn in `eval_tools/wider_eval.m`:
```
% Modify the folder name where the result is stored.
pred_dir = './pred';
% Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
```
- `wider_eval.m` is the main execution program of the evaluation module. The run command is as follows:
```
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
```
#### Evaluate on the FDDB
For details of the [FDDB dataset](http://vis-www.cs.umass.edu/fddb/), refer to FDDB's official website.
- Download the official dataset and evaluation script to evaluate the ROC metrics:
```
#external link to the Faces in the Wild data set
wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
#The annotations are split into ten folds. See README for details.
wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
#information on directory structure and file formats
wget http://vis-www.cs.umass.edu/fddb/README.txt
```
- Install OpenCV: requires the [OpenCV library](http://sourceforge.net/projects/opencvlibrary/).
If the utility 'pkg-config' is not available for your operating system,
edit the Makefile to manually specify the OpenCV flags as follows:
```
INCS = -I/usr/local/include/opencv
LIBS = -L/usr/local/lib -lcxcore -lcv -lhighgui -lcvaux -lml
```
- Compile FDDB evaluation code: execute `make` in evaluation folder.
- Generate full image path list and groundtruth in FDDB-folds. The run command is as follows:
```
cat `ls | grep -v "ellipse"` > filePath.txt
cat *ellipse* > fddb_annotFile.txt
```
- Evaluation
Finally, run the evaluation command:
```
./evaluate -a ./FDDB/FDDB-folds/fddb_annotFile.txt \
-d DETECTION_RESULT.txt -f 0 \
-i ./FDDB -l ./FDDB/FDDB-folds/filePath.txt \
-r ./OUTPUT_DIR -z .jpg
```
**NOTES:** The meaning of each argument can be viewed by running `./evaluate --help`.
## Algorithm Description
### BlazeFace
**Introduction:**
[BlazeFace](https://arxiv.org/abs/1907.05047) is a face detection model published by Google Research.
It is lightweight yet accurate, and tailored for mobile GPU inference; it runs at a speed
of 200-1000+ FPS on flagship devices.
**Particularity:**
- The anchor scheme stops at 8×8 (for a 128x128 input), with 6 anchors per pixel at that resolution.
- 5 single and 6 double BlazeBlocks built from 5×5 depthwise convs: the same accuracy with fewer layers.
- Non-maximum suppression is replaced with a blending strategy that estimates the regression parameters
of a bounding box as a score-weighted mean of the overlapping predictions (a minimal sketch follows).
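A minimal sketch of such a blending step is shown below. It is an assumed simplification for illustration (NumPy, class-agnostic, greedy clustering by IoU), not the BlazeFace implementation itself.

```python
# Sketch of "blending" in place of hard NMS: overlapping detections are merged by a
# score-weighted average of their box coordinates.
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def blend_nms(boxes, scores, overlap_thresh=0.3):
    """boxes: (N, 4) xyxy ndarray, scores: (N,) ndarray. Returns blended boxes and scores."""
    order = np.argsort(-scores)
    boxes, scores = boxes[order], scores[order]
    out_boxes, out_scores = [], []
    while len(boxes) > 0:
        keep = iou(boxes[0], boxes) >= overlap_thresh              # cluster around the top box
        w = scores[keep]
        out_boxes.append((boxes[keep] * w[:, None]).sum(0) / w.sum())  # weighted mean of coords
        out_scores.append(scores[0])
        boxes, scores = boxes[~keep], scores[~keep]
    return np.array(out_boxes), np.array(out_scores)
```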
**Edition information:**
- Original: reproduction of the original paper.
- Lite: replaces the 5x5 convs with 3x3 convs and uses fewer network layers and conv channels.
- NAS: uses a `Neural Architecture Search` algorithm to optimize the network structure,
with even fewer network layers and conv channels than `Lite`.
### FaceBoxes
**Introduction:**
[FaceBoxes](https://arxiv.org/abs/1708.05234) which named A CPU Real-time Face Detector
with High Accuracy is face detector proposed by Shifeng Zhang, with high performance on
both speed and accuracy. This paper is published by IJCB(2017).
**Particularity:**
- The anchor scheme stops at the 20x20, 10x10 and 5x5 feature maps (for a 640x640 network input),
with 3, 1 and 1 anchors per pixel at these resolutions. The corresponding anchor densities
are 1, 2, 4 (20x20), 4 (10x10) and 4 (5x5).
- 2 convs with CReLU, 2 poolings, 3 inception blocks and 2 convs with ReLU.
- Density prior boxes are used to improve detection accuracy (see the sketch below).
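The density prior box (anchor densification) idea can be sketched as follows; the spacing rule used here is an assumption for illustration rather than the exact FaceBoxes implementation.

```python
# Hedged sketch of anchor densification: a small anchor at a feature-map cell is replicated
# `density` times along each axis so that small anchors tile the receptive field more densely.
import numpy as np

def densify_anchor(cx, cy, size, density):
    """Return density*density anchors (xyxy) centred around (cx, cy)."""
    step = size / density                                   # assumed spacing for illustration
    offsets = (np.arange(density) - (density - 1) / 2.0) * step
    anchors = []
    for dy in offsets:
        for dx in offsets:
            anchors.append([cx + dx - size / 2, cy + dy - size / 2,
                            cx + dx + size / 2, cy + dy + size / 2])
    return np.array(anchors)

# e.g. a density of 4 turns one anchor into a 4x4 grid of shifted copies at the same cell
print(densify_anchor(cx=16, cy=16, size=32, density=4).shape)   # (16, 4)
```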
**Edition information:**
- Original: reproduction of the original paper.
- Lite: 2 convs with CReLU, 1 pooling, 2 convs with ReLU, 3 inception blocks and 2 convs with ReLU.
The anchor scheme stops at the 80x80 and 40x40 feature maps, with 3 and 1 anchors per pixel at these
resolutions; the corresponding densities are 1, 2, 4 (80x80) and 4 (40x40), and fewer conv channels are used.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!!
......@@ -89,7 +89,7 @@ SSDEvalFeed:
fields: ['image', 'im_id', 'gt_box']
dataset:
dataset_dir: dataset/wider_face
annotation: annotFile.txt #wider_face_split/wider_face_val_bbx_gt.txt
annotation: wider_face_split/wider_face_val_bbx_gt.txt
image_dir: WIDER_val/images
drop_last: false
image_shape: [3, 640, 640]
......
......@@ -21,7 +21,7 @@ FasterRCNN:
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_type: bn
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
......
......@@ -82,8 +82,8 @@ LearningRate:
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
......
......@@ -24,7 +24,7 @@ ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
norm_type: bn
FPN:
max_level: 6
......
architecture: CascadeRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 500000
snapshot_iter: 10000
use_gpu: true
log_iter: 20
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_coco_pretrained.tar
weights: output/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas/model_final
metric: COCO
num_classes: 81
CascadeRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: bn
freeze_norm: True
variant: d
dcn_v2_stages: [3, 4, 5]
std_senet: True
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
freeze_norm: False
norm_type: gn
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
CascadeBBoxAssigner:
batch_size_per_im: 1024
bbox_reg_weights: [10, 20, 30]
bg_thresh_lo: [0.0, 0.0, 0.0]
bg_thresh_hi: [0.5, 0.6, 0.7]
fg_thresh: [0.5, 0.6, 0.7]
fg_fraction: 0.25
CascadeBBoxHead:
head: CascadeXConvNormHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeXConvNormHead:
norm_type: gn
CascadeTwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [400000, 460000]
- !LinearWarmup
start_factor: 0.01
steps: 2000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/objects365
annotation: annotations/train.json
image_dir: train
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !RandomFlipImage
is_mask_flip: true
is_normalized: false
prob: 0.5
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
- 672
- 704
- 736
- 768
- 800
- 832
- 864
- 896
- 928
- 960
- 992
- 1024
- 1056
- 1088
- 1120
- 1152
- 1184
- 1216
- 1248
- 1280
- 1312
- 1344
- 1376
- 1408
max_size: 1600
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 4
class_aware_sampling: true
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/objects365
annotation: annotations/val.json
image_dir: val
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
target_size: 800
max_size: 1333
interp: 1
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/obj365/annotations/val.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 200000
log_smooth_window: 20
save_dir: output
snapshot_iter: 5000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar
num_classes: 1
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 150000
- 180000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/pedestrian
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
image_shape: [3, 608, 608]
dataset:
dataset_dir: dataset/pedestrian
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
image_shape: [3, 608, 608]
dataset:
annotation: contrib/PedestrianDetection/pedestrian.json
# PaddleDetection applied for specific scenarios
We provide some models implemented with PaddlePaddle to detect objects in specific scenarios; users can download the models and use them directly in these scenarios.
| Task | Algorithm | Box AP | Download |
|:---------------------|:---------:|:------:| :-------------------------------------------------------------------------------------: |
| Vehicle Detection | YOLOv3 | 54.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar) |
| Pedestrian Detection | YOLOv3 | 51.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar) |
## Vehicle Detection
One of the major applications of vehicle detection is traffic monitoring. In this scenario, the vehicles to be detected are mostly captured by cameras mounted on top of traffic-light poles.
### 1. Network
The network for detecting vehicles is YOLOv3 with a DarkNet53 backbone.
### 2. Configuration for training
PaddleDetection provides users with the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for vehicle detection training (the resulting learning-rate schedule is sketched after the list):
* max_iters: 120000
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
* label_smooth: false
* nms/nms_top_k: 400
* nms/score_threshold: 0.005
* milestones: [60000, 80000]
* dataset_dir: dataset/vehicle
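For reference, the `milestones` above feed the piecewise-decay-plus-warmup schedule defined in the config's `LearningRate` section (`base_lr: 0.001`, `gamma: 0.1`, 4000 warmup steps with `start_factor: 0.` in the vehicle config shown later on this page); a rough sketch of the resulting schedule:

```python
# Rough sketch of the piecewise-decay + linear-warmup learning-rate schedule used by these configs.
def learning_rate(it, base_lr=0.001, gamma=0.1, milestones=(60000, 80000),
                  warmup_steps=4000, warmup_start_factor=0.0):
    if it < warmup_steps:                                   # linear warm-up phase
        alpha = it / float(warmup_steps)
        return base_lr * (warmup_start_factor * (1 - alpha) + alpha)
    decays = sum(it >= m for m in milestones)               # number of milestones already passed
    return base_lr * (gamma ** decays)

# 0.0 at iter 0, 0.0005 at 2000, 0.001 after warm-up, 0.0001 after 60k, 0.00001 after 80k
print(learning_rate(0), learning_rate(2000), learning_rate(70000), learning_rate(90000))
```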
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is as follows:
AP at IoU=.50:.05:.95 is 0.545.
AP at IoU=.50 is 0.764.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar \
--infer_dir contrib/VehicleDetection/demo \
--draw_threshold 0.2 \
--output_dir contrib/VehicleDetection/demo/output
```
Some inference results are visualized below:
![](VehicleDetection/demo/output/001.jpeg)
![](VehicleDetection/demo/output/005.png)
## Pedestrian Detection
The main applications of pedestrian detection include intelligent surveillance. In this scenario, images of pedestrians are captured by surveillance cameras in public areas, and pedestrian detection is then performed on these images.
### 1. Network
The network for detecting pedestrians is YOLOv3 with a DarkNet53 backbone.
### 2. Configuration for training
PaddleDetection provides users with the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for pedestrian detection training:
* max_iters: 200000
* num_classes: 1
* snapshot_iter: 5000
* milestones: [150000, 180000]
* dataset_dir: dataset/pedestrian
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is as follows:
AP at IoU=.50:.05:.95 is 0.518.
AP at IoU=.50 is 0.792.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar \
--infer_dir contrib/PedestrianDetection/demo \
--draw_threshold 0.3 \
--output_dir contrib/PedestrianDetection/demo/output
```
Some inference results are visualized below:
![](PedestrianDetection/demo/output/001.png)
![](PedestrianDetection/demo/output/004.png)
# PaddleDetection models for specific vertical scenarios

We provide PaddlePaddle-based detection models for several specific scenarios; users can download and use them directly.

| Task                 | Algorithm | Box AP | Download |
|:---------------------|:---------:|:------:| :---------------------------------------------------------------------------------: |
| Vehicle detection    | YOLOv3    | 54.5   | [model](https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar) |
| Pedestrian detection | YOLOv3    | 51.8   | [model](https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar) |
## Vehicle Detection

One of the main applications of vehicle detection is traffic monitoring, where the vehicles to be detected are mostly captured by cameras mounted on traffic-light poles.

### 1. Network

YOLOv3 with a DarkNet53 backbone.

### 2. Training configuration

PaddleDetection provides the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) for training YOLOv3 on COCO. Compared with that file, the following parameters were modified for vehicle detection training:
* max_iters: 120000
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
* label_smooth: false
* nms/nms_top_k: 400
* nms/score_threshold: 0.005
* milestones: [60000, 80000]
* dataset_dir: dataset/vehicle
### 3. Accuracy

Accuracy on our internal dataset:

AP at IoU=.50:.05:.95 is 0.545.

AP at IoU=.50 is 0.764.
### 4. Inference

Users can run vehicle detection with the trained model:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar \
--infer_dir contrib/VehicleDetection/demo \
--draw_threshold 0.2 \
--output_dir contrib/VehicleDetection/demo/output
```
Example results:
![](VehicleDetection/demo/output/001.jpeg)
![](VehicleDetection/demo/output/005.png)
## Pedestrian Detection

The main application of pedestrian detection is intelligent surveillance, where pedestrians are captured by surveillance cameras in public areas and detection is then run on the captured images.

### 1. Network

YOLOv3 with a DarkNet53 backbone.

### 2. Training configuration

PaddleDetection provides the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) for training YOLOv3 on COCO. Compared with that file, the following parameters were modified for pedestrian detection training:
* max_iters: 200000
* num_classes: 1
* snapshot_iter: 5000
* milestones: [150000, 180000]
* dataset_dir: dataset/pedestrian
### 3. Accuracy

Accuracy on our internal surveillance dataset:

AP at IoU=.50 is 0.792.

AP at IoU=.50:.05:.95 is 0.518.

### 4. Inference

Users can run pedestrian detection with the trained model:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar \
--infer_dir contrib/PedestrianDetection/demo \
--draw_threshold 0.3 \
--output_dir contrib/PedestrianDetection/demo/output
```
Example results:
![](PedestrianDetection/demo/output/001.png)
![](PedestrianDetection/demo/output/004.png)
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 120000
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar
num_classes: 6
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[8, 9], [10, 23], [19, 15],
[23, 33], [40, 25], [54, 50],
[101, 80], [139, 145], [253, 224]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: false
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 400
normalized: false
score_threshold: 0.005
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 60000
- 80000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/vehicle
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
image_shape: [3, 608, 608]
dataset:
dataset_dir: dataset/vehicle
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
image_shape: [3, 608, 608]
dataset:
annotation: contrib/VehicleDetection/vehicle.json
# All rights `PaddleDetection` reserved
# References:
# @inproceedings{yang2016wider,
# Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou},
# Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
# Title = {WIDER FACE: A Face Detection Benchmark},
# Year = {2016}}
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip
wget https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip
wget https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip
# Extract the data.
echo "Extracting..."
unzip WIDER_train.zip
unzip WIDER_val.zip
unzip wider_face_split.zip
# Inference Benchmark

- Test environment:
  - CUDA 9.0
  - CUDNN 7.5
  - TensorRT-5.1.2.2
  - PaddlePaddle v1.6
  - GPUs: Tesla V100 and Tesla P4
- Test method:
  - To make the comparison across models fair, every model takes the same input size, 3x640x640, using the image `demo/000000014439_640x640.jpg`.
  - Batch Size = 1
  - The first 10 warm-up rounds are excluded and the average over 100 rounds is reported in ms/image, including the time to copy input data to the GPU, the compute time, and the time to copy results back to the CPU.
  - The Fluid C++ inference engine is used, covering both plain Fluid C++ inference and Fluid-TensorRT inference; both Float32 (FP32) and Float16 (FP16) are measured below.
  - FLAGS_cudnn_exhaustive_search=True is enabled, so convolution algorithms are selected by exhaustive search.
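An illustrative timing loop matching this methodology is sketched below; `predict_fn` is a stand-in for the actual predictor call, so this is not the benchmark script itself.

```python
# Skip 10 warm-up runs, then report the average over 100 timed runs in ms/image.
import time

def benchmark(predict_fn, image, warmup=10, repeats=100):
    for _ in range(warmup):
        predict_fn(image)                       # warm-up, excluded from the statistics
    start = time.time()
    for _ in range(repeats):
        predict_fn(image)                       # includes copy-in, compute and copy-out
    return (time.time() - start) * 1000.0 / repeats   # ms/image

# latency_ms = benchmark(paddle_predictor.run, preprocessed_image)  # hypothetical predictor call
```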
### Inference speed
| Model                                 | Tesla V100 Fluid (ms/image) | Tesla V100 Fluid-TensorRT-FP32 (ms/image) | Tesla V100 Fluid-TensorRT-FP16 (ms/image) | Tesla P4 Fluid (ms/image) | Tesla P4 Fluid-TensorRT-FP32 (ms/image) |
| ------------------------------------- | ----------------------------- | ------------------------------------------- | ------------------------------------------- | --------------------------- | ----------------------------------------- |
| faster_rcnn_r50_1x | 147.488 | 146.124 | 142.416 | 471.547 | 471.631 |
| faster_rcnn_r50_2x | 147.636 | 147.73 | 141.664 | 471.548 | 472.86 |
| faster_rcnn_r50_vd_1x | 146.588 | 144.767 | 141.208 | 459.357 | 457.852 |
| faster_rcnn_r50_fpn_1x | 25.11 | 24.758 | 20.744 | 59.411 | 57.585 |
| faster_rcnn_r50_fpn_2x | 25.351 | 24.505 | 20.509 | 59.594 | 57.591 |
| faster_rcnn_r50_vd_fpn_2x | 25.514 | 25.292 | 21.097 | 61.026 | 58.377 |
| faster_rcnn_r50_fpn_gn_2x | 36.959 | 36.173 | 32.356 | 101.339 | 101.212 |
| faster_rcnn_dcn_r50_fpn_1x | 28.707 | 28.162 | 27.503 | 68.154 | 67.443 |
| faster_rcnn_dcn_r50_vd_fpn_2x | 28.576 | 28.271 | 27.512 | 68.959 | 68.448 |
| faster_rcnn_r101_1x | 153.267 | 150.985 | 144.849 | 490.104 | 486.836 |
| faster_rcnn_r101_fpn_1x | 30.949 | 30.331 | 24.021 | 73.591 | 69.736 |
| faster_rcnn_r101_fpn_2x | 30.918 | 29.126 | 23.677 | 73.563 | 70.32 |
| faster_rcnn_r101_vd_fpn_1x | 31.144 | 30.202 | 23.57 | 74.767 | 70.773 |
| faster_rcnn_r101_vd_fpn_2x | 30.678 | 29.969 | 23.327 | 74.882 | 70.842 |
| faster_rcnn_x101_vd_64x4d_fpn_1x | 60.36 | 58.461 | 45.172 | 132.178 | 131.734 |
| faster_rcnn_x101_vd_64x4d_fpn_2x | 59.003 | 59.163 | 46.065 | 131.422 | 132.186 |
| faster_rcnn_dcn_r101_vd_fpn_1x | 36.862 | 37.205 | 36.539 | 93.273 | 92.616 |
| faster_rcnn_dcn_x101_vd_64x4d_fpn_1x | 78.476 | 78.335 | 77.559 | 185.976 | 185.996 |
| faster_rcnn_se154_vd_fpn_s1x | 166.282 | 90.508 | 80.738 | 304.653 | 193.234 |
| mask_rcnn_r50_1x | 160.185 | 160.4 | 160.322 | - | - |
| mask_rcnn_r50_2x | 159.821 | 159.527 | 160.41 | - | - |
| mask_rcnn_r50_fpn_1x | 95.72 | 95.719 | 92.455 | 259.8 | 258.04 |
| mask_rcnn_r50_fpn_2x | 84.545 | 83.567 | 79.269 | 227.284 | 222.975 |
| mask_rcnn_r50_vd_fpn_2x | 82.07 | 82.442 | 77.187 | 223.75 | 221.683 |
| mask_rcnn_r50_fpn_gn_2x | 94.936 | 94.611 | 91.42 | 265.468 | 263.76 |
| mask_rcnn_dcn_r50_fpn_1x | 97.828 | 97.433 | 93.76 | 256.295 | 258.056 |
| mask_rcnn_dcn_r50_vd_fpn_2x | 77.831 | 79.453 | 76.983 | 205.469 | 204.499 |
| mask_rcnn_r101_fpn_1x | 95.543 | 97.929 | 90.314 | 252.997 | 250.782 |
| mask_rcnn_r101_vd_fpn_1x | 98.046 | 97.647 | 90.272 | 261.286 | 262.108 |
| mask_rcnn_x101_vd_64x4d_fpn_1x | 115.461 | 115.756 | 102.04 | 296.066 | 293.62 |
| mask_rcnn_x101_vd_64x4d_fpn_2x | 107.144 | 107.29 | 97.275 | 267.636 | 267.577 |
| mask_rcnn_dcn_r101_vd_fpn_1x | 85.504 | 84.875 | 84.907 | 225.202 | 226.585 |
| mask_rcnn_dcn_x101_vd_64x4d_fpn_1x | 129.937 | 129.934 | 127.804 | 326.786 | 326.161 |
| mask_rcnn_se154_vd_fpn_s1x | 214.188 | 139.807 | 121.516 | 440.391 | 439.727 |
| cascade_rcnn_r50_fpn_1x | 36.866 | 36.949 | 36.637 | 101.851 | 101.912 |
| cascade_mask_rcnn_r50_fpn_1x | 110.344 | 106.412 | 100.367 | 301.703 | 297.739 |
| cascade_rcnn_dcn_r50_fpn_1x | 40.412 | 39.58 | 39.853 | 110.346 | 110.077 |
| cascade_mask_rcnn_r50_fpn_gn_2x | 170.092 | 168.758 | 163.298 | 527.998 | 529.59 |
| cascade_rcnn_dcn_r101_vd_fpn_1x | 48.414 | 48.849 | 48.701 | 134.9 | 134.846 |
| cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x | 90.062 | 90.218 | 90.009 | 228.67 | 228.396 |
| retinanet_r101_fpn_1x | 55.59 | 54.636 | 48.489 | 90.394 | 83.951 |
| retinanet_r50_fpn_1x | 50.048 | 47.932 | 44.385 | 73.819 | 70.282 |
| retinanet_x101_vd_64x4d_fpn_1x | 83.329 | 83.446 | 70.76 | 145.936 | 146.168 |
| yolov3_darknet | 21.427 | 20.252 | 13.856 | 55.173 | 55.692 |
| yolov3_darknet_voc | 17.58 | 16.241 | 9.473 | 51.049 | 51.249 |
| yolov3_mobilenet_v1 | 12.869 | 11.834 | 9.408 | 24.887 | 21.352 |
| yolov3_mobilenet_v1_voc | 9.118 | 8.146 | 5.575 | 20.787 | 17.169 |
| yolov3_r34 | 14.914 | 14.125 | 11.176 | 20.798 | 20.822 |
| yolov3_r34_voc | 11.288 | 10.73 | 7.7 | 25.874 | 22.399 |
| ssd_mobilenet_v1_voc | 5.763 | 5.854 | 4.589 | 11.75 | 9.485 |
| ssd_vgg16_300 | 28.722 | 29.644 | 20.399 | 73.707 | 74.531 |
| ssd_vgg16_300_voc | 18.425 | 19.288 | 11.298 | 56.297 | 56.201 |
| ssd_vgg16_512 | 27.471 | 28.328 | 19.328 | 68.685 | 69.808 |
| ssd_vgg16_512_voc | 18.721 | 19.636 | 12.004 | 54.688 | 56.174 |
1. For R-CNN models, Fluid-TensorRT shows no speed advantage over plain Fluid inference. The reason is that TensorRT only supports fixed-size inputs; for the current ResNet-based R-CNN models only the backbone runs as a TensorRT subgraph, while the time-consuming stage-5 does not. Fluid also already applies a series of fusion optimizations to CNN models. The numbers will be updated after future TensorRT upgrades or further optimizations.
2. For YOLOv3 models, Fluid-TensorRT is 5% - 10% faster than plain Fluid inference.
3. For SSD and YOLOv3 models, TensorRT-FP16 inference has a clear advantage, with speedups of roughly 20% - 40%, as shown in the figure below.
<div align="center">
<img src="images/bench_ssd_yolo_infer.png" />
</div>
# CACascade RCNN

## Introduction

CACascade RCNN is one of the best single models from Baidu VIS's winning entry in the Objects365 2019 Challenge. Objects365 is a new dataset for general object detection, designed to promote detection research on diverse objects in natural scenes. It annotates 365 object classes on 630k images, with more than 10 million bounding boxes in the training set. The model released here is one of the best single models from the Full Track task.
<div align="center">
<img src="../demo/obj365_gt.png"/>
</div>
## Method

For large-scale object detection we adopt a sampling strategy based on the number of object categories contained in each image (Class Aware Sampling); training with this sampling lets the model converge to better accuracy in less time (a sketch of the idea follows the figure below).
<div align="center">
<img src="../demo/cas.png"/>
</div>
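A minimal sketch of class-aware sampling is shown below. It is an assumed simplification (uniformly pick a category, then an image containing it); the actual `class_aware_sampling: true` option in the config may differ in detail.

```python
# Minimal sketch of class-aware sampling: rare categories are drawn as often as frequent ones.
import random
from collections import defaultdict

def build_class_aware_sampler(annotations):
    """annotations: list of (image_id, set_of_category_ids) pairs."""
    images_per_class = defaultdict(list)
    for image_id, cats in annotations:
        for c in cats:
            images_per_class[c].append(image_id)
    classes = list(images_per_class)

    def sample():
        c = random.choice(classes)                      # first pick a category uniformly
        return random.choice(images_per_class[c])       # then an image containing it
    return sample

sampler = build_class_aware_sampler([(1, {3, 5}), (2, {3}), (3, {7})])
print([sampler() for _ in range(5)])
```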
The best single model released here is a two-stage detector based on Cascade RCNN, with the backbone replaced by the stronger SENet154, Deformable Conv modules, and a more complex second-stage head. Group Normalization is added to cope with small batch sizes, and multi-scale training is used, which together yield very strong results. The pretrained weights were trained first on ImageNet and then on COCO; the COCO training adds a Mask branch while the rest of the structure is identical to CACascade RCNN, and the weights are downloaded automatically when training starts.
## Usage

1. Prepare the data

The data must be requested and downloaded from the [Objects365 website](https://www.objects365.org/download.html); after downloading, place it under the dataset directory:
```
${THIS REPO ROOT}
\--dataset
\-- objects365
\-- annotations
|-- train.json
|-- val.json
\-- train
\-- val
```
2. Start training
```bash
python tools/train.py -c configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn.yml
```
3. Results

| Model | Validation mAP | Download |
| :-----------------: | :--------: | :----------------------------------------------------------: |
| CACascadeRCNN SE154 | 31.7 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas_obj365.tar) |
## Visualization results
<div align="center">
<img src="../demo/obj365_pred.png"/>
</div>
English | [简体中文](CONFIG_cn.md)
# Config Pipeline
## Introduction
......
English | [简体中文](DATA_cn.md)
# Data Pipeline
## Introduction
......@@ -126,6 +128,8 @@ the corresponding data stream. Many aspect of the `Reader`, such as storage
location, preprocessing pipeline, and acceleration mode, can be configured with yaml
files.
### APIs
The main APIs are as follows:
1. Data parsing
......@@ -139,7 +143,7 @@ The main APIs are as follows:
- `source/loader.py`: Roidb dataset parser. [source](../ppdet/data/source/loader.py)
2. Operator
`transform/operators.py`: Contains a variety of data augmentation methods, including:
- `DecodeImage`: Read images in RGB format.
- `RandomFlipImage`: Horizontal flip.
- `RandomDistort`: Distort brightness, contrast, saturation, and hue.
......@@ -150,7 +154,7 @@ The main APIs are as follows:
- `NormalizeImage`: Normalize image pixel values.
- `NormalizeBox`: Normalize the bounding box.
- `Permute`: Arrange the channels of the image and optionally convert image to BGR format.
- `MixupImage`: Mixup two images with given fraction<sup>[1](#mix)</sup>.
<a name="mix">[1]</a> Please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf)
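A minimal sketch of the image part of mixup is shown below (the actual `MixupImage` operator also mixes the ground-truth boxes and per-box weights; the Beta parameters here are assumptions):

```python
# Blend two images with a Beta-distributed mixing fraction (assumes HxWx3 arrays).
import numpy as np

def mixup_images(img1, img2, alpha=1.5, beta=1.5):
    factor = np.random.beta(alpha, beta)                  # mixing fraction in (0, 1)
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    out = np.zeros((h, w, 3), dtype=np.float32)
    out[:img1.shape[0], :img1.shape[1]] += img1.astype(np.float32) * factor
    out[:img2.shape[0], :img2.shape[1]] += img2.astype(np.float32) * (1.0 - factor)
    return out
```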
......@@ -177,16 +181,18 @@ whole data pipeline is fully customizable through the yaml configuration files.
#### Custom Datasets
- Option 1: Convert the dataset to COCO format.
```sh
# a small utility (`tools/x2coco.py`) is provided to convert
# Labelme-annotated dataset or cityscape dataset to COCO format.
python ./ppdet/data/tools/x2coco.py --dataset_type labelme
--json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --dataset_type: the dataset format to convert from; currently 'labelme' and 'cityscape' are supported
# --json_input_dir: the path of the json files annotated by Labelme
# --image_input_dir: the path of the images
# --output_dir: the path of the converted COCO dataset
......
......@@ -105,9 +105,9 @@ python ./ppdet/data/tools/generate_data_for_training.py
4. Data fetching interface

To simplify data fetching during training, multiple `data.Dataset` objects are combined into a `data.Reader` that feeds data to the user; calling `Reader.[train|eval|infer]` returns the corresponding data stream. The `Reader` supports configuring the data location, preprocessing pipeline, acceleration mode, etc. via yaml files.

### APIs

The main APIs are as follows:
1. Data parsing
......@@ -165,15 +165,17 @@ coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
```
#### How to use a custom dataset?

- Option 1: Convert the dataset to COCO format.
```
# x2coco.py under ./tools/ converts Labelme-annotated datasets or the cityscape dataset to COCO format
python ./ppdet/data/tools/x2coco.py --dataset_type labelme
--json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --dataset_type:需要转换的数据格式,目前支持:'labelme'和'cityscape'
# --json_input_dir:使用labelme标注的json文件所在文件夹
# --image_input_dir:图像文件所在文件夹
# --output_dir:转换后的COCO格式数据集存放位置
......
# 模型导出
训练得到一个满足要求的模型后,如果想要将该模型接入到C++预测库或者Serving服务,需要通过`tools/export_model.py`导出该模型。
## 启动参数说明
| FLAG | 用途 | 默认值 | 备注 |
|:--------------:|:--------------:|:------------:|:-----------------------------------------:|
| -c | 指定配置文件 | None | |
| --output_dir | 模型保存路径 | `./output` | 模型默认保存在`output/配置文件名/`路径下 |
## 使用示例
使用[训练/评估/推断](GETTING_STARTED_cn.md)中训练得到的模型进行试用,脚本如下
```bash
# 导出FasterRCNN模型, 模型中data层默认的shape为3x800x1333
python tools/export_model.py -c configs/faster_rcnn_r50_1x.yml \
--output_dir=./inference_model \
-o weights=output/faster_rcnn_r50_1x/model_final
```
预测模型会导出到`inference_model/faster_rcnn_r50_1x`目录下,模型名和参数名分别为`__model__`和`__params__`
## 设置导出模型的输入大小
使用Fluid-TensorRT进行预测时,由于<=TensorRT 5.1的版本仅支持定长输入,保存模型的`data`层的图片大小需要和实际输入图片大小一致。而Fluid C++预测引擎没有此限制。通过设置TestFeed的`image_shape`可以修改保存模型中的输入图片大小。示例如下:
```bash
# 导出FasterRCNN模型,输入是3x640x640
python tools/export_model.py -c configs/faster_rcnn_r50_1x.yml \
--output_dir=./inference_model \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
FasterRCNNTestFeed.image_shape=[3,640,640]
# 导出YOLOv3模型,输入是3x320x320
python tools/export_model.py -c configs/yolov3_darknet.yml \
--output_dir=./inference_model \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar \
YoloTestFeed.image_shape=[3,320,320]
# 导出SSD模型,输入是3x300x300
python tools/export_model.py -c configs/ssd/ssd_mobilenet_v1_voc.yml \
--output_dir=./inference_model \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/ssd_mobilenet_v1_voc.tar \
SSDTestFeed.image_shape=[3,300,300]
```
English | [简体中文](GETTING_STARTED_cn.md)
# Getting Started
For setting up the running environment, please refer to [installation
instructions](INSTALL.md).
## Training
#### Single-GPU Training
## Training/Evaluation/Inference
PaddleDetection provides scripts for training, evaluation and inference, with various features configurable through command-line options.
```bash
export CUDA_VISIBLE_DEVICES=0
# set PYTHONPATH
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### Multi-GPU Training
```bash
# train on a single GPU or multiple GPUs; select GPUs with CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
# GPU evaluation
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
# Inference
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
#### CPU Training
### Optional argument list
```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
```
The list below can be viewed by running with `--help`
##### Optional arguments
| FLAG | script supported | description | default | remark |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Select config file | None | **For the full configuration description, refer to [config_example](config_example)** |
| -o | ALL | Set parameters in the config file | None | `-o` has higher priority than the config file selected by `-c`. Such as `-o use_gpu=False max_iter=10000` |
| -r/--resume_checkpoint | train | Checkpoint path for resuming training | None | `-r output/faster_rcnn_r50_1x/10000` |
| --eval | train | Whether to perform evaluation in training | False | |
| --output_eval | train/eval | json path in evaluation | current path | `--output_eval ./json_result` |
| -d/--dataset_dir | train/eval | path for dataset, same as dataset_dir in configs | None | `-d dataset/coco` |
| --fp16 | train | Whether to enable mixed precision training | False | GPU training is required |
| --loss_scale | train | Loss scaling factor for mixed precision training | 8.0 | enable when `--fp16` is True |
| --json_eval | eval | Whether to evaluate with an existing bbox.json or mask.json | False | json path is set in `--output_eval` |
| --output_dir | infer | Directory for storing the output visualization files | `./output` | `--output_dir output` |
| --draw_threshold | infer | Threshold to reserve the result for visualization | 0.5 | `--draw_threshold 0.7` |
| --infer_dir | infer | Directory for images to perform inference on | None | |
| --infer_img | infer | Image path | None | higher priority over --infer_dir |
| --use_tb | train/infer | Whether to record the data with [tb-paddle](https://github.com/linshuliang/tb-paddle), so as to display in Tensorboard | False | |
| --tb\_log_dir | train/infer | tb-paddle logging directory for image | train:`tb_log_dir/scalar` infer: `tb_log_dir/image` | |
- `-r` or `--resume_checkpoint`: Checkpoint path for resuming training. Such as: `-r output/faster_rcnn_r50_1x/10000`
- `--eval`: Whether to perform evaluation in training, default is `False`
- `--output_eval`: If perform evaluation in training, this edits evaluation directory, default is current directory.
- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` of configs. Such as: `-d dataset/coco`
- `-c`: Select config file and all files are saved in `configs/`
- `-o`: Set configuration options in config file. Such as: `-o max_iters=180000`. `-o` has higher priority to file configured by `-c`
- `--use_tb`: Whether to record the data with [tb-paddle](https://github.com/linshuliang/tb-paddle), so as to display in Tensorboard, default is `False`
- `--tb_log_dir`: tb-paddle logging directory for scalar, default is `tb_log_dir/scalar`
- `--fp16`: Whether to enable mixed precision training (requires GPU), default is `False`
- `--loss_scale`: Loss scaling factor for mixed precision training, default is `8.0`
## Examples
##### Examples
### Training
- Perform evaluation in training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```
Alternating between training epochs and evaluation runs is possible: simply pass
in `--eval` to evaluate at every snapshot_iter, which can be modified via `snapshot_iter` in the configuration file. If the evaluation dataset is large and
slows down training, we suggest reducing the number of evaluations or evaluating after training. When performing evaluation during training,
the best model with the highest mAP is saved at each `snapshot_iter`. `best_model` has the same path as `model_final`.
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```
Perform training and evaluation alternately and evaluate at each snapshot_iter. Meanwhile, the best model with the highest mAP is saved at each `snapshot_iter`, in the same path as `model_final`.
- Configure dataset path
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-d dataset/coco
```
If the evaluation dataset is large, we suggest reducing the number of evaluations or evaluating after training.
- Fine-tune other task
When using a pre-trained model to fine-tune another task, the excluded pre-trained parameters can be set by finetune_exclude_pretrained_params in the YAML config or by -o finetune_exclude_pretrained_params in the arguments.
When using a pre-trained model to fine-tune another task, two methods can be used:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```
1. Set `finetune_exclude_pretrained_params` in the YAML config to exclude the specified pre-trained parameters.
2. Set `-o finetune_exclude_pretrained_params` in the command-line arguments. For example:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```
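Conceptually, `finetune_exclude_pretrained_params` is a list of name patterns: any pretrained parameter whose name matches one of them is skipped when the weights are loaded, so those layers are re-initialized for the new task. The snippet below is only an illustrative sketch of that filtering (it is not the actual PaddleDetection loading code):

```python
import fnmatch

def split_pretrained_params(param_names, exclude_patterns):
    """Return (loaded, skipped) parameter names, skipping any name that
    matches an exclusion pattern as a substring or wildcard."""
    loaded, skipped = [], []
    for name in param_names:
        hit = any(p in name or fnmatch.fnmatch(name, p) for p in exclude_patterns)
        (skipped if hit else loaded).append(name)
    return loaded, skipped

names = ['conv1_weights', 'cls_score_w', 'cls_score_b', 'bbox_pred_w']
loaded, skipped = split_pretrained_params(names, ['cls_score', 'bbox_pred'])
print(skipped)  # ['cls_score_w', 'cls_score_b', 'bbox_pred_w']
```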
##### NOTES
- `CUDA_VISIBLE_DEVICES` can specify different GPU devices. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU selection rules can be found in the [FAQ](#faq)
- Dataset is stored in `dataset/coco` by default (configurable).
- Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not found locally.
- Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- When finetuning, users could set `pretrain_weights` to the models published by PaddlePaddle. Parameters matched by fields in `finetune_exclude_pretrained_params` are skipped during loading; the fields support wildcard matching. For detailed information, please refer to [Transfer Learning](TRANSFER_LEARNING.md).
- To check out hyper parameters used, please refer to the [configs](../configs).
- Checkpoints are saved in `output` by default; the location can be changed via save_dir in the configuration files.
- Training RCNN models on CPU is not supported on PaddlePaddle<=1.5.1 and will be fixed in a later version.
### Mixed Precision Training
Mixed precision training can be enabled with the `--fp16` flag. Currently Faster-FPN, Mask-FPN and YOLOv3 have been verified to work with little to no loss of precision (less than 0.2 mAP).
## Evaluation
To speed up mixed precision training, it is recommended to train in multi-process mode, for example
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train.py --fp16 -c configs/faster_rcnn_r50_fpn_1x.yml
```
#### Optional arguments
If the loss becomes `NaN` during training, try tweaking the `--loss_scale` value. Please refer to the Nvidia [documentation](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#mptrain) on mixed precision training for details.
- `-d` or `--dataset_dir`: Dataset path, same as dataset_dir of configs. Such as: `-d dataset/coco`
- `--output_eval`: Evaluation directory, default is current directory.
- `-o`: Set configuration options in config file. Such as: `-o weights=output/faster_rcnn_r50_1x/model_final`
- `--json_eval`: Whether to evaluate with an existing bbox.json or mask.json. Default is `False`. The json file directory is assigned by the `-f` argument.
Also, please note mixed precision training currently requires changing `norm_type` from `affine_channel` to `bn`.
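For reference, static loss scaling works by multiplying the loss before backpropagation so that small FP16 gradients do not underflow, then dividing the gradients by the same factor before the optimizer step. A rough framework-agnostic sketch of the idea (this is not PaddleDetection's actual implementation; `loss`, `params`, `backward()` and `.grad` are placeholders for whatever the framework provides):

```python
def scaled_backward_step(loss, params, loss_scale=8.0):
    """Illustrative static loss scaling for FP16 training."""
    scaled_loss = loss * loss_scale   # amplify the loss so FP16 gradients don't underflow
    scaled_loss.backward()            # gradients are computed w.r.t. the scaled loss
    for p in params:
        p.grad = p.grad / loss_scale  # unscale gradients before the optimizer update
    # if gradients still overflow (inf/NaN), a smaller --loss_scale is needed
```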
#### Examples
### Evaluation
- Evaluate by specified weights path and dataset path
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=output/faster_rcnn_r50_1x/model_final \
-d dataset/coco
```
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
-d dataset/coco
```
The model to be evaluated can be given either as a local path or as a link from the [MODEL_ZOO](MODEL_ZOO_cn.md).
- Evaluate with json
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
--json_eval \
-f evaluation/
```
```
The json file must be named bbox.json or mask.json and placed in the `evaluation/` directory. If the `-f` parameter is omitted, the current directory is used by default.
The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory.
#### NOTES
- Checkpoint is loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature
## Inference
- Run inference on a single image:
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- Multi-image inference:
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
#### Optional arguments
- `--output_dir`: Directory for storing the output visualization files.
- `--draw_threshold`: Threshold to reserve the result for visualization. Default is 0.5.
- `--save_inference_model`: Save inference model in output_dir if True.
- `--use_tb`: Whether to record the data with [tb-paddle](https://github.com/linshuliang/tb-paddle), so as to display in Tensorboard, default is `False`
- `--tb_log_dir`: tb-paddle logging directory for image, default is `tb_log_dir/image`
#### Examples
### Inference
- Specify output directory && Set up visualization threshold
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--infer_img=demo/000000570688.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/faster_rcnn_r50_1x/model_final \
--use_tb=True
```
```
The visualization files are saved in `output` by default; to specify a different path, simply add the `--output_dir=` flag.
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
To run inference with a model at a custom path, set `-o weights` to that path.
`--use_tb` is an optional argument; when it is `True`, tb-paddle records data to the logging directory,
so users can see the results in Tensorboard.
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
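For reference, `--draw_threshold` is a score filter applied to the detections that survive NMS; a simplified single-class sketch of greedy NMS and the final score filtering is shown below (illustration only, not the exact operator used by the models):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS for [x1, y1, x2, y2] boxes of a single class."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the highest-scoring box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                    (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter)
        order = order[1:][iou <= iou_threshold]
    return keep

# visualization then keeps only detections with score >= draw_threshold, e.g.
# visible = [i for i in keep if scores[i] >= 0.5]
```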
- Save inference model
```bash
# run on GPU with:
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--infer_img=demo/000000570688.jpg \
--save_inference_model
```
- Export model
Save the inference model by setting `--save_inference_model`; it can then be loaded by the PaddlePaddle prediction library.
```bash
python tools/export_model.py -c configs/faster_rcnn_r50_1x.yml \
--output_dir=inference_model \
-o weights=output/faster_rcnn_r50_1x/model_final \
FasterRCNNTestFeed.image_shape=[3,800,1333]
```
Save the inference model with `tools/export_model.py`; it can then be loaded by the PaddlePaddle prediction library.
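Once exported, the `__model__`/`__params__` pair can be loaded back with the Fluid inference API. A minimal sketch is shown below (the output directory is an assumption taken from the example above, and the feed tensors you must provide depend on the model's TestFeed definition):

```python
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)  # or fluid.CPUPlace() for CPU inference
exe = fluid.Executor(place)

# load the program and parameters saved by tools/export_model.py
program, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname='inference_model/faster_rcnn_r50_1x',
    executor=exe,
    model_filename='__model__',
    params_filename='__params__')

print(feed_names)  # names of the input tensors expected by the model
```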
## FAQ
......
......@@ -3,206 +3,146 @@
关于配置运行环境,请参考[安装指南](INSTALL_cn.md)
## 训练
#### 单GPU训练
## 训练/评估/推断
PaddleDetection提供了训练/评估/推断三个功能的使用脚本,支持通过不同可选参数实现特定功能
```bash
export CUDA_VISIBLE_DEVICES=0
# 设置PYTHONPATH路径
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### 多GPU训练
```bash
# GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
# GPU评估
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
# 推断
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
#### CPU训练
### 可选参数列表
```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
```
以下列表可以通过`--help`查看
| FLAG | 支持脚本 | 用途 | 默认值 | 备注 |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | 指定配置文件 | None | **完整配置说明请参考[配置案例](config_example)** |
| -o | ALL | 设置配置文件里的参数内容 | None | 使用-o配置相较于-c选择的配置文件具有更高的优先级。例如:`-o use_gpu=False max_iter=10000` |
| -r/--resume_checkpoint | train | 从某一检查点恢复训练 | None | `-r output/faster_rcnn_r50_1x/10000` |
| --eval | train | 是否边训练边测试 | False | |
| --output_eval | train/eval | 编辑评测保存json路径 | 当前路径 | `--output_eval ./json_result` |
| -d/--dataset_dir | train/eval | 数据集路径, 同配置文件里的dataset_dir | None | `-d dataset/coco` |
| --fp16 | train | 是否使用混合精度训练模式 | False | 需使用GPU训练 |
| --loss_scale | train | 设置混合精度训练模式中损失值的缩放比例 | 8.0 | 需先开启`--fp16`后使用 |
| --json_eval | eval | 是否通过已存在的bbox.json或者mask.json进行评估 | False | json文件路径在`--output_eval`中设置 |
| --output_dir | infer | 输出推断后可视化文件 | `./output` | `--output_dir output` |
| --draw_threshold | infer | 可视化时分数阈值 | 0.5 | `--draw_threshold 0.7` |
| --infer_dir | infer | 用于推断的图片文件夹路径 | None | |
| --infer_img | infer | 用于推断的图片路径 | None | 相较于`--infer_dir`具有更高优先级 |
| --use_tb | train/infer | 是否使用[tb-paddle](https://github.com/linshuliang/tb-paddle)记录数据,进而在TensorBoard中显示 | False | |
| --tb\_log_dir | train/infer | 指定 tb-paddle 记录数据的存储路径 | train:`tb_log_dir/scalar` infer: `tb_log_dir/image` | |
##### 可选参数
- `-r` or `--resume_checkpoint`: 从某一检查点恢复训练,例如: `-r output/faster_rcnn_r50_1x/10000`
- `--eval`: 是否边训练边测试,默认是 `False`
- `--output_eval`: 如果边训练边测试, 这个参数可以编辑评测保存json路径, 默认是当前目录。
- `-d` or `--dataset_dir`: 数据集路径, 同配置文件里的`dataset_dir`. 例如: `-d dataset/coco`
- `-c`: 选择配置文件,所有配置文件在`configs/`
- `-o`: 设置配置文件里的参数内容。例如: `-o max_iters=180000`。使用`-o`配置相较于`-c`选择的配置文件具有更高的优先级。
- `--use_tb`: 是否使用[tb-paddle](https://github.com/linshuliang/tb-paddle)记录数据,进而在TensorBoard中显示,默认是False。
- `--tb_log_dir`: 指定 tb-paddle 记录数据的存储路径,默认是`tb_log_dir/scalar`
- `--fp16`: 是否使用混合精度训练模式(需GPU训练),默认是`False`
- `--loss_scale`: 设置混合精度训练模式中损失值的缩放比例,默认是`8.0`
## 使用示例
##### 例子
### 模型训练
- 边训练边测试
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval -d dataset/coco
```
可通过设置`--eval`在训练epoch中交替执行评估, 评估在每个snapshot\_iter时开始。可在配置文件的`snapshot_iter`处修改。
如果验证集很大,测试将会比较耗时,影响训练速度,建议减少评估次数,或训练完再进行评估。
当边训练边测试时,在每次snapshot\_iter会评测出最佳mAP模型保存到
`best_model`文件夹下,`best_model`的路径和`model_final`的路径相同。
在训练中交替执行评估, 评估在每个snapshot\_iter时开始。每次评估后还会评出最佳mAP模型保存到`best_model`文件夹下。
- 指定数据集路径
如果验证集很大,测试将会比较耗时,建议减少评估次数,或训练完再进行评估。
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-d dataset/coco
```
- Fine-tune其他任务
使用预训练模型fine-tune其他任务时,在YAML配置文件中设置`finetune_exclude_pretrained_params`或在命令行中添加`-o finetune_exclude_pretrained_params`对预训练模型进行选择性加载。
使用预训练模型fine-tune其他任务时,可采用如下两种方式:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
1. 在YAML配置文件中设置`finetune_exclude_pretrained_params`
2. 在命令行中添加-o finetune\_exclude\_pretrained_params对预训练模型进行选择性加载。
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
finetune_exclude_pretrained_params = ['cls_score','bbox_pred']
```
finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```
##### 提示
详细说明请参考[Transfer Learning](TRANSFER_LEARNING_cn.md)
#### 提示
- `CUDA_VISIBLE_DEVICES` 参数可以指定不同的GPU。例如: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU计算规则可以参考 [FAQ](#faq)
- 数据集默认存储在`dataset/coco`中(可配置)。
- 若本地未找到数据集,将自动下载数据集并保存在`~/.cache/paddle/dataset`中。
- 预训练模型自动下载并保存在`〜/.cache/paddle/weights`中。
- 模型checkpoints默认保存在`output`中(可配置)。
- 进行模型fine-tune时,用户可将`pretrain_weights`配置为PaddlePaddle发布的模型,加载模型时finetune_exclude_pretrained_params中的字段匹配的参数不被加载,可以为通配符匹配方式。详细说明请参考[Transfer Learning](TRANSFER_LEARNING_cn.md)
- 更多参数配置,请参考[配置文件](../configs)
- RCNN系列模型CPU训练在PaddlePaddle 1.5.1及以下版本暂不支持,将在下个版本修复。
- 模型checkpoints默认保存在`output`中,可通过修改配置文件中save_dir进行配置。
- RCNN系列模型CPU训练在PaddlePaddle 1.5.1及以下版本暂不支持。
### 混合精度训练
## 评估
通过设置 `--fp16` 命令行选项可以启用混合精度训练。目前混合精度训练已经在Faster-FPN, Mask-FPN 及 YOLOv3 上进行验证,几乎没有精度损失(小于0.2 mAP)。
建议使用多进程方式来进一步加速混合精度训练。示例如下。
```bash
# GPU评估
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train.py --fp16 -c configs/faster_rcnn_r50_fpn_1x.yml
```
#### 可选参数
如果训练过程中loss出现`NaN`,请尝试调节`--loss_scale`选项数值,细节请参看混合精度训练相关的[Nvidia文档](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#mptrain)
- `-d` or `--dataset_dir`: 数据集路径, 同配置文件里的`dataset_dir`。例如: `-d dataset/coco`
- `--output_eval`: 这个参数可以编辑评测保存json路径, 默认是当前目录。
- `-o`: 设置配置文件里的参数内容。 例如: `-o weights=output/faster_rcnn_r50_1x/model_final`
- `--json_eval`: 是否通过已存在的bbox.json或者mask.json进行评估。默认是`False`。json文件路径通过`-f`指令来设置。
另外,请注意将配置文件中的 `norm_type` 由 `affine_channel` 改为 `bn`。
#### 例子
- 指定数据集路径
```bash
# GPU评估
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=output/faster_rcnn_r50_1x/model_final \
### 模型评估
- 指定权重和数据集路径
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
-d dataset/coco
```
```
评估模型可以为本地路径,例如`output/faster_rcnn_r50_1x/model_final/`, 也可以为[MODEL_ZOO](MODEL_ZOO_cn.md)中给出的模型链接。
- 通过json文件评估
```bash
# GPU评估
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
--json_eval \
-f evaluation/
```
--output_eval evaluation/
```
json文件必须命名为bbox.json或者mask.json,放在`evaluation/`目录下,或者不加`-f`参数,默认为当前目录
json文件必须命名为bbox.json或者mask.json,放在`evaluation/`目录下
#### 提示
- 默认从`output`加载checkpoint(可配置)
- R-CNN和SSD模型目前暂不支持多GPU评估,将在后续版本支持
## 推断
- 单图片推断
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- 多图片推断
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
#### 可选参数
- `--output_dir`: 输出推断后可视化文件。
- `--draw_threshold`: 设置推断的阈值。默认是0.5.
- `--save_inference_model`: 设为`True`时,将预测模型保存到output\_dir中.
- `--use_tb`: 是否使用[tb-paddle](https://github.com/linshuliang/tb-paddle)记录数据,进而在TensorBoard中显示,默认是False。
- `--tb_log_dir`: 指定 tb-paddle 记录数据的存储路径,默认是`tb_log_dir/image`
#### 例子
### 模型推断
- 设置输出路径 && 设置推断阈值
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--infer_img=demo/000000570688.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/faster_rcnn_r50_1x/model_final \
--use_tb=True
```
```
可视化文件默认保存在`output`中,可通过`--output_dir=`指定不同的输出路径。
`--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,
不同阈值会产生不同的结果。如果用户需要对自定义路径的模型进行推断,可以设置`-o weights`指定模型路径。
`--use_tb`是个可选参数,当为`True`时,可使用 TensorBoard 来可视化参数的变化趋势和图片。
- 保存推断模型
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg \
--save_inference_model
```
通过设置`--save_inference_model`保存可供PaddlePaddle预测库加载的推断模型。
`--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,
不同阈值会产生不同的结果。如果用户需要对自定义路径的模型进行推断,可以设置`-o weights`指定模型路径。
## FAQ
......@@ -227,3 +167,7 @@ batch size可以达到每GPU 4 (Tesla V100 16GB)。
**Q:** 如何修改数据预处理? </br>
**A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理**
例如RCNN模型中`DecodeImage`, `NormalizeImage` and `Permute`。更多详细描述请参考[配置案例](config_example)
**Q:** affine_channel和batch norm是什么关系?
**A:** 在RCNN系列模型中加载预训练模型进行初始化时,有时候会固定住batch norm的参数,使用预训练模型中的全局均值和方差,并且batch norm的scale和bias参数不更新,已发布的大多数ResNet系列的RCNN模型采用这种方式。这种情况下可以在config中设置norm_type为bn或affine_channel,freeze_norm为true(默认为true),两种方式等价。affine_channel的计算方式为`scale * x + bias`。只不过设置affine_channel时,内部对batch norm的参数自动做了融合。如果训练使用affine_channel,用保存的模型做初始化训练其他任务时,既可使用affine_channel,也可使用batch norm,参数均可正确加载。
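下面用一小段示意代码说明上述等价关系(仅为示意):推理时冻结的batch norm等价于一个按通道的仿射变换,其scale与bias可由BN参数折算得到:

```python
import numpy as np

def frozen_bn_to_affine_channel(gamma, beta, mean, var, eps=1e-5):
    """把冻结的batch norm参数折算为affine_channel的scale和bias(示意)。

    冻结BN: y = gamma * (x - mean) / sqrt(var + eps) + beta
    等价于: y = scale * x + bias
    """
    scale = gamma / np.sqrt(var + eps)
    bias = beta - mean * scale
    return scale, bias

# 简单验证两种计算方式等价
gamma, beta = np.array([1.2, 0.8]), np.array([0.1, -0.2])
mean, var = np.array([0.5, -0.3]), np.array([2.0, 0.7])
x = np.random.randn(4, 2)
scale, bias = frozen_bn_to_affine_channel(gamma, beta, mean, var)
bn_out = gamma * (x - mean) / np.sqrt(var + 1e-5) + beta
assert np.allclose(bn_out, scale * x + bias)
```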
English | [简体中文](INSTALL_cn.md)
# Installation
---
......@@ -36,7 +38,7 @@ python -c "import paddle; print(paddle.__version__)"
### Requirements:
- Python2 or Python3
- Python2 or Python3 (only Python3 is supported on Windows)
- CUDA >= 8.0
- cuDNN >= 5.0
- nccl >= 2.1.2
......@@ -58,6 +60,12 @@ COCO-API is needed for running. Installation is as follows:
# not to install the COCO API into global site-packages
python setup.py install --user
**Installation of COCO-API on Windows:**
# if cython is not installed
pip install Cython
# Because the original version of cocoapi does not support Windows, a third-party version is used which only supports Python3
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
## PaddleDetection
......
......@@ -35,7 +35,7 @@ python -c "import paddle; print(paddle.__version__)"
### 环境需求:
- Python2 or Python3
- Python2 or Python3 (windows系统仅支持Python3)
- CUDA >= 8.0
- cuDNN >= 5.0
- nccl >= 2.1.2
......@@ -56,6 +56,12 @@ python -c "import paddle; print(paddle.__version__)"
# 若您没有权限或更倾向不安装至全局site-packages
python setup.py install --user
**windows用户安装COCO-API方式:**
# 若Cython未安装,请安装Cython
pip install Cython
# 由于原版cocoapi不支持windows,采用第三方实现版本,该版本仅支持Python3
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
## PaddleDetection
......
English | [简体中文](MODEL_ZOO_cn.md)
# Model Zoo and Benchmark
## Environment
......@@ -76,6 +78,7 @@ The backbone models pretrained on ImageNet are available. All backbone models ar
| ResNet50-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 44.2 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_fpn_1x.tar) |
| ResNet101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 46.4 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_1x.tar) |
| ResNeXt101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 47.3 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x.tar) |
| SENet154-vd-FPN | Cascade Mask | c3-c5 | 1 | 1.44x | - | 51.9 | 43.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.tar) |
#### Notes:
- Deformable ConvNets v2(dcn_v2) reference from [Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
......@@ -155,3 +158,8 @@ results of image size 608/416/320 above.
**NOTE**: MobileNet-SSD is trained on 2 GPUs with a total batch size of 64 for 120 epochs. VGG-SSD is trained on 4 GPUs with a total batch size of 32 for 240 epochs. SSD training data augmentations: random color distortion,
random cropping, random expansion, random flipping.
## Face Detection
Please refer to [face detection models](../configs/face_detection) for details.
......@@ -75,6 +75,7 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
| ResNet50-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 44.2 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_fpn_1x.tar) |
| ResNet101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 46.4 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_1x.tar) |
| ResNeXt101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 47.3 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x.tar) |
| SENet154-vd-FPN | Cascade Mask | c3-c5 | 1 | 1.44x | - | 51.9 | 43.9 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.tar) |
#### 注意事项:
- Deformable卷积网络v2(dcn_v2)参考自论文[Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
......@@ -149,3 +150,7 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
| VGG16 | 512 | 8 | 240e | 65.975 | 80.2 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ssd_vgg16_512_voc.tar) |
**注意事项:** MobileNet-SSD在2卡,总batch size为64下训练120周期。VGG-SSD在总batch size为32下训练240周期。数据增强包括:随机颜色失真,随机剪裁,随机扩张,随机翻转。
## 人脸检测
详细请参考[人脸检测模型](../configs/face_detection).
......@@ -2,7 +2,7 @@ English | [简体中文](QUICK_STARTED_cn.md)
# Quick Start
This tutorial fine-tunes a pretrained detection model on a tiny dataset so that users can obtain a model and learn PaddleDetection quickly. The model can be trained in around 15min with good performance.
This tutorial fine-tunes a pretrained detection model on a tiny dataset so that users can obtain a model and learn PaddleDetection quickly. The model can be trained in around 20min with good performance.
## Data Preparation
......
......@@ -2,7 +2,7 @@
# 快速开始
为了使得用户能够在很短的时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在P40上单卡大约15min即可产出一个效果不错的模型。
为了使得用户能够在很短的时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在P40上单卡大约20min即可产出一个效果不错的模型。
## 数据准备
......
English | [简体中文](TRANSFER_LEARNING_cn.md)
# Transfer Learning
Transfer learning aims at learning new knowledge from existing knowledge. For example, a model pretrained on ImageNet can be used to initialize detection models, or a model pretrained on the COCO dataset can be used to initialize detection models trained on the PascalVOC dataset.
......@@ -6,7 +8,10 @@ In transfer learning, if different dataset and the number of classes is used, th
## Transfer Learning in PaddleDetection
In transfer learning, the pretrained model needs to be loaded selectively. Set `finetune_exclude_pretrained_params` in YAML configuration files or set `-o finetune_exclude_pretrained_params` on the command line.
In transfer learning, the pretrained model needs to be loaded selectively. The following two methods can be used:
1. Set `finetune_exclude_pretrained_params` in YAML configuration files. Please refer to the [configuration file](../configs/yolov3_mobilenet_v1_fruit.yml#L15)
2. Set -o finetune_exclude_pretrained_params on the command line. For example:
```python
export PYTHONPATH=$PYTHONPATH:.
......
......@@ -6,7 +6,10 @@
## PaddleDetection进行迁移学习
在迁移学习中,对预训练模型进行选择性加载,可通过在 YAML 配置文件中设置 finetune_exclude_pretrained_params字段,也可通过在 train.py的启动参数中设置 -o finetune_exclude_pretrained_params。
在迁移学习中,对预训练模型进行选择性加载,可通过如下两种方式实现:
1. 在 YAML 配置文件中设置`finetune_exclude_pretrained_params`字段。可参考[配置文件](../configs/yolov3_mobilenet_v1_fruit.yml#L15)
2. 在 train.py的启动参数中设置 -o finetune_exclude_pretrained_params。例如:
```python
export PYTHONPATH=$PYTHONPATH:.
......
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
message("cmake module path: ${CMAKE_MODULE_PATH}")
message("cmake root path: ${CMAKE_ROOT}")
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
include(external-cmake/yaml-cpp.cmake)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/ext-yaml-cpp/include")
include_directories("${PADDLE_DIR}/")
include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
include_directories("${PADDLE_DIR}/third_party/install/glog/include")
include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/include")
include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/include")
include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
endif()
include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
include_directories("${PADDLE_DIR}/third_party/boost")
include_directories("${PADDLE_DIR}/third_party/eigen3")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
endif()
link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
link_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${OPENCV_DIR}/build/include")
include_directories("${OPENCV_DIR}/opencv/build/include")
link_directories("${OPENCV_DIR}/build/x64/vc14/lib")
else ()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
include_directories("${OPENCV_DIR}/include")
link_directories("${OPENCV_DIR}/lib")
endif ()
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
# TODO let users define cuda lib path
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
if (NOT WIN32)
if (USE_TENSORRT AND WITH_GPU)
include_directories("${PADDLE_DIR}/third_party/install/tensorrt/include")
link_directories("${PADDLE_DIR}/third_party/install/tensorrt/lib")
endif()
endif(NOT WIN32)
if (NOT WIN32)
set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
if(EXISTS ${NGRAPH_PATH})
include(GNUInstallDirs)
include_directories("${NGRAPH_PATH}/include")
link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_MKL)
include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif ()
set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if(WITH_STATIC_LIB)
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
else()
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf yaml-cpp z xxhash
${EXTERNAL_LIB})
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
opencv_world346 glog libyaml-cppmt gflags_static libprotobuf zlibstatic xxhash ${EXTERNAL_LIB})
set(DEPS ${DEPS} libcmt shlwapi)
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(OPENCV_LIB_DIR ${OPENCV_DIR}/lib)
if(EXISTS "${OPENCV_LIB_DIR}")
message("OPENCV_LIB:" ${OPENCV_LIB_DIR})
else()
set(OPENCV_LIB_DIR ${OPENCV_DIR}/lib64)
message("OPENCV_LIB:" ${OPENCV_LIB_DIR})
endif()
set(OPENCV_3RD_LIB_DIR ${OPENCV_DIR}/share/OpenCV/3rdparty/lib)
if(EXISTS "${OPENCV_3RD_LIB_DIR}")
message("OPENCV_3RD_LIB_DIR:" ${OPENCV_3RD_LIB_DIR})
else()
set(OPENCV_3RD_LIB_DIR ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64)
message("OPENCV_3RD_LIB_DIR:" ${OPENCV_3RD_LIB_DIR})
endif()
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_imgcodecs${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_imgproc${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_core${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_highgui${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libIlmImf${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibjasper${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibpng${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibtiff${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libittnotify${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibjpeg-turbo${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibwebp${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libzlib${CMAKE_STATIC_LIBRARY_SUFFIX})
if(EXISTS "${OPENCV_3RD_LIB_DIR}/libippiw${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libippiw${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if(EXISTS "${OPENCV_3RD_LIB_DIR}/libippicv${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libippicv${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
# message(${CMAKE_CXX_FLAGS})
# set(CMAKE_CXX_FLAGS "-g ${CMAKE_CXX_FLAGS}")
SET(PADDLESEG_INFERENCE_SRCS preprocessor/preprocessor.cpp
preprocessor/preprocessor_detection.cpp predictor/detection_predictor.cpp
utils/detection_result.pb.cc)
ADD_LIBRARY(libpaddleseg_inference STATIC ${PADDLESEG_INFERENCE_SRCS})
target_link_libraries(libpaddleseg_inference ${DEPS})
add_executable(detection_demo detection_demo.cpp)
ADD_DEPENDENCIES(libpaddleseg_inference ext-yaml-cpp)
ADD_DEPENDENCIES(detection_demo ext-yaml-cpp libpaddleseg_inference)
target_link_libraries(detection_demo ${DEPS} libpaddleseg_inference)
if (WIN32)
add_custom_command(TARGET detection_demo POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
)
endif()
execute_process(COMMAND cp -r ${CMAKE_SOURCE_DIR}/images ${CMAKE_SOURCE_DIR}/conf ${CMAKE_CURRENT_BINARY_DIR})
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# PaddleDetection C++预测部署方案
## 本文档结构
[1.说明](#1说明)
[2.主要目录和文件](#2主要目录和文件)
[3.编译](#3编译)
[4.预测并可视化结果](#4预测并可视化结果)
## 1.说明
本目录提供一个跨平台的图像检测模型的C++预测部署方案,用户通过一定的配置,加上少量的代码,即可把模型集成到自己的服务中,完成相应的图像检测任务。
主要设计的目标包括以下四点:
- 跨平台,支持在 Windows 和 Linux 完成编译、开发和部署
- 可扩展性,支持用户针对新模型开发自己特殊的数据预处理等逻辑
- 高性能,除了`PaddlePaddle`自身带来的性能优势,我们还针对图像检测的特点对关键步骤进行了性能优化
- 支持多种常见的图像检测模型,如YOLOv3, Faster-RCNN, Faster-RCNN+FPN,用户通过少量配置即可加载模型完成常见检测任务
## 2.主要目录和文件
```bash
deploy
├── detection_demo.cpp # 完成图像检测预测任务C++代码
├── conf
│ ├── detection_rcnn.yaml #示例faster rcnn 目标检测配置
│ └── detection_rcnn_fpn.yaml #示例faster rcnn + fpn目标检测配置
├── images
│ └── detection_rcnn # 示例faster rcnn + fpn目标检测测试图片目录
├── tools
│ └── vis.py # 示例图像检测结果可视化脚本
├── docs
│ ├── linux_build.md # Linux 编译指南
│ ├── windows_vs2015_build.md # windows VS2015编译指南
│ └── windows_vs2019_build.md # Windows VS2019编译指南
├── utils # 一些基础公共函数
├── preprocess # 数据预处理相关代码
├── predictor # 模型加载和预测相关代码
├── CMakeList.txt # cmake编译入口文件
└── external-cmake # 依赖的外部项目cmake(目前仅有yaml-cpp)
```
## 3.编译
支持在`Windows`和`Linux`平台编译和使用:
- [Linux 编译指南](./docs/linux_build.md)
- [Windows 使用 Visual Studio 2019 Community 编译指南](./docs/windows_vs2019_build.md)
- [Windows 使用 Visual Studio 2015 编译指南](./docs/windows_vs2015_build.md)
`Windows`上推荐使用最新的`Visual Studio 2019 Community`直接编译`CMake`项目。
## 4.预测并可视化结果
完成编译后,便生成了需要的可执行文件和链接库。这里以我们基于`faster rcnn`检测模型为例,介绍部署图像检测模型的通用流程。
### 1. 下载模型文件
我们提供faster rcnn,faster rcnn+fpn模型用于预测coco17数据集,可在以下链接下载:[faster rcnn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50.zip)
[faster rcnn + fpn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50_fpn.zip)
下载并解压,解压后目录结构如下:
```
faster_rcnn_pp50/
├── __model__ # 模型文件
└── __params__ # 参数文件
```
解压后把上述目录拷贝到合适的路径:
**假设**在`Windows`系统上,我们的模型和参数文件所在路径为`D:\projects\models\faster_rcnn_pp50`。
**假设**在`Linux`上对应的路径则为`/root/projects/models/faster_rcnn_pp50/`。
### 2. 修改配置
`inference`源代码(即本目录)的`conf`目录下提供了基于faster rcnn的示例配置文件`detection_rcnn.yaml`,相关的字段含义和说明如下:
```yaml
DEPLOY:
# 是否使用GPU预测
USE_GPU: 1
# 模型和参数文件所在目录路径
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50"
# 模型文件名
MODEL_FILENAME: "__model__"
# 参数文件名
PARAMS_FILENAME: "__params__"
# 预测图片的标准输入,尺寸不一致会resize
EVAL_CROP_SIZE: (608, 608)
# resize方式,支持 UNPADDING和RANGE_SCALING
RESIZE_TYPE: "RANGE_SCALING"
# 短边对齐的长度,仅在RANGE_SCALING下有效
TARGET_SHORT_SIZE : 800
# 均值
MEAN: [0.4647, 0.4647, 0.4647]
# 方差
STD: [0.0834, 0.0834, 0.0834]
# 图片类型, rgb或者rgba
IMAGE_TYPE: "rgb"
# 像素分类数
NUM_CLASSES: 1
# 通道数
CHANNELS : 3
# 预处理器, 目前提供图像检测的通用处理类DetectionPreProcessor
PRE_PROCESSOR: "DetectionPreProcessor"
# 预测模式,支持 NATIVE 和 ANALYSIS
PREDICTOR_MODE: "ANALYSIS"
# 每次预测的 batch_size
BATCH_SIZE : 3
# 长边伸缩的最大长度,-1代表无限制。
RESIZE_MAX_SIZE: 1333
# 输入的tensor数量。
FEEDS_SIZE: 3
```
修改字段`MODEL_PATH`的值为你在**上一步**下载并解压的模型文件所放置的目录即可。更多配置文件字段介绍,请参考文档[预测部署方案配置文件说明](./docs/configuration.md)
### 3. 执行预测
在终端中切换到生成的可执行文件所在目录(Windows系统下使用`cmd`)。
`Linux` 系统中执行以下命令:
```shell
./detection_demo --conf=conf/detection_rcnn.yaml --input_dir=images/detection_rcnn
```
`Windows` 中执行以下命令:
```shell
.\detection_demo.exe --conf=conf\detection_rcnn.yaml --input_dir=images\detection_rcnn\
```
预测使用的两个命令参数说明如下:
| 参数 | 含义 |
|-------|----------|
| conf | 模型配置的Yaml文件路径 |
| input_dir | 需要预测的图片目录 |
配置文件说明请参考上一步,样例程序会扫描input_dir目录下的所有图片,并为每一张图片生成对应的预测结果,输出到屏幕,同时在与图片相同的目录下保存为`X.pb`文件(X为对应图片的文件名)。可使用工具脚本vis.py将检测结果可视化。
**检测结果可视化**
运行可视化脚本时,只需输入命令行参数图片路径、检测结果pb文件路径、目标框阈值以及类别-标签映射文件路径即可得到可视化的图片`X.png` (tools目录下提供coco17的类别标签映射文件coco17.json)。
```bash
python vis.py --img_path=../build/images/detection_rcnn/000000087038.jpg --img_result_path=../build/images/detection_rcnn/000000087038.jpg.pb --threshold=0.1 --c2l_path=coco17.json
```
检测结果(每个图片的结果用空行隔开)
```原图:```
![原图](./demo_images/000000087038.jpg)
```检测结果图:```
![检测结果](./demo_images/000000087038.jpg.png)
DEPLOY:
USE_GPU: 1
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50"
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
EVAL_CROP_SIZE: (608, 608)
RESIZE_TYPE: "RANGE_SCALING"
TARGET_SHORT_SIZE : 800
MEAN: [0.485, 0.456, 0.406]
STD: [0.229, 0.224, 0.225]
IMAGE_TYPE: "rgb"
NUM_CLASSES: 1
CHANNELS : 3
PRE_PROCESSOR: "DetectionPreProcessor"
PREDICTOR_MODE: "ANALYSIS"
BATCH_SIZE : 3
RESIZE_MAX_SIZE: 1333
FEEDS_SIZE: 3
DEPLOY:
USE_GPU: 1
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50_fpn"
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
EVAL_CROP_SIZE: (608, 608)
RESIZE_TYPE: "RANGE_SCALING"
TARGET_SHORT_SIZE : 800
MEAN: [0.485, 0.456, 0.406]
STD: [0.229, 0.224, 0.225]
IMAGE_TYPE: "rgb"
NUM_CLASSES: 1
CHANNELS : 3
PRE_PROCESSOR: "DetectionPreProcessor"
PREDICTOR_MODE: "ANALYSIS"
BATCH_SIZE : 1
RESIZE_MAX_SIZE: 1333
FEEDS_SIZE: 3
COARSEST_STRIDE: 32
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <iostream>
#include <glog/logging.h>
#include <utils/utils.h>
#include <predictor/detection_predictor.h>
DEFINE_string(conf, "", "Configuration File Path");
DEFINE_string(input_dir, "", "Directory of Input Images");
int main(int argc, char** argv) {
// 0. parse args
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_conf.empty() || FLAGS_input_dir.empty()) {
std::cout << "Usage: ./predictor --conf=/config/path/to/your/model --input_dir=/directory/of/your/input/images";
return -1;
}
// 1. create a predictor and init it with conf
PaddleSolution::DetectionPredictor predictor;
if (predictor.init(FLAGS_conf) != 0) {
LOG(FATAL) << "Fail to init predictor";
return -1;
}
// 2. get all the images with extension '.jpeg' at input_dir
auto imgs = PaddleSolution::utils::get_directory_images(FLAGS_input_dir, ".jpeg|.jpg|.JPEG|.JPG|.bmp|.BMP|.png|.PNG");
// 3. predict
predictor.predict(imgs);
return 0;
}
# Configuration file reference for the inference deployment solution
## Basic concepts
The configuration file gives users an interface for customizing the inference deployment: once the meaning of each field is understood, the deployment can be customized without writing any code. To explain the fields precisely, we first introduce how they are categorized.
### Field types
- **required**: the field must be defined explicitly, otherwise the deployment program cannot start.
- **optional**: the field may be omitted; the deployment system supplies a default value, documented below.
### Field value types
- **int**: the field must be assigned an integer value.
- **string**: the field must be assigned a string value.
- **list**: the field must be assigned a list.
- **tuple**: the field must be assigned a two-element tuple.
## Field reference
```yaml
# All deployment configuration fields must be placed under the DEPLOY key
DEPLOY:
    # Type: required int
    # Meaning: whether to run inference on the GPU. 0: no, 1: yes
    USE_GPU: 1
    # Type: required string
    # Meaning: directory containing the model and parameter files
    MODEL_PATH: "/path/to/model_directory"
    # Type: required string
    # Meaning: model file name
    MODEL_FILENAME: "__model__"
    # Type: required string
    # Meaning: parameter file name
    PARAMS_FILENAME: "__params__"
    # Type: optional string
    # Meaning: image resize mode; UNPADDING and RANGE_SCALING are supported. Defaults to UNPADDING.
    RESIZE_TYPE: "UNPADDING"
    # Type: required tuple
    # Meaning: with UNPADDING, images are resized directly to this size.
    EVAL_CROP_SIZE: (513, 513)
    # Type: optional int
    # Meaning: with RANGE_SCALING, the short side of the image is scaled to this value and the
    # long side is scaled proportionally, so the aspect ratio is preserved. Defaults to 0.
    TARGET_SHORT_SIZE: 800
    # Type: optional int
    # Meaning: with RANGE_SCALING, the long side may not be scaled beyond this value. Defaults to 0.
    RESIZE_MAX_SIZE: 1333
    # Type: required list
    # Meaning: per-channel mean used to normalize the image
    MEAN: [104.008, 116.669, 122.675]
    # Type: required list
    # Meaning: per-channel standard deviation used to normalize the image
    STD: [1.0, 1.0, 1.0]
    # Type: string
    # Meaning: image type, rgb or rgba
    IMAGE_TYPE: "rgb"
    # Type: required int
    # Meaning: number of classes
    NUM_CLASSES: 2
    # Type: required int
    # Meaning: number of image channels
    CHANNELS : 3
    # Type: required string
    # Meaning: pre-processor; DetectionPreProcessor is the generic pre-processing class for detection.
    PRE_PROCESSOR: "DetectionPreProcessor"
    # Type: required string
    # Meaning: prediction mode; NATIVE and ANALYSIS are supported
    PREDICTOR_MODE: "ANALYSIS"
    # Type: required int
    # Meaning: batch_size used for each prediction
    BATCH_SIZE : 3
    # Type: optional int
    # Meaning: number of input tensors. Most models do not need to set this. Defaults to 1.
    FEEDS_SIZE: 2
    # Type: optional int
    # Meaning: image sides are padded up to a multiple of this value. Defaults to 1.
    COARSEST_STRIDE: 32
```
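To make the interplay between `TARGET_SHORT_SIZE`, `RESIZE_MAX_SIZE` and `COARSEST_STRIDE` concrete, here is a small Python sketch of the RANGE_SCALING arithmetic as described above; it is for illustration only, and the function name is hypothetical rather than part of the deployment code.
```python
import math

def range_scaling_shape(ori_h, ori_w, target_short_size=800,
                        resize_max_size=1333, coarsest_stride=32):
    """Sketch of RANGE_SCALING resize followed by COARSEST_STRIDE padding."""
    # scale the short side to target_short_size, but never let the long side exceed resize_max_size
    scale = target_short_size / float(min(ori_h, ori_w))
    if resize_max_size > 0:
        scale = min(scale, resize_max_size / float(max(ori_h, ori_w)))
    resize_h = int(round(ori_h * scale))
    resize_w = int(round(ori_w * scale))
    # pad both sides up to a multiple of coarsest_stride (needed by FPN-style models)
    pad_h = int(math.ceil(resize_h / float(coarsest_stride)) * coarsest_stride)
    pad_w = int(math.ceil(resize_w / float(coarsest_stride)) * coarsest_stride)
    return scale, (resize_h, resize_w), (pad_h, pad_w)

# e.g. a 480x640 image is scaled by ~1.667 to (800, 1067) and padded to (800, 1088)
print(range_scaling_shape(480, 640))
```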
# Compilation guide for Linux
## Notes
This document has been tested on `Linux` with `GCC 4.8.5` and `GCC 4.9.4`. If you need to build with a newer G++ version, you must rebuild the Paddle inference library; see: [Build the Paddle inference library from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_usage/deploy/inference/build_and_install_lib_cn.html#id15)
## Prerequisites
* G++ 4.8.2 ~ 4.9.4
* CUDA 8.0/ CUDA 9.0
* CMake 3.0+
Please make sure the above software is installed. **All examples below use `/root/projects/` as the working directory.**
### Step 1: Download the code
1. `mkdir -p /root/projects/paddle_models && cd /root/projects/paddle_models`
2. `git clone https://github.com/PaddlePaddle/models.git`
The `C++` inference code is in the `/root/projects/paddle_models/models/PaddleCV/PaddleDetection/inference` directory, which does not depend on any other directory under `PaddleDetection`.
### Step 2: Download the PaddlePaddle C++ inference library fluid_inference
Currently only `CUDA 8` and `CUDA 9` are supported. Please download the matching (develop) version from the [PaddlePaddle inference library download page](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_usage/deploy/inference/build_and_install_lib_cn.html).
After downloading and extracting, the `/root/projects/fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and headers
|
├── third_party # third-party dependency libraries and headers
|
└── version.txt # version and build information
```
### Step 3: Install and configure OpenCV
```shell
# 0. change to the /root/projects directory
cd /root/projects
# 1. download the OpenCV 3.4.6 source code
wget -c https://paddleseg.bj.bcebos.com/inference/opencv-3.4.6.zip
# 2. unzip it
unzip opencv-3.4.6.zip && cd opencv-3.4.6
# 3. create a build directory and compile; OpenCV is installed to /root/projects/opencv3 here
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/root/projects/opencv3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DWITH_IPP=OFF -DBUILD_IPP_IW=OFF -DWITH_LAPACK=OFF -DWITH_EIGEN=OFF -DCMAKE_INSTALL_LIBDIR=lib64 -DWITH_ZLIB=ON -DBUILD_ZLIB=ON -DWITH_JPEG=ON -DBUILD_JPEG=ON -DWITH_PNG=ON -DBUILD_PNG=ON -DWITH_TIFF=ON -DBUILD_TIFF=ON
make -j4
make install
```
**Note:** after the steps above, `opencv` is installed under the `/root/projects/opencv3` directory.
### Step 4: Build
The `CMake` build uses four parameters to specify the paths of the core dependencies; they are defined as follows:
| Parameter | Meaning |
| ---- | ---- |
| CUDA_LIB | CUDA library path |
| CUDNN_LIB | cuDNN library path |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_DIR | Paddle inference library path |
When running the commands below, **make sure** to replace these parameters with the actual paths of the dependencies on your system:
```shell
cd /root/projects/paddle_models/models/PaddleCV/PaddleDetection/inference
mkdir build && cd build
cmake .. -DWITH_GPU=ON -DPADDLE_DIR=/root/projects/fluid_inference -DCUDA_LIB=/usr/local/cuda/lib64/ -DOPENCV_DIR=/root/projects/opencv3/ -DCUDNN_LIB=/usr/local/cuda/lib64/
make
```
### Step 5: Inference and visualization
Run:
```
./detection_demo --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
For more details, please refer to the ReadMe: [inference and visualization](../README.md)
# Compilation guide for Windows with Visual Studio 2015
The steps in this document have been tested with both `Visual Studio 2015` and `Visual Studio 2019 Community`. We recommend [building the `CMake` project directly with `Visual Studio 2019`](./windows_vs2019_build.md).
## Prerequisites
* Visual Studio 2015
* CUDA 8.0/ CUDA 9.0
* CMake 3.0+
Please make sure the above software is installed. **All examples below use `D:\projects` as the working directory.**
### Step 1: Download the code
1. Open `cmd` and run `cd D:\projects\paddle_models`
2. `git clone https://github.com/PaddlePaddle/models.git`
The `C++` inference code is in the `D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference` directory, which does not depend on any other directory under `PaddleDetection`.
### Step 2: Download the PaddlePaddle C++ inference library fluid_inference
Download the PaddlePaddle inference library matching your Windows environment and extract it to the `D:\projects\` directory.
| CUDA | GPU | Download link |
|------|------|--------|
| 8.0 | Yes | [fluid_inference.zip](https://bj.bcebos.com/v1/paddleseg/fluid_inference_win.zip) |
| 9.0 | Yes | [fluid_inference_cuda90.zip](https://paddleseg.bj.bcebos.com/fluid_inference_cuda9_cudnn7.zip) |
After extraction, the `D:\projects\fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and headers
|
├── third_party # third-party dependency libraries and headers
|
└── version.txt # version and build information
```
### Step 3: Install and configure OpenCV
1. Download OpenCV 3.4.6 for Windows from the official site: [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\opencv`
3. Configure the environment variables as follows
    - My Computer -> Properties -> Advanced system settings -> Environment Variables
    - Find Path among the system variables (create it if it does not exist) and double-click to edit it
    - Add a new entry with the opencv path and save it, e.g. `D:\projects\opencv\build\x64\vc14\bin`
### Step 4: Build the code (using VS2015 as an example)
The commands below must be adjusted to the actual paths of the dependencies on your system.
* Set up the VS2015 environment: adjust the path to your actual VS installation, open a `cmd` window, and run the following command
* For other VS versions (e.g. VS2019), locate the `vcvarsall.bat` of the corresponding version and substitute it in this command
```
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64
```
* Build the project with CMake
    * PADDLE_DIR: path to the fluid_inference library
    * CUDA_LIB: CUDA dynamic library directory; adjust it to your actual installation
    * OPENCV_DIR: directory where OpenCV was extracted
```
# change to the inference code directory
cd /d D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference
# create the build directory; to rebuild from scratch, simply delete this directory
mkdir build
cd build
# generate the VS project with cmake
D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build> cmake .. -G "Visual Studio 14 2015 Win64" -DWITH_GPU=ON -DPADDLE_DIR=D:\projects\fluid_inference -DCUDA_LIB=D:\projects\cudalib\v9.0\lib\x64 -DOPENCV_DIR=D:\projects\opencv -T host=x64
```
The `cmake` parameter `-G` specifies which VS version the generated project targets; adjust it to your `VS` version. See the [cmake documentation](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html) for details.
* Build the executable
```
D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build> msbuild /m /p:Configuration=Release cpp_inference_demo.sln
```
### Step 5: Inference and visualization
The executable produced by the `Visual Studio 2015` build is in the `build\release` directory. Change to that directory:
```
cd /d D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build\release
```
Then run:
```
detection_demo.exe --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
For more details, please refer to the ReadMe: [inference and visualization](../README.md)
# Compilation guide for Visual Studio 2019 Community with CMake
On Windows we have tested the build with `Visual Studio 2015` and `Visual Studio 2019 Community`. Microsoft has supported managing `CMake` cross-platform projects directly since `Visual Studio 2017`, but stable and complete support only arrived with `2019`, so if you want to manage the build with CMake we recommend doing so under `Visual Studio 2019`.
You can also build by converting the `CMake` project into a `VS` project, as with `VS2015`; the **differences** are noted in that document: [compilation guide for Visual Studio 2015](./windows_vs2015_build.md)
## Prerequisites
* Visual Studio 2019
* CUDA 8.0/ CUDA 9.0
* CMake 3.0+
Please make sure the above software is installed; we use the Community edition of `VS2019`.
**All examples below use `D:\projects` as the working directory.**
### Step 1: Download the code
1. Download the source code: [download link](https://github.com/PaddlePaddle/models/archive/develop.zip)
2. Extract the archive and rename the extracted directory to `paddle_models`
The examples below assume the code directory is `D:\projects\paddle_models`.
### Step 2: Download the PaddlePaddle C++ inference library fluid_inference
Download the PaddlePaddle inference library matching your Windows environment and extract it to the `D:\projects\` directory.
| CUDA | GPU | Download link |
|------|------|--------|
| 8.0 | Yes | [fluid_inference.zip](https://bj.bcebos.com/v1/paddleseg/fluid_inference_win.zip) |
| 9.0 | Yes | [fluid_inference_cuda90.zip](https://paddleseg.bj.bcebos.com/fluid_inference_cuda9_cudnn7.zip) |
After extraction, the `D:\projects\fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and headers
|
├── third_party # third-party dependency libraries and headers
|
└── version.txt # version and build information
```
**Note:** the `CUDA90` package extracts to a directory named `fluid_inference_cuda90`.
### Step 3: Install and configure OpenCV
1. Download OpenCV 3.4.6 for Windows from the official site: [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\opencv`
3. Configure the environment variables as follows
    - My Computer -> Properties -> Advanced system settings -> Environment Variables
    - Find Path among the system variables (create it if it does not exist) and double-click to edit it
    - Add a new entry with the opencv path and save it, e.g. `D:\projects\opencv\build\x64\vc14\bin`
### Step 4: Build the CMake project directly with Visual Studio 2019
1. Open Visual Studio 2019 Community and click `Continue without code`
![step2](https://paddleseg.bj.bcebos.com/inference/vs2019_step1.png)
2. Click `File` -> `Open` -> `CMake`
![step2.1](https://paddleseg.bj.bcebos.com/inference/vs2019_step2.png)
Select the path containing the project code and open `CMakeList.txt`:
![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step3.png)
3. Click `Project` -> `CMake settings for cpp_inference_demo`
![step3](https://paddleseg.bj.bcebos.com/inference/vs2019_step4.png)
4. Click `Browse` and set the build options that specify the paths of `CUDA`, `OpenCV`, and the `Paddle inference library`
![step4](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)
The three build parameters are described below:
| Parameter | Meaning |
| ---- | ---- |
| CUDA_LIB | CUDA library path |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_DIR | Paddle inference library path |
**Once they are set**, click `Save and generate CMake cache to load variables` as shown above.
5. Click `Build` -> `Build All`
![step6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)
### Step 5: Inference and visualization
The executable produced by the `Visual Studio 2019` build is in the `out\build\x64-Release` directory. Open `cmd` and change to that directory:
```
cd D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build\x64-Release
```
Then run:
```
detection_demo.exe --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
For more details, please refer to the ReadMe: [inference and visualization](../README.md)
find_package(Git REQUIRED)
include(ExternalProject)
message("${CMAKE_BUILD_TYPE}")
ExternalProject_Add(
ext-yaml-cpp
GIT_REPOSITORY https://github.com/jbeder/yaml-cpp.git
GIT_TAG e0e01d53c27ffee6c86153fa41e7f5e57d3e5c90
CMAKE_ARGS
-DYAML_CPP_BUILD_TESTS=OFF
-DYAML_CPP_BUILD_TOOLS=OFF
-DYAML_CPP_INSTALL=OFF
-DYAML_CPP_BUILD_CONTRIB=OFF
-DMSVC_SHARED_RT=OFF
-DBUILD_SHARED_LIBS=OFF
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
-DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
-DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp"
# Disable install step
INSTALL_COMMAND ""
LOG_DOWNLOAD ON
)
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "detection_predictor.h"
#include <cstring>
#include <cmath>
#include <fstream>
#include "utils/detection_result.pb.h"
namespace PaddleSolution {
/* lod_buffer: every item in lod_buffer is an image matrix after preprocessing
* input_buffer: same data with lod_buffer after flattening to 1-D vector and padding, needed to be empty before using this function
*/
void padding_minibatch(const std::vector<std::vector<float>> &lod_buffer, std::vector<float> &input_buffer,
std::vector<int> &resize_heights, std::vector<int> &resize_widths, int channels, int coarsest_stride = 1) {
int batch_size = lod_buffer.size();
int max_h = -1;
int max_w = -1;
for(int i = 0; i < batch_size; ++i) {
max_h = (max_h > resize_heights[i])? max_h:resize_heights[i];
max_w = (max_w > resize_widths[i])? max_w:resize_widths[i];
}
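    // round the batch's maximum height/width up to the nearest multiple of coarsest_stride (e.g. 32 for FPN models)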
max_h = static_cast<int>(ceil(static_cast<float>(max_h) / static_cast<float>(coarsest_stride)) * coarsest_stride);
max_w = static_cast<int>(ceil(static_cast<float>(max_w) / static_cast<float>(coarsest_stride)) * coarsest_stride);
std::cout << "max_w: " << max_w << " max_h: " << max_h << std::endl;
input_buffer.insert(input_buffer.end(), batch_size * channels * max_h * max_w, 0);
// flatten tensor and padding
for(int i = 0; i < lod_buffer.size(); ++i) {
float *input_buffer_ptr = input_buffer.data() + i * channels * max_h * max_w;
const float *lod_ptr = lod_buffer[i].data();
for(int c = 0; c < channels; ++c) {
for(int h = 0; h < resize_heights[i]; ++h) {
memcpy(input_buffer_ptr, lod_ptr, resize_widths[i] * sizeof(float));
lod_ptr += resize_widths[i];
input_buffer_ptr += max_w;
}
input_buffer_ptr += (max_h - resize_heights[i]) * max_w;
}
}
// change resize w, h
for(int i = 0; i < batch_size; ++i){
resize_widths[i] = max_w;
resize_heights[i] = max_h;
}
}
void output_detection_result(const float* out_addr, const std::vector<std::vector<size_t>> &lod_vector, const std::vector<std::string> &imgs_batch){
for(int i = 0; i < lod_vector[0].size() - 1; ++i) {
DetectionResult detection_result;
detection_result.set_filename(imgs_batch[i]);
std::cout << imgs_batch[i] << ":" << std::endl;
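        // each detection occupies 6 consecutive floats: [class, score, left_top_x, left_top_y, right_bottom_x, right_bottom_y]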
for (int j = lod_vector[0][i]; j < lod_vector[0][i+1]; ++j) {
DetectionBox *box_ptr = detection_result.add_detection_boxes();
box_ptr->set_class_(static_cast<int>(round(out_addr[0 + j * 6])));
box_ptr->set_score(out_addr[1 + j * 6]);
box_ptr->set_left_top_x(out_addr[2 + j * 6]);
box_ptr->set_left_top_y(out_addr[3 + j * 6]);
box_ptr->set_right_bottom_x(out_addr[4 + j * 6]);
box_ptr->set_right_bottom_y(out_addr[5 + j * 6]);
printf("Class %d, score = %f, left top = [%f, %f], right bottom = [%f, %f]\n",
static_cast<int>(round(out_addr[0 + j * 6])), out_addr[1 + j * 6], out_addr[2 + j * 6],
out_addr[3 + j * 6], out_addr[4 + j * 6], out_addr[5 + j * 6]);
}
printf("\n");
std::ofstream output(imgs_batch[i] + ".pb", std::ios::out | std::ios::trunc | std::ios::binary);
detection_result.SerializeToOstream(&output);
output.close();
}
}
int DetectionPredictor::init(const std::string& conf) {
if (!_model_config.load_config(conf)) {
LOG(FATAL) << "Fail to load config file: [" << conf << "]";
return -1;
}
_preprocessor = PaddleSolution::create_processor(conf);
if (_preprocessor == nullptr) {
LOG(FATAL) << "Failed to create_processor";
return -1;
}
bool use_gpu = _model_config._use_gpu;
const auto& model_dir = _model_config._model_path;
const auto& model_filename = _model_config._model_file_name;
const auto& params_filename = _model_config._param_file_name;
// load paddle model file
if (_model_config._predictor_mode == "NATIVE") {
paddle::NativeConfig config;
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.prog_file = prog_file;
config.param_file = param_file;
config.fraction_of_gpu_memory = 0;
config.use_gpu = use_gpu;
config.device = 0;
_main_predictor = paddle::CreatePaddlePredictor(config);
} else if (_model_config._predictor_mode == "ANALYSIS") {
paddle::AnalysisConfig config;
if (use_gpu) {
config.EnableUseGpu(100, 0);
}
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.SetModel(prog_file, param_file);
config.SwitchUseFeedFetchOps(false);
config.SwitchSpecifyInputNames(true);
config.EnableMemoryOptim();
_main_predictor = paddle::CreatePaddlePredictor(config);
} else {
return -1;
}
return 0;
}
int DetectionPredictor::predict(const std::vector<std::string>& imgs) {
if (_model_config._predictor_mode == "NATIVE") {
return native_predict(imgs);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
return analysis_predict(imgs);
}
return -1;
}
int DetectionPredictor::native_predict(const std::vector<std::string>& imgs) {
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
std::size_t total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& imgs_batch = _imgs_batch;
float sr;
// DetectionResultsContainer result_container;
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.clear();
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
std::vector<int> ori_widths;
std::vector<int> ori_heights;
std::vector<int> resize_widths;
std::vector<int> resize_heights;
std::vector<float> scale_ratios;
ori_widths.resize(batch_size);
ori_heights.resize(batch_size);
resize_widths.resize(batch_size);
resize_heights.resize(batch_size);
scale_ratios.resize(batch_size);
std::vector<std::vector<float>> lod_buffer(batch_size);
if (!_preprocessor->batch_process(imgs_batch, lod_buffer, ori_widths.data(), ori_heights.data(),
resize_widths.data(), resize_heights.data(), scale_ratios.data())) {
return -1;
}
// flatten and padding
padding_minibatch(lod_buffer, input_buffer, resize_heights, resize_widths, channels, _model_config._coarsest_stride);
paddle::PaddleTensor im_tensor, im_size_tensor, im_info_tensor;
im_tensor.name = "image";
im_tensor.shape = std::vector<int>({ batch_size, channels, resize_heights[0], resize_widths[0] });
im_tensor.data.Reset(input_buffer.data(), input_buffer.size() * sizeof(float));
im_tensor.dtype = paddle::PaddleDType::FLOAT32;
std::vector<float> image_infos;
for(int i = 0; i < batch_size; ++i) {
image_infos.push_back(resize_heights[i]);
image_infos.push_back(resize_widths[i]);
image_infos.push_back(scale_ratios[i]);
}
im_info_tensor.name = "info";
im_info_tensor.shape = std::vector<int>({batch_size, 3});
im_info_tensor.data.Reset(image_infos.data(), batch_size * 3 * sizeof(float));
im_info_tensor.dtype = paddle::PaddleDType::FLOAT32;
std::vector<int> image_size;
for(int i = 0; i < batch_size; ++i) {
image_size.push_back(ori_heights[i]);
image_size.push_back(ori_widths[i]);
}
std::vector<float> image_size_f;
for(int i = 0; i < batch_size; ++i) {
image_size_f.push_back(ori_heights[i]);
image_size_f.push_back(ori_widths[i]);
image_size_f.push_back(1.0);
}
int feeds_size = _model_config._feeds_size;
im_size_tensor.name = "im_size";
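        // YOLOv3-style models (FEEDS_SIZE == 2) feed the original size as int32 [h, w];
        // Faster R-CNN-style models (FEEDS_SIZE == 3) feed it as float32 [h, w, 1.0]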
if(feeds_size == 2) {
im_size_tensor.shape = std::vector<int>({ batch_size, 2});
im_size_tensor.data.Reset(image_size.data(), batch_size * 2 * sizeof(int));
im_size_tensor.dtype = paddle::PaddleDType::INT32;
}
else if(feeds_size == 3) {
im_size_tensor.shape = std::vector<int>({ batch_size, 3});
im_size_tensor.data.Reset(image_size_f.data(), batch_size * 3 * sizeof(float));
im_size_tensor.dtype = paddle::PaddleDType::FLOAT32;
}
std::cout << "Feed size = " << feeds_size << std::endl;
feeds.push_back(im_tensor);
if(_model_config._feeds_size > 2) {
feeds.push_back(im_info_tensor);
}
feeds.push_back(im_size_tensor);
_outputs.clear();
auto t1 = std::chrono::high_resolution_clock::now();
if (!_main_predictor->Run(feeds, &_outputs, batch_size)) {
LOG(ERROR) << "Failed: NativePredictor->Run() return false at batch: " << u;
continue;
}
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
std::cout << "Number of outputs:" << _outputs.size() << std::endl;
int out_num = 1;
// print shape of first output tensor for debugging
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < _outputs[0].shape.size(); ++j) {
out_num *= _outputs[0].shape[j];
std::cout << _outputs[0].shape[j] << ",";
}
std::cout << ")" << std::endl;
// const size_t nums = _outputs.front().data.length() / sizeof(float);
// if (out_num % batch_size != 0 || out_num != nums) {
// LOG(ERROR) << "outputs data size mismatch with shape size.";
// return -1;
// }
float* out_addr = (float *)(_outputs[0].data.data());
output_detection_result(out_addr, _outputs[0].lod, imgs_batch);
}
return 0;
}
int DetectionPredictor::analysis_predict(const std::vector<std::string>& imgs) {
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
auto total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& imgs_batch = _imgs_batch;
//DetectionResultsContainer result_container;
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
//input_buffer.resize(real_buffer_size);
input_buffer.clear();
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
std::vector<int> ori_widths;
std::vector<int> ori_heights;
std::vector<int> resize_widths;
std::vector<int> resize_heights;
std::vector<float> scale_ratios;
ori_widths.resize(batch_size);
ori_heights.resize(batch_size);
resize_widths.resize(batch_size);
resize_heights.resize(batch_size);
scale_ratios.resize(batch_size);
std::vector<std::vector<float>> lod_buffer(batch_size);
if (!_preprocessor->batch_process(imgs_batch, lod_buffer, ori_widths.data(), ori_heights.data(),
resize_widths.data(), resize_heights.data(), scale_ratios.data())){
std::cout << "Failed to preprocess!" << std::endl;
return -1;
}
//flatten tensor
padding_minibatch(lod_buffer, input_buffer, resize_heights, resize_widths, channels, _model_config._coarsest_stride);
std::vector<std::string> input_names = _main_predictor->GetInputNames();
auto im_tensor = _main_predictor->GetInputTensor(input_names.front());
im_tensor->Reshape({ batch_size, channels, resize_heights[0], resize_widths[0] });
im_tensor->copy_from_cpu(input_buffer.data());
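        // RCNN-style models (FEEDS_SIZE == 3) take an extra "im_info" feed: [resized_h, resized_w, scale_ratio] per image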
if(input_names.size() > 2){
std::vector<float> image_infos;
for(int i = 0; i < batch_size; ++i) {
image_infos.push_back(resize_heights[i]);
image_infos.push_back(resize_widths[i]);
image_infos.push_back(scale_ratios[i]);
}
auto im_info_tensor = _main_predictor->GetInputTensor(input_names[1]);
im_info_tensor->Reshape({batch_size, 3});
im_info_tensor->copy_from_cpu(image_infos.data());
}
std::vector<int> image_size;
for(int i = 0; i < batch_size; ++i) {
image_size.push_back(ori_heights[i]);
image_size.push_back(ori_widths[i]);
}
std::vector<float> image_size_f;
for(int i = 0; i < batch_size; ++i) {
image_size_f.push_back(static_cast<float>(ori_heights[i]));
image_size_f.push_back(static_cast<float>(ori_widths[i]));
image_size_f.push_back(1.0);
}
auto im_size_tensor = _main_predictor->GetInputTensor(input_names.back());
if(input_names.size() > 2) {
im_size_tensor->Reshape({batch_size, 3});
im_size_tensor->copy_from_cpu(image_size_f.data());
}
else{
im_size_tensor->Reshape({batch_size, 2});
im_size_tensor->copy_from_cpu(image_size.data());
}
auto t1 = std::chrono::high_resolution_clock::now();
_main_predictor->ZeroCopyRun();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
auto output_names = _main_predictor->GetOutputNames();
auto output_t = _main_predictor->GetOutputTensor(output_names[0]);
std::vector<float> out_data;
std::vector<int> output_shape = output_t->shape();
int out_num = 1;
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < output_shape.size(); ++j) {
out_num *= output_shape[j];
std::cout << output_shape[j] << ",";
}
std::cout << ")" << std::endl;
out_data.resize(out_num);
output_t->copy_to_cpu(out_data.data());
float* out_addr = (float *)(out_data.data());
auto lod_vector = output_t->lod();
output_detection_result(out_addr, lod_vector, imgs_batch);
}
return 0;
}
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <memory>
#include <string>
#include <vector>
#include <thread>
#include <chrono>
#include <algorithm>
#include <glog/logging.h>
#include <yaml-cpp/yaml.h>
#include <opencv2/opencv.hpp>
#include <paddle_inference_api.h>
#include <utils/conf_parser.h>
#include <utils/utils.h>
#include <preprocessor/preprocessor.h>
namespace PaddleSolution {
class DetectionPredictor {
public:
// init a predictor with a yaml config file
int init(const std::string& conf);
// predict api
int predict(const std::vector<std::string>& imgs);
private:
int native_predict(const std::vector<std::string>& imgs);
int analysis_predict(const std::vector<std::string>& imgs);
private:
std::vector<float> _buffer;
std::vector<std::string> _imgs_batch;
std::vector<paddle::PaddleTensor> _outputs;
PaddleSolution::PaddleModelConfigPaser _model_config;
std::shared_ptr<PaddleSolution::ImagePreProcessor> _preprocessor;
std::unique_ptr<paddle::PaddlePredictor> _main_predictor;
};
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <glog/logging.h>
#include "preprocessor.h"
#include "preprocessor_detection.h"
namespace PaddleSolution {
std::shared_ptr<ImagePreProcessor> create_processor(const std::string& conf_file) {
auto config = std::make_shared<PaddleSolution::PaddleModelConfigPaser>();
if (!config->load_config(conf_file)) {
LOG(FATAL) << "fail to laod conf file [" << conf_file << "]";
return nullptr;
}
if (config->_pre_processor == "DetectionPreProcessor") {
auto p = std::make_shared<DetectionPreProcessor>();
if (!p->init(config)) {
return nullptr;
}
return p;
}
LOG(FATAL) << "unknown processor_name [" << config->_pre_processor << "]";
return nullptr;
}
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <vector>
#include <string>
#include <memory>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "utils/conf_parser.h"
namespace PaddleSolution {
class ImagePreProcessor {
protected:
ImagePreProcessor() {};
public:
virtual ~ImagePreProcessor() {}
virtual bool single_process(const std::string& fname, float* data, int* ori_w, int* ori_h) {
return true;
}
virtual bool batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h) {
return true;
}
virtual bool single_process(const std::string& fname, float* data) {
return true;
}
virtual bool batch_process(const std::vector<std::string>& imgs, float* data) {
return true;
}
virtual bool single_process(const std::string& fname, std::vector<float> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
return true;
}
virtual bool batch_process(const std::vector<std::string>& imgs, std::vector<std::vector<float>> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
return true;
}
}; // end of class ImagePreProcessor
std::shared_ptr<ImagePreProcessor> create_processor(const std::string &config_file);
} // end of namespace PaddleSolution
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <thread>
#include <mutex>
#include <glog/logging.h>
#include "preprocessor_detection.h"
#include "utils/utils.h"
namespace PaddleSolution {
bool DetectionPreProcessor::single_process(const std::string& fname, std::vector<float> &vec_data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
cv::Mat im1 = cv::imread(fname, -1);
cv::Mat im;
if(_config->_feeds_size == 3) { // faster rcnn
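            // Faster R-CNN configs use mean/std on a [0, 1] scale, so scale raw pixels by 1/255 here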
im1.convertTo(im, CV_32FC3, 1/255.0);
}
else if(_config->_feeds_size == 2){ //yolo v3
im = im1;
}
if (im.data == nullptr || im.empty()) {
LOG(ERROR) << "Failed to open image: " << fname;
return false;
}
int channels = im.channels();
if (channels == 1) {
cv::cvtColor(im, im, cv::COLOR_GRAY2BGR);
}
channels = im.channels();
if (channels != 3 && channels != 4) {
LOG(ERROR) << "Only support rgb(gray) and rgba image.";
return false;
}
*ori_w = im.cols;
*ori_h = im.rows;
cv::cvtColor(im, im, cv::COLOR_BGR2RGB);
//channels = im.channels();
//resize
int rw = im.cols;
int rh = im.rows;
float im_scale_ratio;
utils::scaling(_config->_resize_type, rw, rh, _config->_resize[0], _config->_resize[1], _config->_target_short_size, _config->_resize_max_size, im_scale_ratio);
cv::Size resize_size(rw, rh);
*resize_w = rw;
*resize_h = rh;
*scale_ratio = im_scale_ratio;
if (*ori_h != rh || *ori_w != rw) {
cv::Mat im_temp;
if(_config->_resize_type == utils::SCALE_TYPE::UNPADDING) {
cv::resize(im, im_temp, resize_size, 0, 0, cv::INTER_LINEAR);
}
else if(_config->_resize_type == utils::SCALE_TYPE::RANGE_SCALING) {
cv::resize(im, im_temp, cv::Size(), im_scale_ratio, im_scale_ratio, cv::INTER_LINEAR);
}
im = im_temp;
}
vec_data.resize(channels * rw * rh);
float *data = vec_data.data();
float* pmean = _config->_mean.data();
float* pscale = _config->_std.data();
for (int h = 0; h < rh; ++h) {
const uchar* uptr = im.ptr<uchar>(h);
const float* fptr = im.ptr<float>(h);
int im_index = 0;
for (int w = 0; w < rw; ++w) {
for (int c = 0; c < channels; ++c) {
int top_index = (c * rh + h) * rw + w;
float pixel;
if(_config->_feeds_size == 2){ //yolo v3
pixel = static_cast<float>(uptr[im_index++]) / 255.0;
}
else if(_config->_feeds_size == 3){
pixel = fptr[im_index++];
}
pixel = (pixel - pmean[c]) / pscale[c];
data[top_index] = pixel;
}
}
}
return true;
}
bool DetectionPreProcessor::batch_process(const std::vector<std::string>& imgs, std::vector<std::vector<float>> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
auto ic = _config->_channels;
auto iw = _config->_resize[0];
auto ih = _config->_resize[1];
std::vector<std::thread> threads;
for (int i = 0; i < imgs.size(); ++i) {
std::string path = imgs[i];
int* width = &ori_w[i];
int* height = &ori_h[i];
int* resize_width = &resize_w[i];
int* resize_height = &resize_h[i];
float* sr = &scale_ratio[i];
threads.emplace_back([this, &data, i, path, width, height, resize_width, resize_height, sr] {
std::vector<float> buffer;
single_process(path, buffer, width, height, resize_width, resize_height, sr);
data[i] = buffer;
});
}
for (auto& t : threads) {
if (t.joinable()) {
t.join();
}
}
return true;
}
bool DetectionPreProcessor::init(std::shared_ptr<PaddleSolution::PaddleModelConfigPaser> config) {
_config = config;
return true;
}
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "preprocessor.h"
namespace PaddleSolution {
class DetectionPreProcessor : public ImagePreProcessor {
public:
DetectionPreProcessor() : _config(nullptr) {
};
bool init(std::shared_ptr<PaddleSolution::PaddleModelConfigPaser> config);
bool single_process(const std::string& fname, std::vector<float> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio);
bool batch_process(const std::vector<std::string>& imgs, std::vector<std::vector<float>> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio);
private:
std::shared_ptr<PaddleSolution::PaddleModelConfigPaser> _config;
};
}
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: detection_result.proto
import sys
_b = sys.version_info[0] < 3 and (lambda x: x) or (lambda x: x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
from google.protobuf import descriptor_pb2
# @@protoc_insertion_point(imports)
_sym_db = _symbol_database.Default()
DESCRIPTOR = _descriptor.FileDescriptor(
name='detection_result.proto',
package='PaddleSolution',
syntax='proto2',
serialized_pb=_b(
'\n\x16\x64\x65tection_result.proto\x12\x0ePaddleSolution\"\x84\x01\n\x0c\x44\x65tectionBox\x12\r\n\x05\x63lass\x18\x01 \x01(\x05\x12\r\n\x05score\x18\x02 \x01(\x02\x12\x12\n\nleft_top_x\x18\x03 \x01(\x02\x12\x12\n\nleft_top_y\x18\x04 \x01(\x02\x12\x16\n\x0eright_bottom_x\x18\x05 \x01(\x02\x12\x16\n\x0eright_bottom_y\x18\x06 \x01(\x02\"Z\n\x0f\x44\x65tectionResult\x12\x10\n\x08\x66ilename\x18\x01 \x01(\t\x12\x35\n\x0f\x64\x65tection_boxes\x18\x02 \x03(\x0b\x32\x1c.PaddleSolution.DetectionBox'
))
_sym_db.RegisterFileDescriptor(DESCRIPTOR)
_DETECTIONBOX = _descriptor.Descriptor(
name='DetectionBox',
full_name='PaddleSolution.DetectionBox',
filename=None,
file=DESCRIPTOR,
containing_type=None,
fields=[
_descriptor.FieldDescriptor(
name='class',
full_name='PaddleSolution.DetectionBox.class',
index=0,
number=1,
type=5,
cpp_type=1,
label=1,
has_default_value=False,
default_value=0,
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='score',
full_name='PaddleSolution.DetectionBox.score',
index=1,
number=2,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='left_top_x',
full_name='PaddleSolution.DetectionBox.left_top_x',
index=2,
number=3,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='left_top_y',
full_name='PaddleSolution.DetectionBox.left_top_y',
index=3,
number=4,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='right_bottom_x',
full_name='PaddleSolution.DetectionBox.right_bottom_x',
index=4,
number=5,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='right_bottom_y',
full_name='PaddleSolution.DetectionBox.right_bottom_y',
index=5,
number=6,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
],
extensions=[],
nested_types=[],
enum_types=[],
options=None,
is_extendable=False,
syntax='proto2',
extension_ranges=[],
oneofs=[],
serialized_start=43,
serialized_end=175, )
_DETECTIONRESULT = _descriptor.Descriptor(
name='DetectionResult',
full_name='PaddleSolution.DetectionResult',
filename=None,
file=DESCRIPTOR,
containing_type=None,
fields=[
_descriptor.FieldDescriptor(
name='filename',
full_name='PaddleSolution.DetectionResult.filename',
index=0,
number=1,
type=9,
cpp_type=9,
label=1,
has_default_value=False,
default_value=_b("").decode('utf-8'),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='detection_boxes',
full_name='PaddleSolution.DetectionResult.detection_boxes',
index=1,
number=2,
type=11,
cpp_type=10,
label=3,
has_default_value=False,
default_value=[],
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
],
extensions=[],
nested_types=[],
enum_types=[],
options=None,
is_extendable=False,
syntax='proto2',
extension_ranges=[],
oneofs=[],
serialized_start=177,
serialized_end=267, )
_DETECTIONRESULT.fields_by_name['detection_boxes'].message_type = _DETECTIONBOX
DESCRIPTOR.message_types_by_name['DetectionBox'] = _DETECTIONBOX
DESCRIPTOR.message_types_by_name['DetectionResult'] = _DETECTIONRESULT
DetectionBox = _reflection.GeneratedProtocolMessageType(
'DetectionBox',
(_message.Message, ),
dict(
DESCRIPTOR=_DETECTIONBOX,
__module__='detection_result_pb2'
# @@protoc_insertion_point(class_scope:PaddleSolution.DetectionBox)
))
_sym_db.RegisterMessage(DetectionBox)
DetectionResult = _reflection.GeneratedProtocolMessageType(
'DetectionResult',
(_message.Message, ),
dict(
DESCRIPTOR=_DETECTIONRESULT,
__module__='detection_result_pb2'
# @@protoc_insertion_point(class_scope:PaddleSolution.DetectionResult)
))
_sym_db.RegisterMessage(DetectionResult)
# @@protoc_insertion_point(module_scope)
......@@ -101,7 +101,8 @@ def load(anno_path, sample_num=-1, with_background=True):
gt_class[i][0] = catid2clsid[catid]
gt_bbox[i, :] = box['clean_bbox']
is_crowd[i][0] = box['iscrowd']
gt_poly[i] = box['segmentation']
if 'segmentation' in box:
gt_poly[i] = box['segmentation']
coco_rec = {
'im_file': im_fname,
......