Unverified · Commit d7548aad authored by littletomatodonkey, committed by GitHub

add res2net and hrnet model (#117)

* add res2net model

* modify yml files

* add res2net model zoo

* add hrnet model

* modify model zoo

* minor fix readme

* update readme in dirs

* minor fix softnms bug

* refine model zoo

* modify yml files

* modify pretrain names
Parent 8192c758
......@@ -29,17 +29,17 @@ PaddleDetection aims to provide rich and easy-to-use object detection models for industry and academia
Supported architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG |
|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:-------:|:---:|
| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-CNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-CNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG | HRNet | Res2Net |
|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:-------:|:---:|:-----:| :--: |
| Faster R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| Mask R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) 模型提供了较大的精度提高和较少的性能损失。
......@@ -91,6 +91,11 @@ PaddleDetection aims to provide rich and easy-to-use object detection models for industry and academia
## Version updates
### 12/2019
- Added the Res2Net model.
- Added the HRNet model.
### 21/11/2019
- Added the CascadeClsAware RCNN model.
- Added CBNet, ResNet200 and Non-local models.
......@@ -131,3 +136,4 @@ PaddleDetection aims to provide rich and easy-to-use object detection models for industry and academia
## Contributing
Contributions to PaddleDetection are highly welcomed, and your feedback is much appreciated.
......@@ -39,17 +39,17 @@ multi-GPU training.
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: | :-----: | :--: |
| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG | HRNet | Res2Net |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: | :-----: | :--: | :--: | :--: |
| Faster R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| Mask R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
......@@ -77,7 +77,7 @@ Advanced Features:
- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
- [Face detection models](configs/face_detection/README.md)
- [Pretrained models for pedestrian and vehicle detection](contrib/README.md): models for object detection in specific scenarios.
- [YOLOv3 enhanced model](docs/YOLOv3_ENHANCEMENT.md): compared with the 33.0% mAP reported in the paper, the enhanced YOLOv3 reaches 41.4% mAP, with improved inference speed as well.
- [Objects365 2019 Challenge champion model](docs/CACascadeRCNN.md): one of the best single models in the Objects365 Full Track, reaching 31.7% mAP.
- [Open Images Dataset V5 and Objects365 Dataset models](docs/OIDV5_BASELINE_MODEL.md)
......@@ -99,6 +99,10 @@ Advanced Features:
## Updates
#### 12/2019
- Add Res2Net model.
- Add HRNet model.
#### 21/11/2019
- Add CascadeClsAware RCNN model.
- Add CBNet, ResNet200 and Non-local model.
......
......@@ -93,8 +93,8 @@ LearningRate:
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
start_factor: 0.0
steps: 2000
OptimizerBuilder:
optimizer:
......
# High-resolution networks (HRNets) for object detection
## Introduction
- Deep High-Resolution Representation Learning for Human Pose Estimation: [https://arxiv.org/abs/1902.09212](https://arxiv.org/abs/1902.09212)
```
@inproceedings{SunXLW19,
title={Deep High-Resolution Representation Learning for Human Pose Estimation},
author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
booktitle={CVPR},
year={2019}
}
```
- High-Resolution Representations for Labeling Pixels and Regions: [https://arxiv.org/abs/1904.04514](https://arxiv.org/abs/1904.04514)
```
@article{SunZJCXLMWLW19,
title={High-Resolution Representations for Labeling Pixels and Regions},
author={Ke Sun and Yang Zhao and Borui Jiang and Tianheng Cheng and Bin Xiao
and Dong Liu and Yadong Mu and Xinggang Wang and Wenyu Liu and Jingdong Wang},
journal = {CoRR},
volume = {abs/1904.04514},
year={2019}
}
```
## Model Zoo
| Backbone | Type | Deformable Conv | Images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | Download |
| :---------------------- | :------------- | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: |
| HRNetV2p_W18 | Faster | False | 2 | 1x | 17.509 | 36.0 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) |
| HRNetV2p_W18 | Faster | False | 2 | 2x | 17.509 | 38.0 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_2x.tar) |
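For orientation (not part of this commit's diff): a config like the ones above can be loaded and turned into a network through ppdet's workspace API. A minimal sketch, assuming this commit's `load_config`/`create` helpers and repo layout:

```python
# Hedged sketch: assumes ppdet.core.workspace as of this commit.
from ppdet.core.workspace import load_config, create

cfg = load_config('configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.yml')
model = create(cfg.architecture)  # builds FasterRCNN with HRNet backbone + HRFPN
print(cfg.architecture, cfg.max_iters)
```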
architecture: FasterRCNN
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar
weights: output/faster_rcnn_hrnetv2p_w18_1x/model_final
metric: COCO
num_classes: 81
FasterRCNN:
backbone: HRNet
fpn: HRFPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
HRNet:
feature_maps: [2, 3, 4, 5]
width: 18
freeze_at: 0
norm_type: bn
HRFPN:
num_chan: 256
share_conv: false
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../faster_fpn_reader.yml'
TrainReader:
batch_size: 2
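To make the `LearningRate` block above concrete, here is a standalone pure-Python mirror of the PiecewiseDecay + LinearWarmup settings in this file; `lr_at` is a hypothetical helper, the in-framework behavior lives in ppdet's optimizer module:

```python
# Illustrative only: reproduces the schedule defined above.
def lr_at(step, base_lr=0.02, gamma=0.1, milestones=(60000, 80000),
          warmup_steps=1000, start_factor=0.1):
    if step < warmup_steps:
        # LinearWarmup: ramp from start_factor*base_lr up to base_lr
        alpha = float(step) / warmup_steps
        return base_lr * (start_factor + (1.0 - start_factor) * alpha)
    lr = base_lr
    for m in milestones:  # PiecewiseDecay: multiply by gamma at each milestone
        if step >= m:
            lr *= gamma
    return lr

print(lr_at(0), lr_at(1000), lr_at(60000), lr_at(80000))
# ~0.002 0.02 0.002 0.0002 (up to float rounding)
```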
architecture: FasterRCNN
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar
weights: output/faster_rcnn_hrnetv2p_w18_2x/model_final
metric: COCO
num_classes: 81
FasterRCNN:
backbone: HRNet
fpn: HRFPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
HRNet:
feature_maps: [2, 3, 4, 5]
width: 18
freeze_at: 0
norm_type: bn
HRFPN:
num_chan: 256
share_conv: false
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../faster_fpn_reader.yml'
TrainReader:
batch_size: 2
# Res2Net
## Introduction
- Res2Net: A New Multi-scale Backbone Architecture: [https://arxiv.org/abs/1904.01169](https://arxiv.org/abs/1904.01169)
```
@article{DBLP:journals/corr/abs-1904-01169,
author = {Shanghua Gao and
Ming{-}Ming Cheng and
Kai Zhao and
Xinyu Zhang and
Ming{-}Hsuan Yang and
Philip H. S. Torr},
title = {Res2Net: {A} New Multi-scale Backbone Architecture},
journal = {CoRR},
volume = {abs/1904.01169},
year = {2019},
url = {http://arxiv.org/abs/1904.01169},
archivePrefix = {arXiv},
eprint = {1904.01169},
timestamp = {Thu, 25 Apr 2019 10:24:54 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1904-01169},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
## Model Zoo
| Backbone | Type | Deformable Conv | Images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | Download |
| :---------------------- | :------------- | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: |
| Res2Net50-FPN | Faster | False | 2 | 1x | 20.320 | 39.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_res2net50_vb_26w_4s_fpn_1x.tar) |
| Res2Net50-FPN | Mask | False | 2 | 2x | 16.069 | 40.7 | 36.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vb_26w_4s_fpn_2x.tar) |
| Res2Net50-vd-FPN | Mask | False | 2 | 2x | 15.816 | 40.9 | 36.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vd_26w_4s_fpn_2x.tar) |
| Res2Net50-vd-FPN | Mask | True | 2 | 1x | 14.478 | 43.5 | 38.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x.tar) |
architecture: FasterRCNN
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar
weights: output/faster_rcnn_res2net50_vb_26w_4s_fpn_1x/model_final
metric: COCO
num_classes: 81
FasterRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: b
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../faster_fpn_reader.yml'
TrainReader:
batch_size: 2
architecture: MaskRCNN
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar
metric: COCO
weights: output/mask_rcnn_res2net50_vb_26w_4s_fpn_2x/model_final/
num_classes: 81
MaskRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: b
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../mask_fpn_reader.yml'
TrainReader:
batch_size: 2
architecture: MaskRCNN
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar
metric: COCO
weights: output/mask_rcnn_res2net50_vd_26w_4s_fpn_2x/model_final/
num_classes: 81
MaskRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../mask_fpn_reader.yml'
TrainReader:
batch_size: 2
architecture: MaskRCNN
use_gpu: true
max_iters: 90000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar
metric: COCO
weights: output/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x/model_final/
num_classes: 81
MaskRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: d
dcn_v2_stages: [3, 4, 5]
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../mask_fpn_reader.yml'
TrainReader:
batch_size: 2
......@@ -86,12 +86,20 @@ The backbone models pretrained on ImageNet are available. All backbone models ar
| CBResNet200-vd-FPN-Nonlocal | Cascade Faster | c3-c5 | 1 | 2.5x | - | 53.3%(softnms) | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
#### Notes:
- Deformable ConvNets v2 (dcn_v2) is described in [Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
- `c3-c5` means adding `dcn` in ResNet stages 3 to 5.
- Detailed configuration files are in [configs/dcn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/dcn)
### HRNet
* See more details in [HRNet model zoo](../configs/hrnet/README.md)
### Res2Net
* See more details in [Res2Net model zoo](../configs/res2net/README.md)
### Group Normalization
| Backbone | Type | Images/GPU | Lr schd | Box AP | Mask AP | Download |
| :------------------- | :------------- | :-----: | :-----: | :----: | :-----: | :----------------------------------------------------------: |
......
......@@ -82,13 +82,20 @@ Paddle provides backbone models pretrained on ImageNet. All pretrained models
| ResNet200-vd-FPN-Nonlocal | CascadeClsAware Faster | c3-c5 | 1 | 2.5x | - | 51.7%(softnms) | - | [download](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
| CBResNet200-vd-FPN-Nonlocal | Cascade Faster | c3-c5 | 1 | 2.5x | - | 53.3%(softnms) | - | [download](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
#### Notes:
- Deformable ConvNets v2 (dcn_v2) is described in [Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
- `c3-c5` means adding `dcn` in ResNet stages 3 to 5.
- Detailed configuration files are in [configs/dcn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/dcn)
### HRNet
* See more details in the [HRNet model zoo](../configs/hrnet/README.md)
### Res2Net
* See more details in the [Res2Net model zoo](../configs/res2net/README.md)
### Group Normalization
| Backbone | Type | Images/GPU | Lr schd | Box AP | Mask AP | Download |
| :------------------- | :------------- |:--------: | :-----: | :----: | :-----: | :----------------------------------------------------------: |
......
......@@ -24,6 +24,9 @@ from . import vgg
from . import blazenet
from . import faceboxnet
from . import cb_resnet
from . import res2net
from . import hrnet
from . import hrfpn
from .resnet import *
from .resnext import *
......@@ -35,3 +38,6 @@ from .vgg import *
from .blazenet import *
from .faceboxnet import *
from .cb_resnet import *
from .res2net import *
from .hrnet import *
from .hrfpn import *
\ No newline at end of file
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['HRFPN']
@register
class HRFPN(object):
"""
HRFPN (High-Resolution FPN for HRNet backbones), see https://arxiv.org/abs/1908.07919
Args:
num_chan (int): number of feature channels
pooling_type (str): pooling type of downsampling
share_conv (bool): whether to share the 3x3 conv across pyramid levels
spatial_scale (list): feature map scaling factor
"""
def __init__(self,
num_chan=256,
pooling_type="avg",
share_conv=False,
spatial_scale=[1./64, 1./32, 1./16, 1./8, 1./4],
):
self.num_chan = num_chan
self.pooling_type = pooling_type
self.share_conv = share_conv
self.spatial_scale = spatial_scale
return
def get_output(self, body_dict):
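"""
Upsample every branch to the highest resolution, concatenate them,
reduce channels with a shared 1x1 conv, then build the pyramid levels
by progressive pooling followed by per-level (or shared) 3x3 convs.
"""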
num_out = len(self.spatial_scale)
body_name_list = list(body_dict.keys())
num_backbone_stages = len(body_name_list)
outs = []
outs.append(body_dict[body_name_list[0]])
# resize
for i in range(1, len(body_dict)):
resized = self.resize_input_tensor(body_dict[body_name_list[i]], outs[0], 2**i)
outs.append(resized)
# concat
out = fluid.layers.concat(outs, axis=1)
# reduction
out = fluid.layers.conv2d(
input=out,
num_filters=self.num_chan,
filter_size=1,
stride=1,
padding=0,
param_attr=ParamAttr(name='hrfpn_reduction_weights'),
bias_attr=False)
# conv
outs = [out]
for i in range(1, num_out):
outs.append(self.pooling(out, size=2**i, stride=2**i, pooling_type=self.pooling_type))
outputs = []
for i in range(num_out):
conv_name = "shared_fpn_conv" if self.share_conv else "shared_fpn_conv_"+str(i)
conv = fluid.layers.conv2d(
input=outs[i],
num_filters=self.num_chan,
filter_size=3,
stride=1,
padding=1,
param_attr=ParamAttr(name=conv_name+"_weights"),
bias_attr=False)
outputs.append(conv)
for idx in range(0, num_out - len(body_name_list)):
body_name_list.append("fpn_res5_sum_subsampled_{}x".format(2**(idx + 1)))
outputs = outputs[::-1]
body_name_list = body_name_list[::-1]
res_dict = OrderedDict([(body_name_list[k], outputs[k]) for k in range(len(body_name_list))])
return res_dict, self.spatial_scale
def resize_input_tensor(self, body_input, ref_output, scale):
shape = fluid.layers.shape(ref_output)
shape_hw = fluid.layers.slice(shape, axes=[0], starts=[2], ends=[4])
out_shape = fluid.layers.cast(shape_hw, dtype='int32')
out_shape.stop_gradient = True
body_output = fluid.layers.resize_bilinear(
body_input, scale=scale, actual_shape=out_shape)
return body_output
def pooling(self, input, size, stride, pooling_type):
pool = fluid.layers.pool2d(input=input,
pool_size=size,
pool_stride=stride,
pool_type=pooling_type)
return pool
\ No newline at end of file
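For reference (not part of the commit), a minimal smoke-test sketch of HRFPN under the Paddle 1.x static graph; the module path assumes this commit's layout, the branch widths follow HRNetV2p-W18, and the input sizes are made up:

```python
import paddle.fluid as fluid
from collections import OrderedDict
from ppdet.modeling.backbones.hrfpn import HRFPN  # assumed module path

fpn = HRFPN(num_chan=256, share_conv=False)
body = OrderedDict()
for i, ch in enumerate([18, 36, 72, 144]):  # HRNetV2p-W18 branch widths
    body['res{}_sum'.format(i + 2)] = fluid.data(
        name='feat{}'.format(i + 2), dtype='float32',
        shape=[-1, ch, 200 // 2**i, 336 // 2**i])
feats, spatial_scale = fpn.get_output(body)
print(list(feats.keys()))  # five pyramid levels, strides 4 to 64
```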
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register, serializable
from numbers import Integral
from paddle.fluid.initializer import MSRA
import math
from .name_adapter import NameAdapter
__all__ = ['HRNet']
@register
@serializable
class HRNet(object):
"""
HRNet, see https://arxiv.org/abs/1908.07919
Args:
width (int): HRNet width, one of 18, 30, 32, 40, 44, 48, 60, 64
has_se (bool): whether to add squeeze-and-excitation blocks
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
feature_maps (list): index of stages whose feature maps are returned
"""
def __init__(self,
width=40,
has_se=False,
freeze_at=2,
norm_type='bn',
freeze_norm=True,
norm_decay=0.,
feature_maps=[2, 3, 4, 5]):
super(HRNet, self).__init__()
if isinstance(feature_maps, Integral):
feature_maps = [feature_maps]
assert 0 <= freeze_at <= 4, "freeze_at should be 0, 1, 2, 3 or 4"
assert len(feature_maps) > 0, "need one or more feature maps"
assert norm_type in ['bn', 'sync_bn']
self.width = width
self.has_se = has_se
self.channels = {
18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]],
30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]],
32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]],
40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]],
48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]],
60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]],
64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]],
}
self.freeze_at = freeze_at
self.norm_type = norm_type
self.norm_decay = norm_decay
self.freeze_norm = freeze_norm
self._model_type = 'HRNet'
self.feature_maps = feature_maps
self.end_points = []
return
def net(self, input, class_dim=1000):
width = self.width
channels_2, channels_3, channels_4 = self.channels[width]
num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3
x = self.conv_bn_layer(input=input,
filter_size=3,
num_filters=64,
stride=2,
if_act=True,
name='layer1_1')
x = self.conv_bn_layer(input=x,
filter_size=3,
num_filters=64,
stride=2,
if_act=True,
name='layer1_2')
la1 = self.layer1(x, name='layer2')
tr1 = self.transition_layer([la1], [256], channels_2, name='tr1')
st2 = self.stage(tr1, num_modules_2, channels_2, name='st2')
tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2')
st3 = self.stage(tr2, num_modules_3, channels_3, name='st3')
tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3')
st4 = self.stage(tr3, num_modules_4, channels_4, name='st4')
self.end_points = st4
return st4[-1]
def layer1(self, input, name=None):
conv = input
for i in range(4):
conv = self.bottleneck_block(conv,
num_filters=64,
downsample=True if i == 0 else False,
name=name+'_'+str(i+1))
return conv
def transition_layer(self, x, in_channels, out_channels, name=None):
num_in = len(in_channels)
num_out = len(out_channels)
out = []
for i in range(num_out):
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.conv_bn_layer(x[i],
filter_size=3,
num_filters=out_channels[i],
name=name+'_layer_'+str(i+1))
out.append(residual)
else:
out.append(x[i])
else:
residual = self.conv_bn_layer(x[-1],
filter_size=3,
num_filters=out_channels[i],
stride=2,
name=name+'_layer_'+str(i+1))
out.append(residual)
return out
def branches(self, x, block_num, channels, name=None):
out = []
for i in range(len(channels)):
residual = x[i]
for j in range(block_num):
residual = self.basic_block(residual,
channels[i],
name=name+'_branch_layer_'+str(i+1)+'_'+str(j+1))
out.append(residual)
return out
def fuse_layers(self, x, channels, multi_scale_output=True, name=None):
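"""
Fuse every input branch into each retained output resolution:
higher-resolution branches (j < i) are downsampled by chains of
stride-2 3x3 convs, lower-resolution branches (j > i) are mapped by a
1x1 conv and upsampled with nearest-neighbor resize, and all
contributions are summed and passed through a ReLU.
"""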
out = []
for i in range(len(channels) if multi_scale_output else 1):
residual = x[i]
for j in range(len(channels)):
if j > i:
y = self.conv_bn_layer(x[j],
filter_size=1,
num_filters=channels[i],
if_act=False,
name=name+'_layer_'+str(i+1)+'_'+str(j+1))
y = fluid.layers.resize_nearest(input=y, scale=2 ** (j - i))
residual = fluid.layers.elementwise_add(
x=residual, y=y, act=None)
elif j < i:
y = x[j]
for k in range(i - j):
if k == i - j - 1:
y = self.conv_bn_layer(y,
filter_size=3,
num_filters=channels[i],
stride=2, if_act=False,
name=name+'_layer_'+str(i+1)+'_'+str(j+1)+'_'+str(k+1))
else:
y = self.conv_bn_layer(y,
filter_size=3,
num_filters=channels[j],
stride=2,
name=name+'_layer_'+str(i+1)+'_'+str(j+1)+'_'+str(k+1))
residual = fluid.layers.elementwise_add(
x=residual, y=y, act=None)
residual = fluid.layers.relu(residual)
out.append(residual)
return out
def high_resolution_module(self, x, channels, multi_scale_output=True, name=None):
residual = self.branches(x, 4, channels, name=name)
out = self.fuse_layers(residual, channels, multi_scale_output=multi_scale_output, name=name)
return out
def stage(self, x, num_modules, channels, multi_scale_output=True, name=None):
out = x
for i in range(num_modules):
if i == num_modules - 1 and not multi_scale_output:
out = self.high_resolution_module(out,
channels,
multi_scale_output=False,
name=name+'_'+str(i+1))
else:
out = self.high_resolution_module(out,
channels,
name=name+'_'+str(i+1))
return out
def last_cls_out(self, x, name=None):
out = []
num_filters_list = [128, 256, 512, 1024]
for i in range(len(x)):
out.append(self.conv_bn_layer(input=x[i],
filter_size=1,
num_filters=num_filters_list[i],
name=name+'conv_'+str(i+1)))
return out
def basic_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input,
filter_size=3,
num_filters=num_filters,
stride=stride,
name=name+'_conv1')
conv = self.conv_bn_layer(input=conv,
filter_size=3,
num_filters=num_filters,
if_act=False,
name=name+'_conv2')
if downsample:
residual = self.conv_bn_layer(input=input,
filter_size=1,
num_filters=num_filters,
if_act=False,
name=name+'_downsample')
if self.has_se:
conv = self.squeeze_excitation(
input=conv,
num_channels=num_filters,
reduction_ratio=16,
name='fc'+name)
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def bottleneck_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input,
filter_size=1,
num_filters=num_filters,
name=name+'_conv1')
conv = self.conv_bn_layer(input=conv,
filter_size=3,
num_filters=num_filters,
stride=stride,
name=name+'_conv2')
conv = self.conv_bn_layer(input=conv,
filter_size=1,
num_filters=num_filters*4,
if_act=False,
name=name+'_conv3')
if downsample:
residual = self.conv_bn_layer(input=input,
filter_size=1,
num_filters=num_filters*4,
if_act=False,
name=name+'_downsample')
if self.has_se:
conv = self.squeeze_excitation(
input=conv,
num_channels=num_filters * 4,
reduction_ratio=16,
name='fc'+name)
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def squeeze_excitation(self, input, num_channels, reduction_ratio, name=None):
pool = fluid.layers.pool2d(
input=input, pool_size=0, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(input=pool,
size=num_channels // reduction_ratio,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv),name=name+'_sqz_weights'),
bias_attr=ParamAttr(name=name+'_sqz_offset'))
stdv = 1.0 / math.sqrt(squeeze.shape[1] * 1.0)
excitation = fluid.layers.fc(input=squeeze,
size=num_channels,
act='sigmoid',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv),name=name+'_exc_weights'),
bias_attr=ParamAttr(name=name+'_exc_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=excitation, axis=0)
return scale
def conv_bn_layer(self, input, filter_size, num_filters, stride=1, padding=1, num_groups=1, if_act=True, name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size-1)//2,
groups=num_groups,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name=name+'_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = self._bn( input=conv, bn_name=bn_name )
if if_act:
bn = fluid.layers.relu(bn)
return bn
def _bn(self,
input,
act=None,
bn_name=None):
norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay
pattr = ParamAttr(
name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
global_stats = True if self.freeze_norm else False
out = fluid.layers.batch_norm(
input=input,
act=act,
name=bn_name + '.output.1',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance',
use_global_stats=global_stats)
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
if self.freeze_norm:
scale.stop_gradient = True
bias.stop_gradient = True
return out
def __call__(self, input):
assert isinstance(input, Variable)
assert not (set(self.feature_maps) - set([2, 3, 4, 5])), \
"feature maps {} not in [2, 3, 4, 5]".format(self.feature_maps)
res_endpoints = []
res = input
feature_maps = self.feature_maps
self.net( input )
for i in feature_maps:
res = self.end_points[i-2]
if i in self.feature_maps:
res_endpoints.append(res)
if self.freeze_at >= i:
res.stop_gradient = True
return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
for idx, feat in enumerate(res_endpoints)])
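A hypothetical usage sketch of the backbone (Paddle 1.x static graph; the module path assumes this commit's layout, and the input shape is made up):

```python
import paddle.fluid as fluid
from ppdet.modeling.backbones.hrnet import HRNet  # assumed module path

backbone = HRNet(width=18, freeze_at=0, norm_type='bn',
                 feature_maps=[2, 3, 4, 5])
image = fluid.data(name='image', shape=[-1, 3, 800, 1344], dtype='float32')
body = backbone(image)
# OrderedDict with keys 'res2_sum'..'res5_sum' at strides 4, 8, 16, 32,
# matching what HRFPN expects as input.
```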
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from paddle.fluid.initializer import Constant
from ppdet.core.workspace import register, serializable
from numbers import Integral
from .nonlocal_helper import add_space_nonlocal
from .name_adapter import NameAdapter
from .resnet import ResNet, ResNetC5
__all__ = ['Res2Net', 'Res2NetC5']
@register
@serializable
class Res2Net(ResNet):
"""
Res2Net, see https://arxiv.org/abs/1904.01169
Args:
depth (int): Res2Net depth, should be 50, 101, 152, 200.
width (int): Res2Net width
scales (int): Res2Net scale
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'/'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): Res2Net variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of stages whose feature maps are returned
dcn_v2_stages (list): index of stages who select deformable conv v2
nonlocal_stages (list): index of stages who select nonlocal networks
"""
__shared__ = ['norm_type', 'freeze_norm', 'weight_prefix_name']
def __init__(self,
depth=50,
width=26,
scales=4,
freeze_at=2,
norm_type='bn',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[2, 3, 4, 5],
dcn_v2_stages=[],
weight_prefix_name='',
nonlocal_stages=[],):
super(Res2Net, self).__init__(depth=depth,
freeze_at=freeze_at,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay,
variant=variant,
feature_maps=feature_maps,
dcn_v2_stages=dcn_v2_stages,
weight_prefix_name=weight_prefix_name,
nonlocal_stages=nonlocal_stages)
assert depth >= 50, "only depth >= 50 is supported for Res2Net, but got depth={}".format(depth)
# res2net config
self.scales = scales
self.width = width
basic_width = self.width * self.scales
self.num_filters1 = [basic_width * t for t in [1, 2, 4, 8]]
self.num_filters2 = [256 * t for t in [1, 2, 4, 8]]
self.num_filters = [64, 128, 384, 768]
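For the default 26w_4s setting (width=26, scales=4), which every Res2Net config in this commit uses, the channel arithmetic above works out as follows; a quick standalone check:

```python
# Derived directly from the __init__ above for width=26, scales=4 ("26w_4s").
width, scales = 26, 4
basic_width = width * scales                             # 104
num_filters1 = [basic_width * t for t in [1, 2, 4, 8]]   # [104, 208, 416, 832]
num_filters2 = [256 * t for t in [1, 2, 4, 8]]           # [256, 512, 1024, 2048]
```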
def bottleneck(self,
input,
num_filters1,
num_filters2,
stride,
is_first,
name,
dcn_v2=False):
conv0 = self._conv_norm(
input=input,
num_filters=num_filters1,
filter_size=1,
stride=1,
act='relu',
name=name + '_branch2a')
xs = fluid.layers.split(conv0, self.scales, 1)
ys = []
for s in range(self.scales - 1):
if s == 0 or stride == 2:
ys.append(self._conv_norm(input=xs[s],
num_filters=num_filters1//self.scales,
stride=stride,
filter_size=3,
act='relu',
name=name+ '_branch2b_' + str(s+1),
dcn_v2=dcn_v2))
else:
ys.append(self._conv_norm(input=xs[s]+ys[-1],
num_filters=num_filters1//self.scales,
stride=stride,
filter_size=3,
act='relu',
name=name+ '_branch2b_' + str(s+1),
dcn_v2=dcn_v2))
if stride == 1:
ys.append(xs[-1])
else:
ys.append(fluid.layers.pool2d(input=xs[-1],
pool_size=3,
pool_stride=stride,
pool_padding=1,
pool_type='avg'))
conv1 = fluid.layers.concat(ys, axis=1)
conv2 = self._conv_norm(
input=conv1,
num_filters=num_filters2,
filter_size=1,
act=None,
name=name+"_branch2c")
short = self._shortcut(input,
num_filters2,
stride,
is_first,
name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short, y=conv2, act='relu', name=name + ".add.output.5")
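To see the hierarchical connectivity this block implements, here is a tiny symbolic sketch (illustrative only) of the stride-1 path: each 3x3 conv k_s sees its own group plus the previous branch's output, and the last group is passed through unchanged:

```python
# Symbolic wiring of the 3x3 branches above (stride=1 case).
scales = 4
xs = ['x1', 'x2', 'x3', 'x4']  # split of the 1x1 output into groups
ys = []
for s in range(scales - 1):
    inp = xs[s] if s == 0 else '{}+{}'.format(xs[s], ys[-1])
    ys.append('k{}({})'.format(s + 1, inp))
ys.append(xs[-1])              # last group bypasses the 3x3 convs
print(ys)
# ['k1(x1)', 'k2(x2+k1(x1))', 'k3(x3+k2(x2+k1(x1)))', 'x4']
```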
def layer_warp(self, input, stage_num):
"""
Args:
input (Variable): input variable.
stage_num (int): the stage number, should be 2, 3, 4, 5
Returns:
The last variable in endpoint-th stage.
"""
assert stage_num in [2, 3, 4, 5]
stages, block_func = self.depth_cfg[self.depth]
count = stages[stage_num - 2]
ch_out = self.stage_filters[stage_num - 2]
is_first = stage_num == 2
dcn_v2 = stage_num in self.dcn_v2_stages
num_filters1 = self.num_filters1[stage_num-2]
num_filters2 = self.num_filters2[stage_num-2]
nonlocal_mod = 1000
if stage_num in self.nonlocal_stages:
nonlocal_mod = self.nonlocal_mod_cfg[self.depth] if stage_num==4 else 2
# Make the layer name and parameter name consistent
# with ImageNet pre-trained model
conv = input
for i in range(count):
conv_name = self.na.fix_layer_warp_name(stage_num, count, i)
if self.depth < 50:
is_first = True if i == 0 and stage_num == 2 else False
conv = block_func(
input=conv,
num_filters1=num_filters1,
num_filters2=num_filters2,
stride=2 if i == 0 and stage_num != 2 else 1,
is_first=is_first,
name=conv_name,
dcn_v2=dcn_v2)
# add non local model
dim_in = conv.shape[1]
nonlocal_name = "nonlocal_conv{}".format( stage_num )
if i % nonlocal_mod == nonlocal_mod - 1:
conv = add_space_nonlocal(
conv, dim_in, dim_in,
nonlocal_name + '_{}'.format(i), int(dim_in / 2) )
return conv
@register
@serializable
class Res2NetC5(Res2Net):
__doc__ = Res2Net.__doc__
def __init__(self,
depth=50,
width=26,
scales=4,
freeze_at=2,
norm_type='bn',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[5],
weight_prefix_name=''):
super(Res2NetC5, self).__init__(depth, width, scales,
freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.severed_head = True
......@@ -207,31 +207,29 @@ class MultiClassNMS(object):
self.nms_eta = nms_eta
self.background_label = background_label
@register
@serializable
class MultiClassSoftNMS(object):
def __init__(
self,
score_threshold=0.01,
keep_top_k=300,
softnms_sigma=0.5,
normalized=False,
background_label=0, ):
super(MultiClassSoftNMS, self).__init__()
self.score_threshold = score_threshold
self.keep_top_k = keep_top_k
self.softnms_sigma = softnms_sigma
self.normalized = normalized
self.background_label = background_label
def __call__(self, bboxes, scores):
def create_tmp_var(program, name, dtype, shape, lod_level):
return program.current_block().create_var(
name=name, dtype=dtype, shape=shape, lod_level=lod_level)
def _soft_nms_for_cls(dets, sigma, thres):
"""soft_nms_for_cls"""
dets_final = []
......@@ -240,6 +238,8 @@ class MultiClassSoftNMS(object):
dets_final.append(dets[maxpos].copy())
ts, tx1, ty1, tx2, ty2 = dets[maxpos]
scores = dets[:, 0]
# force remove bbox at maxpos
scores[maxpos] = -1
x1 = dets[:, 1]
y1 = dets[:, 2]
x2 = dets[:, 3]
......@@ -253,65 +253,69 @@ class MultiClassSoftNMS(object):
w = np.maximum(0.0, xx2 - xx1 + eta)
h = np.maximum(0.0, yy2 - yy1 + eta)
inter = w * h
ovr = inter / (areas + areas[maxpos] - inter)
weight = np.exp(-(ovr * ovr) / sigma)
scores = scores * weight
idx_keep = np.where(scores >= thres)
dets[:, 0] = scores
dets = dets[idx_keep]
dets_final = np.array(dets_final).reshape(-1, 5)
return dets_final
def _soft_nms(bboxes, scores):
bboxes = np.array(bboxes)
scores = np.array(scores)
class_nums = scores.shape[-1]
softnms_thres = self.score_threshold
softnms_sigma = self.softnms_sigma
keep_top_k = self.keep_top_k
cls_boxes = [[] for _ in range(class_nums)]
cls_ids = [[] for _ in range(class_nums)]
start_idx = 1 if self.background_label == 0 else 0
for j in range(start_idx, class_nums):
inds = np.where(scores[:, j] >= softnms_thres)[0]
scores_j = scores[inds, j]
rois_j = bboxes[inds, j, :]
dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype(
np.float32, copy=False)
cls_rank = np.argsort(-dets_j[:, 0])
dets_j = dets_j[cls_rank]
cls_boxes[j] = _soft_nms_for_cls(
dets_j, sigma=softnms_sigma, thres=softnms_thres)
cls_ids[j] = np.array([j] * cls_boxes[j].shape[0]).reshape(-1, 1)
cls_boxes = np.vstack(cls_boxes[start_idx:])
cls_ids = np.vstack(cls_ids[start_idx:])
pred_result = np.hstack([cls_ids, cls_boxes])
# Limit to max_per_image detections **over all classes**
image_scores = cls_boxes[:, 0]
if len(image_scores) > keep_top_k:
image_thresh = np.sort(image_scores)[-keep_top_k]
keep = np.where(cls_boxes[:, 0] >= image_thresh)[0]
pred_result = pred_result[keep, :]
res = fluid.LoDTensor()
res.set_lod([[0, pred_result.shape[0]]])
if pred_result.shape[0] == 0:
pred_result = np.array([[1]], dtype=np.float32)
res.set(pred_result, fluid.CPUPlace())
return res
pred_result = create_tmp_var(
fluid.default_main_program(),
name='softnms_pred_result',
dtype='float32',
shape=[6],
lod_level=1)
fluid.layers.py_func(
func=_soft_nms, x=[bboxes, scores], out=pred_result)
return pred_result
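For intuition on the Gaussian rescoring implemented above (s_i <- s_i * exp(-IoU^2 / sigma)): the commit's `scores[maxpos] = -1` fix simply guarantees the already-selected box cannot be picked again, while overlapping boxes are decayed rather than discarded. A standalone check of the decay factors at the default sigma:

```python
import numpy as np

sigma = 0.5  # default softnms_sigma above
for iou in (0.3, 0.5, 0.9):
    print(iou, round(float(np.exp(-(iou * iou) / sigma)), 3))
# 0.3 -> 0.835, 0.5 -> 0.607, 0.9 -> 0.198: heavier overlap, stronger down-weighting
```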
......