Unverified · Commit d7548aad authored by littletomatodonkey, committed by GitHub

add res2net and hrnet model (#117)

* add res2net model

* modify yml files

* add res2net model zoo

* add hrnet model

* modify model zoo

* minor fix readme

* update readme in dirs

* minor fix softnms bug

* refine model zoo

* modify yml files

* modify pretrain names
Parent 8192c758
......@@ -29,17 +29,17 @@ PaddleDetection aims to provide rich and easy-to-use object detection models for industry and academia
Supported architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG |
|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:-------:|:---:|
| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-CNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-CNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG | HRNet | Res2Net |
|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:-------:|:---:|:-----:| :--: |
| Faster R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| Mask R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) 模型提供了较大的精度提高和较少的性能损失。
......@@ -91,6 +91,11 @@ PaddleDetection aims to provide rich and easy-to-use object detection models for industry and academia
## Version updates
### 12/2019
- Added the Res2Net model.
- Added the HRNet model.
### 21/11/2019
- Added the CascadeClsAware RCNN model.
- Added CBNet, ResNet200 and Non-local models.
......@@ -131,3 +136,4 @@ PaddleDetection aims to provide rich and easy-to-use object detection models for industry and academia
## Contributing
Contributions to PaddleDetection are highly welcomed, and your feedback is much appreciated.
......@@ -39,17 +39,17 @@ multi-GPU training.
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: | :-----: | :--: |
| Faster R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG | HRNet | Res2Net |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: | :-----: | :--: | :--: | :--: |
| Faster R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| Mask R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
......@@ -77,7 +77,7 @@ Advanced Features:
- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
- [Face detection models](configs/face_detection/README.md)
- [Pretrained models for pedestrian and vehicle detection](contrib/README.md): models for object detection in specific scenarios.
- [YOLOv3 enhanced model](docs/YOLOv3_ENHANCEMENT.md): compared with the 33.0% mAP reported in the paper, the enhanced YOLOv3 reaches 41.4% mAP, with improved inference speed as well.
- [Objects365 2019 Challenge champion model](docs/CACascadeRCNN.md): one of the best single models in the Objects365 Full Track, reaching 31.7% mAP.
- [Open Images Dataset V5 and Objects365 Dataset models](docs/OIDV5_BASELINE_MODEL.md)
......@@ -99,6 +99,10 @@ Advanced Features:
## Updates
#### 12/2019
- Add Res2Net model.
- Add HRNet model.
#### 21/11/2019
- Add CascadeClsAware RCNN model.
- Add CBNet, ResNet200 and Non-local model.
......
......@@ -93,8 +93,8 @@ LearningRate:
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
start_factor: 0.0
steps: 2000
OptimizerBuilder:
optimizer:
......
# High-resolution networks (HRNets) for object detection
## Introduction
- Deep High-Resolution Representation Learning for Human Pose Estimation: [https://arxiv.org/abs/1902.09212](https://arxiv.org/abs/1902.09212)
```
@inproceedings{SunXLW19,
title={Deep High-Resolution Representation Learning for Human Pose Estimation},
author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
booktitle={CVPR},
year={2019}
}
```
- High-Resolution Representations for Labeling Pixels and Regions: [https://arxiv.org/abs/1904.04514](https://arxiv.org/abs/1904.04514)
```
@article{SunZJCXLMWLW19,
title={High-Resolution Representations for Labeling Pixels and Regions},
author={Ke Sun and Yang Zhao and Borui Jiang and Tianheng Cheng and Bin Xiao
and Dong Liu and Yadong Mu and Xinggang Wang and Wenyu Liu and Jingdong Wang},
journal = {CoRR},
volume = {abs/1904.04514},
year={2019}
}
```
## Model Zoo
| Backbone | Type | Deformable Conv | Images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | Download |
| :---------------------- | :------------- | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: |
| HRNetV2p_W18 | Faster | False | 2 | 1x | 17.509 | 36.0 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) |
| HRNetV2p_W18 | Faster | False | 2 | 2x | 17.509 | 38.0 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_2x.tar) |
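For orientation (not part of this commit's diff): a config like the ones above can be loaded and turned into a network through ppdet's workspace API. A minimal sketch, assuming this commit's `load_config`/`create` helpers and repo layout:

```python
# Hedged sketch: assumes ppdet.core.workspace as of this commit.
from ppdet.core.workspace import load_config, create

cfg = load_config('configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.yml')
model = create(cfg.architecture)  # builds FasterRCNN with HRNet backbone + HRFPN
print(cfg.architecture, cfg.max_iters)
```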
architecture: FasterRCNN
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar
weights: output/faster_rcnn_hrnetv2p_w18_1x/model_final
metric: COCO
num_classes: 81
FasterRCNN:
backbone: HRNet
fpn: HRFPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
HRNet:
feature_maps: [2, 3, 4, 5]
width: 18
freeze_at: 0
norm_type: bn
HRFPN:
num_chan: 256
share_conv: false
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../faster_fpn_reader.yml'
TrainReader:
batch_size: 2
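To make the `LearningRate` block above concrete, here is a standalone pure-Python mirror of the PiecewiseDecay + LinearWarmup settings in this file; `lr_at` is a hypothetical helper, the in-framework behavior lives in ppdet's optimizer module:

```python
# Illustrative only: reproduces the schedule defined above.
def lr_at(step, base_lr=0.02, gamma=0.1, milestones=(60000, 80000),
          warmup_steps=1000, start_factor=0.1):
    if step < warmup_steps:
        # LinearWarmup: ramp from start_factor*base_lr up to base_lr
        alpha = float(step) / warmup_steps
        return base_lr * (start_factor + (1.0 - start_factor) * alpha)
    lr = base_lr
    for m in milestones:  # PiecewiseDecay: multiply by gamma at each milestone
        if step >= m:
            lr *= gamma
    return lr

print(lr_at(0), lr_at(1000), lr_at(60000), lr_at(80000))
# ~0.002 0.02 0.002 0.0002 (up to float rounding)
```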
architecture: FasterRCNN
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar
weights: output/faster_rcnn_hrnetv2p_w18_2x/model_final
metric: COCO
num_classes: 81
FasterRCNN:
backbone: HRNet
fpn: HRFPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
HRNet:
feature_maps: [2, 3, 4, 5]
width: 18
freeze_at: 0
norm_type: bn
HRFPN:
num_chan: 256
share_conv: false
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../faster_fpn_reader.yml'
TrainReader:
batch_size: 2
# Res2Net
## Introduction
- Res2Net: A New Multi-scale Backbone Architecture: [https://arxiv.org/abs/1904.01169](https://arxiv.org/abs/1904.01169)
```
@article{DBLP:journals/corr/abs-1904-01169,
author = {Shanghua Gao and
Ming{-}Ming Cheng and
Kai Zhao and
Xinyu Zhang and
Ming{-}Hsuan Yang and
Philip H. S. Torr},
title = {Res2Net: {A} New Multi-scale Backbone Architecture},
journal = {CoRR},
volume = {abs/1904.01169},
year = {2019},
url = {http://arxiv.org/abs/1904.01169},
archivePrefix = {arXiv},
eprint = {1904.01169},
timestamp = {Thu, 25 Apr 2019 10:24:54 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1904-01169},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
## Model Zoo
| Backbone | Type | Deformable Conv | Images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | Download |
| :---------------------- | :------------- | :---: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: |
| Res2Net50-FPN | Faster | False | 2 | 1x | 20.320 | 39.5 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_res2net50_vb_26w_4s_fpn_1x.tar) |
| Res2Net50-FPN | Mask | False | 2 | 2x | 16.069 | 40.7 | 36.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vb_26w_4s_fpn_2x.tar) |
| Res2Net50-vd-FPN | Mask | False | 2 | 2x | 15.816 | 40.9 | 36.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vd_26w_4s_fpn_2x.tar) |
| Res2Net50-vd-FPN | Mask | True | 2 | 1x | 14.478 | 43.5 | 38.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x.tar) |
architecture: FasterRCNN
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar
weights: output/faster_rcnn_res2net50_vb_26w_4s_fpn_1x/model_final
metric: COCO
num_classes: 81
FasterRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: b
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../faster_fpn_reader.yml'
TrainReader:
batch_size: 2
architecture: MaskRCNN
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar
metric: COCO
weights: output/mask_rcnn_res2net50_vb_26w_4s_fpn_2x/model_final/
num_classes: 81
MaskRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: b
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../mask_fpn_reader.yml'
TrainReader:
batch_size: 2
architecture: MaskRCNN
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar
metric: COCO
weights: output/mask_rcnn_res2net50_vd_26w_4s_fpn_2x/model_final/
num_classes: 81
MaskRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../mask_fpn_reader.yml'
TrainReader:
batch_size: 2
architecture: MaskRCNN
use_gpu: true
max_iters: 90000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar
metric: COCO
weights: output/mask_rcnn_res2net50_vd_26w_4s_fpn_dcnv2_1x/model_final/
num_classes: 81
MaskRCNN:
backbone: Res2Net
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
Res2Net:
depth: 50
width: 26
scales: 4
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: bn
variant: d
dcn_v2_stages: [3, 4, 5]
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
TwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
_READER_: '../mask_fpn_reader.yml'
TrainReader:
batch_size: 2
......@@ -86,12 +86,20 @@ The backbone models pretrained on ImageNet are available. All backbone models ar
| CBResNet200-vd-FPN-Nonlocal | Cascade Faster | c3-c5 | 1 | 2.5x | - | 53.3%(softnms) | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
#### Notes:
- Deformable ConvNets v2 (dcn_v2) is described in [Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
- `c3-c5` means adding `dcn` in ResNet stages 3 to 5.
- Detailed configuration files are in [configs/dcn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/dcn)
### HRNet
* See more details in [HRNet model zoo](../configs/hrnet/README.md)
### Res2Net
* See more details in [Res2Net model zoo](../configs/res2net/README.md)
### Group Normalization
| Backbone | Type | Images/GPU | Lr schd | Box AP | Mask AP | Download |
| :------------------- | :------------- | :-----: | :-----: | :----: | :-----: | :----------------------------------------------------------: |
......
......@@ -82,13 +82,20 @@ Paddle provides backbone models pretrained on ImageNet. All pretrained models
| ResNet200-vd-FPN-Nonlocal | CascadeClsAware Faster | c3-c5 | 1 | 2.5x | - | 51.7%(softnms) | - | [download](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
| CBResNet200-vd-FPN-Nonlocal | Cascade Faster | c3-c5 | 1 | 2.5x | - | 53.3%(softnms) | - | [download](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.tar) |
#### Notes:
- Deformable ConvNets v2 (dcn_v2) is described in [Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
- `c3-c5` means adding `dcn` in ResNet stages 3 to 5.
- Detailed configuration files are in [configs/dcn](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/dcn)
### HRNet
* See more details in the [HRNet model zoo](../configs/hrnet/README.md)
### Res2Net
* See more details in the [Res2Net model zoo](../configs/res2net/README.md)
### Group Normalization
| Backbone | Type | Images/GPU | Lr schd | Box AP | Mask AP | Download |
| :------------------- | :------------- |:--------: | :-----: | :----: | :-----: | :----------------------------------------------------------: |
......
......@@ -24,6 +24,9 @@ from . import vgg
from . import blazenet
from . import faceboxnet
from . import cb_resnet
from . import res2net
from . import hrnet
from . import hrfpn
from .resnet import *
from .resnext import *
......@@ -35,3 +38,6 @@ from .vgg import *
from .blazenet import *
from .faceboxnet import *
from .cb_resnet import *
from .res2net import *
from .hrnet import *
from .hrfpn import *
\ No newline at end of file
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['HRFPN']
@register
class HRFPN(object):
"""
HRFPN (High-Resolution FPN for HRNet backbones), see https://arxiv.org/abs/1908.07919
Args:
num_chan (int): number of feature channels
pooling_type (str): pooling type of downsampling
share_conv (bool): whether to share the 3x3 conv across pyramid levels
spatial_scale (list): feature map scaling factor
"""
def __init__(self,
num_chan=256,
pooling_type="avg",
share_conv=False,
spatial_scale=[1./64, 1./32, 1./16, 1./8, 1./4],
):
self.num_chan = num_chan
self.pooling_type = pooling_type
self.share_conv = share_conv
self.spatial_scale = spatial_scale
return
def get_output(self, body_dict):
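"""
Upsample every branch to the highest resolution, concatenate them,
reduce channels with a shared 1x1 conv, then build the pyramid levels
by progressive pooling followed by per-level (or shared) 3x3 convs.
"""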
num_out = len(self.spatial_scale)
body_name_list = list(body_dict.keys())
num_backbone_stages = len(body_name_list)
outs = []
outs.append(body_dict[body_name_list[0]])
# resize
for i in range(1, len(body_dict)):
resized = self.resize_input_tensor(body_dict[body_name_list[i]], outs[0], 2**i)
outs.append(resized)
# concat
out = fluid.layers.concat(outs, axis=1)
# reduction
out = fluid.layers.conv2d(
input=out,
num_filters=self.num_chan,
filter_size=1,
stride=1,
padding=0,
param_attr=ParamAttr(name='hrfpn_reduction_weights'),
bias_attr=False)
# conv
outs = [out]
for i in range(1, num_out):
outs.append(self.pooling(out, size=2**i, stride=2**i, pooling_type=self.pooling_type))
outputs = []
for i in range(num_out):
conv_name = "shared_fpn_conv" if self.share_conv else "shared_fpn_conv_"+str(i)
conv = fluid.layers.conv2d(
input=outs[i],
num_filters=self.num_chan,
filter_size=3,
stride=1,
padding=1,
param_attr=ParamAttr(name=conv_name+"_weights"),
bias_attr=False)
outputs.append(conv)
for idx in range(0, num_out - len(body_name_list)):
body_name_list.append("fpn_res5_sum_subsampled_{}x".format(2**(idx + 1)))
outputs = outputs[::-1]
body_name_list = body_name_list[::-1]
res_dict = OrderedDict([(body_name_list[k], outputs[k]) for k in range(len(body_name_list))])
return res_dict, self.spatial_scale
def resize_input_tensor(self, body_input, ref_output, scale):
shape = fluid.layers.shape(ref_output)
shape_hw = fluid.layers.slice(shape, axes=[0], starts=[2], ends=[4])
out_shape = fluid.layers.cast(shape_hw, dtype='int32')
out_shape.stop_gradient = True
body_output = fluid.layers.resize_bilinear(
body_input, scale=scale, actual_shape=out_shape)
return body_output
def pooling(self, input, size, stride, pooling_type):
pool = fluid.layers.pool2d(input=input,
pool_size=size,
pool_stride=stride,
pool_type=pooling_type)
return pool
\ No newline at end of file
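For reference (not part of the commit), a minimal smoke-test sketch of HRFPN under the Paddle 1.x static graph; the module path assumes this commit's layout, the branch widths follow HRNetV2p-W18, and the input sizes are made up:

```python
import paddle.fluid as fluid
from collections import OrderedDict
from ppdet.modeling.backbones.hrfpn import HRFPN  # assumed module path

fpn = HRFPN(num_chan=256, share_conv=False)
body = OrderedDict()
for i, ch in enumerate([18, 36, 72, 144]):  # HRNetV2p-W18 branch widths
    body['res{}_sum'.format(i + 2)] = fluid.data(
        name='feat{}'.format(i + 2), dtype='float32',
        shape=[-1, ch, 200 // 2**i, 336 // 2**i])
feats, spatial_scale = fpn.get_output(body)
print(list(feats.keys()))  # five pyramid levels, strides 4 to 64
```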
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register, serializable
from numbers import Integral
from paddle.fluid.initializer import MSRA
import math
from .name_adapter import NameAdapter
__all__ = ['HRNet']
@register
@serializable
class HRNet(object):
"""
HRNet, see https://arxiv.org/abs/1908.07919
Args:
width (int): HRNet width, one of 18, 30, 32, 40, 44, 48, 60, 64
has_se (bool): whether to add squeeze-and-excitation blocks
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
feature_maps (list): index of stages whose feature maps are returned
"""
def __init__(self,
width=40,
has_se=False,
freeze_at=2,
norm_type='bn',
freeze_norm=True,
norm_decay=0.,
feature_maps=[2, 3, 4, 5]):
super(HRNet, self).__init__()
if isinstance(feature_maps, Integral):
feature_maps = [feature_maps]
assert 0 <= freeze_at <= 4, "freeze_at should be 0, 1, 2, 3 or 4"
assert len(feature_maps) > 0, "need one or more feature maps"
assert norm_type in ['bn', 'sync_bn']
self.width = width
self.has_se = has_se
self.channels = {
18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]],
30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]],
32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]],
40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]],
48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]],
60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]],
64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]],
}
self.freeze_at = freeze_at
self.norm_type = norm_type
self.norm_decay = norm_decay
self.freeze_norm = freeze_norm
self._model_type = 'HRNet'
self.feature_maps = feature_maps
self.end_points = []
return
def net(self, input, class_dim=1000):
width = self.width
channels_2, channels_3, channels_4 = self.channels[width]
num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3
x = self.conv_bn_layer(input=input,
filter_size=3,
num_filters=64,
stride=2,
if_act=True,
name='layer1_1')
x = self.conv_bn_layer(input=x,
filter_size=3,
num_filters=64,
stride=2,
if_act=True,
name='layer1_2')
la1 = self.layer1(x, name='layer2')
tr1 = self.transition_layer([la1], [256], channels_2, name='tr1')
st2 = self.stage(tr1, num_modules_2, channels_2, name='st2')
tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2')
st3 = self.stage(tr2, num_modules_3, channels_3, name='st3')
tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3')
st4 = self.stage(tr3, num_modules_4, channels_4, name='st4')
self.end_points = st4
return st4[-1]
def layer1(self, input, name=None):
conv = input
for i in range(4):
conv = self.bottleneck_block(conv,
num_filters=64,
downsample=True if i == 0 else False,
name=name+'_'+str(i+1))
return conv
def transition_layer(self, x, in_channels, out_channels, name=None):
num_in = len(in_channels)
num_out = len(out_channels)
out = []
for i in range(num_out):
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.conv_bn_layer(x[i],
filter_size=3,
num_filters=out_channels[i],
name=name+'_layer_'+str(i+1))
out.append(residual)
else:
out.append(x[i])
else:
residual = self.conv_bn_layer(x[-1],
filter_size=3,
num_filters=out_channels[i],
stride=2,
name=name+'_layer_'+str(i+1))
out.append(residual)
return out
def branches(self, x, block_num, channels, name=None):
out = []
for i in range(len(channels)):
residual = x[i]
for j in range(block_num):
residual = self.basic_block(residual,
channels[i],
name=name+'_branch_layer_'+str(i+1)+'_'+str(j+1))
out.append(residual)
return out
def fuse_layers(self, x, channels, multi_scale_output=True, name=None):
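"""
Fuse every input branch into each retained output resolution:
higher-resolution branches (j < i) are downsampled by chains of
stride-2 3x3 convs, lower-resolution branches (j > i) are mapped by a
1x1 conv and upsampled with nearest-neighbor resize, and all
contributions are summed and passed through a ReLU.
"""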
out = []
for i in range(len(channels) if multi_scale_output else 1):
residual = x[i]
for j in range(len(channels)):
if j > i:
y = self.conv_bn_layer(x[j],
filter_size=1,
num_filters=channels[i],
if_act=False,
name=name+'_layer_'+str(i+1)+'_'+str(j+1))
y = fluid.layers.resize_nearest(input=y, scale=2 ** (j - i))
residual = fluid.layers.elementwise_add(
x=residual, y=y, act=None)
elif j < i:
y = x[j]
for k in range(i - j):
if k == i - j - 1:
y = self.conv_bn_layer(y,
filter_size=3,
num_filters=channels[i],
stride=2, if_act=False,
name=name+'_layer_'+str(i+1)+'_'+str(j+1)+'_'+str(k+1))
else:
y = self.conv_bn_layer(y,
filter_size=3,
num_filters=channels[j],
stride=2,
name=name+'_layer_'+str(i+1)+'_'+str(j+1)+'_'+str(k+1))
residual = fluid.layers.elementwise_add(
x=residual, y=y, act=None)
residual = fluid.layers.relu(residual)
out.append(residual)
return out
def high_resolution_module(self, x, channels, multi_scale_output=True, name=None):
residual = self.branches(x, 4, channels, name=name)
out = self.fuse_layers(residual, channels, multi_scale_output=multi_scale_output, name=name)
return out
def stage(self, x, num_modules, channels, multi_scale_output=True, name=None):
out = x
for i in range(num_modules):
if i == num_modules - 1 and not multi_scale_output:
out = self.high_resolution_module(out,
channels,
multi_scale_output=False,
name=name+'_'+str(i+1))
else:
out = self.high_resolution_module(out,
channels,
name=name+'_'+str(i+1))
return out
def last_cls_out(self, x, name=None):
out = []
num_filters_list = [128, 256, 512, 1024]
for i in range(len(x)):
out.append(self.conv_bn_layer(input=x[i],
filter_size=1,
num_filters=num_filters_list[i],
name=name+'conv_'+str(i+1)))
return out
def basic_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input,
filter_size=3,
num_filters=num_filters,
stride=stride,
name=name+'_conv1')
conv = self.conv_bn_layer(input=conv,
filter_size=3,
num_filters=num_filters,
if_act=False,
name=name+'_conv2')
if downsample:
residual = self.conv_bn_layer(input=input,
filter_size=1,
num_filters=num_filters,
if_act=False,
name=name+'_downsample')
if self.has_se:
conv = self.squeeze_excitation(
input=conv,
num_channels=num_filters,
reduction_ratio=16,
name='fc'+name)
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def bottleneck_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input,
filter_size=1,
num_filters=num_filters,
name=name+'_conv1')
conv = self.conv_bn_layer(input=conv,
filter_size=3,
num_filters=num_filters,
stride=stride,
name=name+'_conv2')
conv = self.conv_bn_layer(input=conv,
filter_size=1,
num_filters=num_filters*4,
if_act=False,
name=name+'_conv3')
if downsample:
residual = self.conv_bn_layer(input=input,
filter_size=1,
num_filters=num_filters*4,
if_act=False,
name=name+'_downsample')
if self.has_se:
conv = self.squeeze_excitation(
input=conv,
num_channels=num_filters * 4,
reduction_ratio=16,
name='fc'+name)
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def squeeze_excitation(self, input, num_channels, reduction_ratio, name=None):
pool = fluid.layers.pool2d(
input=input, pool_size=0, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(input=pool,
size=num_channels // reduction_ratio,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv),name=name+'_sqz_weights'),
bias_attr=ParamAttr(name=name+'_sqz_offset'))
stdv = 1.0 / math.sqrt(squeeze.shape[1] * 1.0)
excitation = fluid.layers.fc(input=squeeze,
size=num_channels,
act='sigmoid',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv),name=name+'_exc_weights'),
bias_attr=ParamAttr(name=name+'_exc_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=excitation, axis=0)
return scale
def conv_bn_layer(self, input, filter_size, num_filters, stride=1, padding=1, num_groups=1, if_act=True, name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size-1)//2,
groups=num_groups,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name=name+'_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = self._bn( input=conv, bn_name=bn_name )
if if_act:
bn = fluid.layers.relu(bn)
return bn
def _bn(self,
input,
act=None,
bn_name=None):
norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay
pattr = ParamAttr(
name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
global_stats = True if self.freeze_norm else False
out = fluid.layers.batch_norm(
input=input,
act=act,
name=bn_name + '.output.1',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance',
use_global_stats=global_stats)
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
if self.freeze_norm:
scale.stop_gradient = True
bias.stop_gradient = True
return out
def __call__(self, input):
assert isinstance(input, Variable)
assert not (set(self.feature_maps) - set([2, 3, 4, 5])), \
"feature maps {} not in [2, 3, 4, 5]".format(self.feature_maps)
res_endpoints = []
res = input
feature_maps = self.feature_maps
self.net( input )
for i in feature_maps:
res = self.end_points[i-2]
if i in self.feature_maps:
res_endpoints.append(res)
if self.freeze_at >= i:
res.stop_gradient = True
return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
for idx, feat in enumerate(res_endpoints)])
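A hypothetical usage sketch of the backbone (Paddle 1.x static graph; the module path assumes this commit's layout, and the input shape is made up):

```python
import paddle.fluid as fluid
from ppdet.modeling.backbones.hrnet import HRNet  # assumed module path

backbone = HRNet(width=18, freeze_at=0, norm_type='bn',
                 feature_maps=[2, 3, 4, 5])
image = fluid.data(name='image', shape=[-1, 3, 800, 1344], dtype='float32')
body = backbone(image)
# OrderedDict with keys 'res2_sum'..'res5_sum' at strides 4, 8, 16, 32,
# matching what HRFPN expects as input.
```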
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from paddle.fluid.initializer import Constant
from ppdet.core.workspace import register, serializable
from numbers import Integral
from .nonlocal_helper import add_space_nonlocal
from .name_adapter import NameAdapter
from .resnet import ResNet, ResNetC5
__all__ = ['Res2Net', 'Res2NetC5']
@register
@serializable
class Res2Net(ResNet):
"""
Res2Net, see https://arxiv.org/abs/1904.01169
Args:
depth (int): Res2Net depth, should be 50, 101, 152, 200.
width (int): Res2Net width
scales (int): Res2Net scale
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'/'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): Res2Net variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of stages whose feature maps are returned
dcn_v2_stages (list): index of stages who select deformable conv v2
nonlocal_stages (list): index of stages who select nonlocal networks
"""
__shared__ = ['norm_type', 'freeze_norm', 'weight_prefix_name']
def __init__(self,
depth=50,
width=26,
scales=4,
freeze_at=2,
norm_type='bn',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[2, 3, 4, 5],
dcn_v2_stages=[],
weight_prefix_name='',
nonlocal_stages=[],):
super(Res2Net, self).__init__(depth=depth,
freeze_at=freeze_at,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay,
variant=variant,
feature_maps=feature_maps,
dcn_v2_stages=dcn_v2_stages,
weight_prefix_name=weight_prefix_name,
nonlocal_stages=nonlocal_stages)
assert depth >= 50, "only depth >= 50 is supported for Res2Net, but got depth={}".format(depth)
# res2net config
self.scales = scales
self.width = width
basic_width = self.width * self.scales
self.num_filters1 = [basic_width * t for t in [1, 2, 4, 8]]
self.num_filters2 = [256 * t for t in [1, 2, 4, 8]]
self.num_filters = [64, 128, 384, 768]
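For the default 26w_4s setting (width=26, scales=4), which every Res2Net config in this commit uses, the channel arithmetic above works out as follows; a quick standalone check:

```python
# Derived directly from the __init__ above for width=26, scales=4 ("26w_4s").
width, scales = 26, 4
basic_width = width * scales                             # 104
num_filters1 = [basic_width * t for t in [1, 2, 4, 8]]   # [104, 208, 416, 832]
num_filters2 = [256 * t for t in [1, 2, 4, 8]]           # [256, 512, 1024, 2048]
```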
def bottleneck(self,
input,
num_filters1,
num_filters2,
stride,
is_first,
name,
dcn_v2=False):
conv0 = self._conv_norm(
input=input,
num_filters=num_filters1,
filter_size=1,
stride=1,
act='relu',
name=name + '_branch2a')
xs = fluid.layers.split(conv0, self.scales, 1)
ys = []
for s in range(self.scales - 1):
if s == 0 or stride == 2:
ys.append(self._conv_norm(input=xs[s],
num_filters=num_filters1//self.scales,
stride=stride,
filter_size=3,
act='relu',
name=name+ '_branch2b_' + str(s+1),
dcn_v2=dcn_v2))
else:
ys.append(self._conv_norm(input=xs[s]+ys[-1],
num_filters=num_filters1//self.scales,
stride=stride,
filter_size=3,
act='relu',
name=name+ '_branch2b_' + str(s+1),
dcn_v2=dcn_v2))
if stride == 1:
ys.append(xs[-1])
else:
ys.append(fluid.layers.pool2d(input=xs[-1],
pool_size=3,
pool_stride=stride,
pool_padding=1,
pool_type='avg'))
conv1 = fluid.layers.concat(ys, axis=1)
conv2 = self._conv_norm(
input=conv1,
num_filters=num_filters2,
filter_size=1,
act=None,
name=name+"_branch2c")
short = self._shortcut(input,
num_filters2,
stride,
is_first,
name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short, y=conv2, act='relu', name=name + ".add.output.5")
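To see the hierarchical connectivity this block implements, here is a tiny symbolic sketch (illustrative only) of the stride-1 path: each 3x3 conv k_s sees its own group plus the previous branch's output, and the last group is passed through unchanged:

```python
# Symbolic wiring of the 3x3 branches above (stride=1 case).
scales = 4
xs = ['x1', 'x2', 'x3', 'x4']  # split of the 1x1 output into groups
ys = []
for s in range(scales - 1):
    inp = xs[s] if s == 0 else '{}+{}'.format(xs[s], ys[-1])
    ys.append('k{}({})'.format(s + 1, inp))
ys.append(xs[-1])              # last group bypasses the 3x3 convs
print(ys)
# ['k1(x1)', 'k2(x2+k1(x1))', 'k3(x3+k2(x2+k1(x1)))', 'x4']
```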
def layer_warp(self, input, stage_num):
"""
Args:
input (Variable): input variable.
stage_num (int): the stage number, should be 2, 3, 4, 5
Returns:
The last variable in endpoint-th stage.
"""
assert stage_num in [2, 3, 4, 5]
stages, block_func = self.depth_cfg[self.depth]
count = stages[stage_num - 2]
ch_out = self.stage_filters[stage_num - 2]
is_first = stage_num == 2
dcn_v2 = stage_num in self.dcn_v2_stages
num_filters1 = self.num_filters1[stage_num-2]
num_filters2 = self.num_filters2[stage_num-2]
nonlocal_mod = 1000
if stage_num in self.nonlocal_stages:
nonlocal_mod = self.nonlocal_mod_cfg[self.depth] if stage_num==4 else 2
# Make the layer name and parameter name consistent
# with ImageNet pre-trained model
conv = input
for i in range(count):
conv_name = self.na.fix_layer_warp_name(stage_num, count, i)
if self.depth < 50:
is_first = True if i == 0 and stage_num == 2 else False
conv = block_func(
input=conv,
num_filters1=num_filters1,
num_filters2=num_filters2,
stride=2 if i == 0 and stage_num != 2 else 1,
is_first=is_first,
name=conv_name,
dcn_v2=dcn_v2)
# add non local model
dim_in = conv.shape[1]
nonlocal_name = "nonlocal_conv{}".format( stage_num )
if i % nonlocal_mod == nonlocal_mod - 1:
conv = add_space_nonlocal(
conv, dim_in, dim_in,
nonlocal_name + '_{}'.format(i), int(dim_in / 2) )
return conv
@register
@serializable
class Res2NetC5(Res2Net):
__doc__ = Res2Net.__doc__
def __init__(self,
depth=50,
width=26,
scales=4,
freeze_at=2,
norm_type='bn',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[5],
weight_prefix_name=''):
super(Res2NetC5, self).__init__(depth, width, scales,
freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.severed_head = True
......@@ -207,31 +207,29 @@ class MultiClassNMS(object):
self.nms_eta = nms_eta
self.background_label = background_label
@register
@serializable
class MultiClassSoftNMS(object):
def __init__(
self,
score_threshold=0.01,
keep_top_k=300,
softnms_sigma=0.5,
normalized=False,
background_label=0, ):
super(MultiClassSoftNMS, self).__init__()
self.score_threshold = score_threshold
self.keep_top_k = keep_top_k
self.softnms_sigma = softnms_sigma
self.normalized = normalized
self.background_label = background_label
def __call__(self, bboxes, scores):
def create_tmp_var(program, name, dtype, shape, lod_level):
return program.current_block().create_var(
name=name, dtype=dtype, shape=shape, lod_level=lod_level)
def _soft_nms_for_cls(dets, sigma, thres):
"""soft_nms_for_cls"""
dets_final = []
......@@ -240,6 +238,8 @@ class MultiClassSoftNMS(object):
dets_final.append(dets[maxpos].copy())
ts, tx1, ty1, tx2, ty2 = dets[maxpos]
scores = dets[:, 0]
# force remove bbox at maxpos
scores[maxpos] = -1
x1 = dets[:, 1]
y1 = dets[:, 2]
x2 = dets[:, 3]
......@@ -253,65 +253,69 @@ class MultiClassSoftNMS(object):
w = np.maximum(0.0, xx2 - xx1 + eta)
h = np.maximum(0.0, yy2 - yy1 + eta)
inter = w * h
ovr = inter / (areas + areas[maxpos] - inter)
weight = np.exp(-(ovr * ovr) / sigma)
scores = scores * weight
idx_keep = np.where(scores >= thres)
dets[:, 0] = scores
dets = dets[idx_keep]
dets_final = np.array(dets_final).reshape(-1, 5)
return dets_final
def _soft_nms(bboxes, scores):
bboxes = np.array(bboxes)
scores = np.array(scores)
class_nums = scores.shape[-1]
softnms_thres = self.score_threshold
softnms_sigma = self.softnms_sigma
keep_top_k = self.keep_top_k
cls_boxes = [[] for _ in range(class_nums)]
cls_ids = [[] for _ in range(class_nums)]
start_idx = 1 if self.background_label == 0 else 0
for j in range(start_idx, class_nums):
inds = np.where(scores[:, j] >= softnms_thres)[0]
scores_j = scores[inds, j]
rois_j = bboxes[inds, j, :]
dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype(
np.float32, copy=False)
cls_rank = np.argsort(-dets_j[:, 0])
dets_j = dets_j[cls_rank]
cls_boxes[j] = _soft_nms_for_cls(
dets_j, sigma=softnms_sigma, thres=softnms_thres)
cls_ids[j] = np.array([j] * cls_boxes[j].shape[0]).reshape(-1, 1)
cls_boxes = np.vstack(cls_boxes[start_idx:])
cls_ids = np.vstack(cls_ids[start_idx:])
pred_result = np.hstack([cls_ids, cls_boxes])
# Limit to max_per_image detections **over all classes**
image_scores = cls_boxes[:, 0]
if len(image_scores) > keep_top_k:
image_thresh = np.sort(image_scores)[-keep_top_k]
keep = np.where(cls_boxes[:, 0] >= image_thresh)[0]
pred_result = pred_result[keep, :]
res = fluid.LoDTensor()
res.set_lod([[0, pred_result.shape[0]]])
if pred_result.shape[0] == 0:
pred_result = np.array([[1]], dtype=np.float32)
res.set(pred_result, fluid.CPUPlace())
return res
pred_result = create_tmp_var(
fluid.default_main_program(),
name='softnms_pred_result',
dtype='float32',
shape=[6],
lod_level=1)
fluid.layers.py_func(
func=_soft_nms, x=[bboxes, scores], out=pred_result)
return pred_result
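For intuition on the Gaussian rescoring implemented above (s_i <- s_i * exp(-IoU^2 / sigma)): the commit's `scores[maxpos] = -1` fix simply guarantees the already-selected box cannot be picked again, while overlapping boxes are decayed rather than discarded. A standalone check of the decay factors at the default sigma:

```python
import numpy as np

sigma = 0.5  # default softnms_sigma above
for iou in (0.3, 0.5, 0.9):
    print(iou, round(float(np.exp(-(iou * iou) / sigma)), 3))
# 0.3 -> 0.835, 0.5 -> 0.607, 0.9 -> 0.198: heavier overlap, stronger down-weighting
```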
......