Commit 3e57b4c3 authored by qingqing01, committed by GitHub

Rename object_detection to PaddleDetection. (#2601)

* Rename object_detection to PaddleDetection
* Small fix for doc
# Virtualenv
/.venv/
/venv/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
# json file
*.json
# Distribution / packaging
/bin/
/build/
/develop-eggs/
/dist/
/eggs/
/lib/
/lib64/
/output/
/parts/
/sdist/
/var/
/*.egg-info/
/.installed.cfg
/*.egg
/.eggs
# AUTHORS and ChangeLog will be generated while packaging
/AUTHORS
/ChangeLog
# BCloud / BuildSubmitter
/build_submitter.*
/logger_client_log
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
.tox/
.coverage
.cache
.pytest_cache
nosetests.xml
coverage.xml
# Translations
*.mo
# Sphinx documentation
/docs/_build/
*.json
[style]
based_on_style = pep8
column_limit = 80
# PaddleDetection
The goal of PaddleDetection is to provide easy access to a wide range of object
detection models in both industry and research settings. We design
PaddleDetection to be not only performant and production-ready, but also
highly flexible, catering to research needs.
<div align="center">
<img src="demo/output/000000570688.jpg" />
</div>
## Introduction
Design Principles:
- Production Ready:
Key operations are implemented in C++ and CUDA; together with PaddlePaddle's
highly efficient inference engine, this enables easy deployment in server environments.
- Highly Flexible:
Components are designed to be modular. Model architectures, as well as data
preprocessing pipelines, can be easily customized with simple configuration
changes (see the command-line override example after this list).
- Performance Optimized:
With the help of the underlying PaddlePaddle framework, faster training and
reduced GPU memory footprint are achieved. Notably, YOLOv3 training is
much faster than in other frameworks. Another example is Mask R-CNN
(ResNet50): we managed to fit up to 5 images per GPU (V100, 16GB) during
training.
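For instance, because every option in the bundled YAML configs is exposed as a key, many settings can be adjusted from the command line without editing any file. A minimal sketch, assuming the `-o key=value` override used in the Get Started section below also accepts other top-level configuration keys such as `use_gpu`:

```bash
# Hypothetical example: run the Get Started inference command on CPU by
# overriding the top-level use_gpu key in addition to the weights path.
python tools/infer.py -c configs/mask_rcnn_r50_1x.yml \
    -o use_gpu=false \
       weights=https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar \
    --infer_img=demo/000000570688.jpg
```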
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt | SENet | MobileNet | DarkNet |
|--------------------|:------:|------------------------------:|:-------:|:-----:|:---------:|:-------:|
| Faster R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Cascade R-CNN | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
<a name="vd">[1]</a> ResNet-vd models offer much improved accuracy with negligible performance cost.
Advanced Features:
- [x] **Synchronized Batch Norm**: currently used by YOLOv3.
- [x] **Group Norm**: pretrained models to be released.
- [x] **Modulated Deformable Convolution**: pretrained models to be released.
- [x] **Deformable PSRoI Pooling**: pretrained models to be released.
## Model zoo
Pretrained models are available in the PaddlePaddle [detection model zoo](docs/MODEL_ZOO.md).
## Installation
Please follow the [installation guide](docs/INSTALL.md).
## Get Started
For inference, simply run the following command and the visualized result will
be saved in `output/`.
```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
python tools/infer.py -c configs/mask_rcnn_r50_1x.yml \
    -o weights=https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar \
    --infer_img=demo/000000570688.jpg
```
For detailed training and evaluation workflow, please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md).
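As a quick orientation, a sketch of the typical training and evaluation invocations is below; it assumes `tools/train.py` and `tools/eval.py` accept the same `-c`/`-o` flags as `tools/infer.py` above, and the authoritative commands are in GETTING_STARTED.md.

```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
# Train one of the bundled configs (checkpoints go to the save_dir set in the config).
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
# Evaluate the final checkpoint on the COCO val2017 split.
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
    -o weights=output/faster_rcnn_r50_1x/model_final
```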
We also recommend users take a look at the [IPython Notebook demo](demo/mask_rcnn_demo.ipynb).
Further information can be found in these documents:
- [Introduction to the configuration workflow.](docs/CONFIG.md)
- [Guide to custom dataset and preprocess pipeline.](docs/DATA.md)
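As a concrete example of the configuration workflow: the R-CNN configs bundled with this commit pair a `LinearWarmup` scheduler with a `PiecewiseDecay` scheduler. With `base_lr: 0.02`, `start_factor: 0.333…` and `steps: 500`, the learning rate ramps linearly from roughly 0.0067 (base_lr × start_factor) up to 0.02 over the first 500 iterations, and is then multiplied by `gamma: 0.1` at each milestone (for example at iterations 60000 and 80000 in the 1x FPN schedules).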
## Todo List
Please note that this is a work in progress; substantial changes may come in the
near future.
Some of the planned features include:
- [ ] Mixed precision training.
- [ ] Distributed training.
- [ ] Inference in 8-bit mode.
- [ ] User defined operations.
- [ ] Larger model zoo.
## Updates
#### Initial release (7/3/2019)
- Initial release of PaddleDetection and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
  R-CNN+FPN, Cascade Faster R-CNN+FPN, RetinaNet, YOLOv3, and SSD.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!
architecture: CascadeRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/cascade_rcnn_r50_fpn_1x/model_final
metric: COCO
CascadeRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
variant: b
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_lo: [0.0, 0.0, 0.0]
bg_thresh_hi: [0.5, 0.6, 0.7]
fg_thresh: [0.5, 0.6, 0.7]
fg_fraction: 0.25
num_classes: 81
CascadeBBoxHead:
head: FC6FC7Head
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
FC6FC7Head:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r101_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 101
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [12000, 16000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/faster_rcnn_r101_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 360000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/faster_rcnn_r101_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar
weights: output/faster_rcnn_r101_vd_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 360000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar
weights: output/faster_rcnn_r101_vd_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [12000, 16000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 360000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_2x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [24000, 32000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/fpn/faster_rcnn_r50_fpn_1x/model_final
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_lo: 0.0
bg_thresh_hi: 0.5
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_fpn_2x/model_final
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_lo: 0.0
bg_thresh_hi: 0.5
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: coco/annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output/faster-r50-vd-c4-1x
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_vd_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
variant: d
ResNetC5:
norm_type: affine_channel
variant: d
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [12000, 16000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
weights: output/faster_rcnn_r50_vd_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 2
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: 4
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
SENetC5:
depth: 152
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 12000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 6000
RoIAlign:
resolution: 7
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: SENetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 260000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_fpn_s1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [200000, 240000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar
weights: output/faster_rcnn_x101_64x4d_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNeXt
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNeXt:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar
weights: output/faster_rcnn_x101_64x4d_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNeXt
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNeXt:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r101_fpn_1x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r101_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_1x/model_final
MaskRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
spatial_scale: 0.0625
sampling_ratio: 0
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
normalized: false
score_threshold: 0.05
num_classes: 81
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
resolution: 14
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
num_classes: 81
resolution: 14
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_2x/model_final/
MaskRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
spatial_scale: 0.0625
sampling_ratio: 0
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
normalized: false
score_threshold: 0.05
num_classes: 81
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
resolution: 14
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
num_classes: 81
resolution: 14
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
# start the warm up from base_lr * start_factor
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_fpn_1x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_vd_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
max_iters: 260000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/mask_rcnn_se154_vd_fpn_s1x/model_final/
metric: COCO
MaskRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [200000, 240000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: RetinaNet
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/retinanet_r101_fpn_1x/model_final
log_smooth_window: 20
snapshot_iter: 10000
metric: COCO
save_dir: output
RetinaNet:
backbone: ResNet
fpn: FPN
retina_head: RetinaHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 101
feature_maps: [3, 4, 5]
freeze_at: 2
FPN:
max_level: 7
min_level: 3
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
has_extra_convs: true
RetinaHead:
num_convs_per_octave: 4
num_chan: 256
max_level: 7
min_level: 3
prior_prob: 0.01
base_scale: 4
num_scales_per_octave: 3
num_classes: 81
anchor_generator:
aspect_ratios: [1.0, 2.0, 0.5]
variance: [1.0, 1.0, 1.0, 1.0]
target_assign:
positive_overlap: 0.5
negative_overlap: 0.4
gamma: 2.0
alpha: 0.25
sigma: 3.0151134457776365
output_decoder:
score_thresh: 0.05
nms_thresh: 0.5
pre_nms_top_n: 1000
detections_per_im: 100
nms_eta: 1.0
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
dataset_dir: data/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 2
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
dataset_dir: data/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
annotation: annotations/instances_val2017.json
num_workers: 2
architecture: RetinaNet
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/retinanet_r50_fpn_1x/model_final
log_smooth_window: 20
snapshot_iter: 10000
metric: COCO
save_dir: output
RetinaNet:
backbone: ResNet
fpn: FPN
retina_head: RetinaHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
FPN:
max_level: 7
min_level: 3
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
has_extra_convs: true
RetinaHead:
num_convs_per_octave: 4
num_chan: 256
max_level: 7
min_level: 3
prior_prob: 0.01
base_scale: 4
num_scales_per_octave: 3
num_classes: 81
anchor_generator:
aspect_ratios: [1.0, 2.0, 0.5]
variance: [1.0, 1.0, 1.0, 1.0]
target_assign:
positive_overlap: 0.5
negative_overlap: 0.4
gamma: 2.0
alpha: 0.25
sigma: 3.0151134457776365
output_decoder:
score_thresh: 0.05
nms_thresh: 0.5
pre_nms_top_n: 1000
detections_per_im: 100
nms_eta: 1.0
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
architecture: SSD
max_iters: 28000
train_feed: SSDTrainFeed
eval_feed: SSDEvalFeed
test_feed: SSDTestFeed
pretrain_weights: ./ssd3/
use_gpu: true
snapshot_iter: 2000
log_smooth_window: 1
metric: VOC
save_dir: output
weights: output/ssd_mobilenet_v1_voc/model_final/
SSD:
backbone: MobileNet
multi_box_head: MultiBoxHead
num_classes: 21
metric:
ap_version: 11point
evaluate_difficult: false
overlap_threshold: 0.5
output_decoder:
background_label: 0
keep_top_k: 200
nms_eta: 1.0
nms_threshold: 0.45
nms_top_k: 400
score_threshold: 0.01
MobileNet:
norm_decay: 0.
conv_group_scale: 1
extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]]
with_extra_blocks: true
MultiBoxHead:
aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]]
base_size: 300
flip: true
max_ratio: 90
max_sizes: [[], 150.0, 195.0, 240.0, 285.0, 300.0]
min_ratio: 20
min_sizes: [60.0, 105.0, 150.0, 195.0, 240.0, 285.0]
offset: 0.5
LearningRate:
schedulers:
- !PiecewiseDecay
milestones: [10000, 15000, 20000, 25000]
values: [0.001, 0.0005, 0.00025, 0.0001, 0.00001]
OptimizerBuilder:
optimizer:
momentum: 0.0
type: RMSPropOptimizer
regularizer:
factor: 0.00005
type: L2
SSDTrainFeed:
batch_size: 32
use_process: true
dataset:
dataset_dir: dataset/voc
annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
image_dir: VOCdevkit/VOC_all/JPEGImages
use_default_label: true
SSDEvalFeed:
batch_size: 64
use_process: true
dataset:
dataset_dir: dataset/voc
annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
image_dir: VOCdevkit/VOC_all/JPEGImages
use_default_label: true
drop_last: false
SSDTestFeed:
batch_size: 1
dataset:
use_default_label: true
drop_last: false
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/yolo/darknet53.tar.gz
weights: https://paddlemodels.bj.bcebos.com/yolo/yolov3.tar.gz
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/yolo/yolo_mobilenet1.0.tar.gz
YOLOv3:
backbone: MobileNet
yolo_head: YOLOv3Head
MobileNet:
norm_type: sync_bn
norm_decay: 0.
conv_group_scale: 1
with_extra_blocks: false
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/yolo/yolo_resnet34.tar.gz
YOLOv3:
backbone: ResNet
yolo_head: YOLOv3Head
ResNet:
norm_type: sync_bn
freeze_at: 0
freeze_norm: false
norm_decay: 0.
depth: 34
feature_maps: [3, 4, 5]
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# Extract the data.
echo "Extracting..."
unzip train2014.zip
unzip val2014.zip
unzip train2017.zip
unzip val2017.zip
unzip annotations_trainval2014.zip
unzip annotations_trainval2017.zip
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
echo "Extracting..."
tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
echo "Creating data lists..."
python -c 'from ppdet.utils.voc_utils import merge_and_create_list; merge_and_create_list("VOCdevkit", ["2007", "2012"], "VOCdevkit/VOC_all")'
# Introduction
PaddleDetection takes a rather principled approach to configuration management. We aim to automate the configuration workflow and to reduce configuration errors.
# Rationale
Presently, configuration in mainstream frameworks is usually dictionary based: the global config is simply a giant, loosely defined Python dictionary.
This approach is error prone, e.g., misspelled or misplaced keys may lead to serious errors in the training process, causing time loss and wasted resources.
To avoid these common pitfalls, with automation and static analysis in mind, we propose a configuration design that is user friendly, easy to maintain, and extensible.
# Design
The design utilizes some of Python's reflection mechanism to extract configuration schematics from Python class definitions.
To be specific, it extracts information from class constructor arguments, including names, docstrings, default values, and data types (if type hints are available).
This approach advocates modular and testable design, leading to a unified and extensible code base.
## API
Most of the functionality is exposed in the `ppdet.core.workspace` module.
- `register`: This decorator registers a class as a configurable module; it understands several special annotations in the class definition.
  - `__category__`: For better organization, modules are classified into categories.
  - `__inject__`: A list of constructor arguments which are intended to take module instances as input; module instances will be created at runtime and injected. The corresponding configuration value can be a class name string, a serialized object, a config key pointing to a serialized object, or a dict (in which case the constructor needs to handle it; see the example below).
  - `__op__`: Shortcut for wrapping PaddlePaddle operators into callable objects; together with `__append_doc__` (which extracts the docstring from the target PaddlePaddle operator automatically), this can be a real time saver.
- `serializable`: This decorator makes a class directly serializable in a yaml config file, by taking advantage of [pyyaml](https://pyyaml.org/wiki/PyYAMLDocumentation)'s serialization mechanism.
- `create`: Constructs a module instance according to the global configuration.
- `load_config` and `merge_config`: Load the yaml file and merge config settings from the command line.
## Example
Take the `RPNHead` module as an example; it is composed of several PaddlePaddle operators. We first wrap those operators into classes, then pass in instances of these classes when instantiating the `RPNHead` module.
```python
# excerpt from `ppdet/modeling/ops.py`
from ppdet.core.workspace import register, serializable
# ... more operators
@register
@serializable
class GenerateProposals(object):
# NOTE this class simply wraps a PaddlePaddle operator
__op__ = fluid.layers.generate_proposals
# NOTE docstring for args are extracted from PaddlePaddle OP
__append_doc__ = True
def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n
self.nms_thresh = nms_thresh
self.min_size = min_size
self.eta = eta
# ... more operators
# excerpt from `ppdet/modeling/anchor_heads/rpn_head.py`
from ppdet.core.workspace import register
from ppdet.modeling.ops import AnchorGenerator, RPNTargetAssign, GenerateProposals
@register
class RPNHead(object):
"""
RPN Head
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__):
super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign
self.train_proposal = train_proposal
self.test_proposal = test_proposal
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(rpn_target_assign, dict):
self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign)
if isinstance(train_proposal, dict):
self.train_proposal = GenerateProposals(**train_proposal)
if isinstance(test_proposal, dict):
self.test_proposal = GenerateProposals(**test_proposal)
```
The corresponding (generated) YAML snippet is as follows. Note this is the configuration in **FULL**; all the default values can be omitted. In the case of the above example, all arguments have default values, meaning nothing is required in the config file.
```yaml
RPNHead:
test_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 1000
pre_nms_top_n: 6000
train_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 2000
pre_nms_top_n: 12000
anchor_generator:
# ...
rpn_target_assign:
# ...
```
Example snippet that makes use of the `RPNHead` module:
```python
from ppdet.core.workspace import load_config, merge_config, create
load_config('some_config_file.yml')
merge_config(more_config_options_from_command_line)
rpn_head = create('RPNHead')
# ... code that uses the created module!
```
Configuration files can also contain serialized objects, denoted with `!`, for example:
```yaml
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
```
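To make the `!`-tagged entries above concrete, here is a minimal sketch of how such a serializable scheduler class could be declared. It is not the actual `ppdet.optimizer` source; the constructor arguments merely mirror the yaml keys shown above and should be treated as assumptions.
```python
# A hedged sketch: a class registered with @serializable can be written
# directly into yaml as `!PiecewiseDecay`. Argument names/defaults below
# mirror the config keys above; they are assumptions, not the real signature.
from ppdet.core.workspace import register, serializable

@register
@serializable
class PiecewiseDecay(object):
    def __init__(self, gamma=0.1, milestones=[60000, 80000]):
        super(PiecewiseDecay, self).__init__()
        self.gamma = gamma
        self.milestones = milestones
```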
# Requirements
Two Python packages are used; both are optional.
- [typeguard](https://github.com/agronholm/typeguard) is used for type checking in Python 3.
- [docstring\_parser](https://github.com/rr-/docstring_parser) is needed for docstring parsing.
To install them, simply run:
```shell
pip install typeguard http://github.com/willthefrog/docstring_parser/tarball/master
```
# Tooling
A small utility (`tools/configure.py`) is included to simplify the configuration process; it provides 4 commands to walk users through it:
1. `list`: List currently registered modules by category; one can also specify which category to list with the `--category` flag.
2. `help`: Get help information for a module, including description, options, configuration template and example command line flags.
3. `analyze`: Check a configuration file for missing/extraneous options, options with mismatched types (if type hints are given) and missing dependencies; it also highlights user provided values (overridden default values).
4. `generate`: Generate a configuration template for a given list of modules. By default it generates a complete configuration file, which can be quite verbose; if a `--minimal` flag is given, it generates a template that only contains non-optional settings. For example, to generate a configuration for the Faster R-CNN architecture with a `ResNet` backbone and `FPN`, run:
```shell
python tools/configure.py generate FasterRCNN ResNet RPNHead RoIAlign BBoxAssigner BBoxHead FasterRCNNTrainFeed FasterRCNNTestFeed LearningRate OptimizerBuilder
```
For a minimal version, run:
```shell
python tools/configure.py --minimal generate FasterRCNN BBoxHead
```
## Introduction
The data pipeline is responsible for loading and converting data. Each
resulting data sample is a tuple of np.ndarrays.
For example, Faster R-CNN training uses samples of this format: `[(im,
im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation
The data pipeline consists of four sub-systems: data parsing, image
pre-processing, data conversion and data feeding APIs.
Data samples are collected to form `dataset.Dataset`s; usually 3 sets are
needed for training, validation, and testing respectively.
First, `dataset.source` loads the data files into memory, then
`dataset.transform` processes them, and lastly, the batched samples
are fetched by `dataset.Reader`.
Sub-systems details:
1. Data parsing
Parses various data sources and creates `dataset.Dataset` instances. Currently,
the following data sources are supported:
- COCO data source
Loads `COCO` type datasets with directory structures like this:
```
data/coco/
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
| ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
| ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
| ...
```
- Pascal VOC data source
Loads `Pascal VOC` like datasets with a directory structure like this:
```
data/pascalvoc/
├──Annotations
│ ├── i000050.jpg
│ ├── 003876.xml
| ...
├── ImageSets
│ ├──Main
└── train.txt
└── val.txt
└── test.txt
└── dog_train.txt
└── dog_trainval.txt
└── dog_val.txt
└── dog_test.txt
└── ...
│ ├──Layout
└──...
│ ├── Segmentation
└──...
├── JPEGImages
│ ├── 000050.jpg
│ ├── 003876.jpg
| ...
```
- Roidb data source
A generalized data source serialized as pickle files, which have the following
structure:
```python
(records, cname2id)
# `cname2id` is a `dict` which maps category name to class IDs
# and `records` is a list of dict of this structure:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image ID
'h': im_h, # height of image
'w': im_w, # width of image
'is_crowd': is_crowd, # crowd marker
'gt_class': gt_class, # ground truth class
'gt_bbox': gt_bbox, # ground truth bounding box
'gt_poly': gt_poly, # ground truth segmentation
}
```
We provide a tool to generate roidb data sources. To convert a `COCO` or `VOC`
style dataset, run this command:
```sh
# --type: the type of original data (xml or json)
# --annotation: the path of a file that contains the names of the annotation files
# --save-dir: the save path
# --samples: the number of samples (default is -1, which means all samples in the dataset)
python ./tools/generate_data_for_training.py
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
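A generated roidb file can then be inspected with plain `pickle`; the snippet below is a minimal sketch that only assumes the `(records, cname2id)` layout described above (the file name is illustrative, not the tool's documented output name).
```python
# Inspect a generated roidb data source (the file name is illustrative).
import pickle

with open('roidb/instances_val2017.roidb', 'rb') as f:
    records, cname2id = pickle.load(f)

print('%d samples, %d categories' % (len(records), len(cname2id)))
first = records[0]
print(first['im_file'], first['h'], first['w'])
```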
2. Image preprocessing
The `dataset.transform.operator` module provides operations such as image
decoding, expanding, cropping, etc. Multiple operators are combined to form
larger processing pipelines.
3. Data transformer
Transforms a `dataset.Dataset` to achieve various desired effects. Notably, the
`dataset.transform.parallel_map` transformer accelerates image processing with
multiple threads or processes. More transformers can be found in
`dataset.transform.transformer`.
4. Data feeding APIs
To facilitate data pipeline building, we combine multiple `dataset.Dataset`s to
form a `dataset.Reader`, which can provide data for training, validation and
testing respectively. Users can simply call `Reader.[train|eval|infer]` to get
the corresponding data stream. Many aspects of the `Reader`, such as storage
location, preprocessing pipeline, and acceleration mode, can be configured with
yaml files.
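A usage sketch is shown below, mirroring the `Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)` example that appears later in the companion document; the import paths and the `load_cfg` helper are assumptions rather than verified API.
```python
# Illustrative only: build a Reader from yaml settings and request the
# training stream. Import paths and `load_cfg` are assumptions.
from ppdet.data.reader import Reader        # assumed module path
from ppdet.data.data_feed import load_cfg   # assumed helper

cfg = load_cfg('./config.yml')
reader = Reader(cfg.DATA, cfg.TRANSFORM, maxiter=-1)
train_data = reader.train()   # likewise reader.eval() / reader.infer()
# train_data yields tuples of np.ndarrays, e.g. (im, im_info, im_id, ...)
```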
The main APIs are as follows:
1. Data parsing
- `source/coco_loader.py`: COCO dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`: Pascal VOC dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/voc_loader.py)
[Note] To use a non-default label list for VOC datasets, a `label_list.txt`
file is needed; one can use the provided label list
(`data/pascalvoc/ImageSets/Main/label_list.txt`) or generate a custom one (with
`tools/generate_data_for_training.py`). Also, the `use_default_label` option
should be set to `false` in the configuration file.
- `source/loader.py`: Roidb dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/loader.py)
2. Operator
`transform/operators.py`: Contains a variety of data augmentation methods, including:
- `DecodeImage`: Read images in RGB format.
- `RandomFlipImage`: Horizontal flip.
- `RandomDistort`: Distort brightness, contrast, saturation, and hue.
- `ResizeImage`: Resize image with interpolation.
- `RandomInterpImage`: Use a random interpolation method to resize the image.
- `CropImage`: Crop image with respect to different scale, aspect ratio, and overlap.
- `ExpandImage`: Pad image to a larger size, padding filled with mean image value.
- `NormalizeImage`: Normalize image pixel values.
- `NormalizeBox`: Normalize the bounding box.
- `Permute`: Arrange the channels of the image and optionally convert image to BGR format.
- `MixupImage`: Mixup two images with a given fraction<sup>[1](#mix)</sup>.
<a name="mix">[1]</a> Please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf)
`transform/arrange_sample.py`: Assemble the data samples needed by different models.
3. Transformer
`transform/post_map.py`: Transformations that operate on whole batches, mainly for:
- Padding the whole batch to given stride values
- Resizing images to multiple scales
- Randomly adjusting the image size of the batch data
`transform/transformer.py`: Data filtering and batching.
`transform/parallel_map.py`: Accelerate data processing with multi-threads/multi-processes.
4. Reader
`reader.py`: Combine source and transforms, return batch data according to `max_iter`.
`data_feed.py`: Configure default parameters for `reader.py`.
### Usage
#### Canned Datasets
Presets for common datasets, e.g., `MS-COCO` and `Pascal VOC`, are included. In
most cases, users can simply use these canned datasets as is. Moreover, the
whole data pipeline is fully customizable through the yaml configuration files.
#### Custom Datasets
- Option 1: Convert the dataset to COCO or VOC format.
```sh
# a small utility (`tools/labelme2coco.py`) is provided to convert
# Labelme-annotated dataset to COCO format.
python ./tools/labelme2coco.py --json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --json_input_dir: the path of the json files annotated with Labelme.
# --image_input_dir: the path of the images.
# --output_dir: the path of the converted COCO dataset.
# --train_proportion: the training proportion of the annotated data.
# --val_proportion: the validation proportion of the annotated data.
# --test_proportion: the inference proportion of the annotated data.
```
- Option 2:
1. Add `source/XX_loader.py` and implement the `load` function, following the
example of `source/coco_loader.py` and `source/voc_loader.py`.
2. Modify the `load` function in `source/loader.py` to make use of the newly
added data loader.
3. Modify `/source/__init__.py` accordingly.
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# Replace the above code with the following code:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. In the configuration file, set the `type` of `dataset` to `XXSource`.
#### How to add data pre-processing?
- To add a pre-processing operation for a single image, refer to the classes in
`transform/operators.py`, and implement the desired transformation with a new
class (see the sketch after this list).
- To add pre-processing for a batch, one needs to modify the `build_post_map`
function in `transform/post_map.py`.
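The following is a hedged sketch of such a single-image operator. The `BaseOperator` base class, the `register_op` decorator, and the `'image'` key in the sample dict are assumptions based on the style of `transform/operators.py`, not verified API.
```python
# A minimal sketch of a custom per-image operator; names marked "assumed"
# are illustrative and may differ from the actual transform/operators.py API.
import numpy as np
from ppdet.data.transform.operators import BaseOperator, register_op  # assumed

@register_op
class GrayScale(BaseOperator):
    """Replace the decoded image with its 3-channel grayscale version."""

    def __call__(self, sample, context=None):
        im = sample['image'].astype(np.float32)      # 'image' key is assumed
        gray = im.mean(axis=2, keepdims=True)
        sample['image'] = np.repeat(gray, 3, axis=2)
        return sample
```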
## Introduction
This Python module loads data and converts it into the format required by detection models for training, validation and testing: arrays of tuples made up of multiple np.ndarrays. For example, the training data format for the Faster R-CNN model is `[(im, im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation
Internally, the module is divided into four sub-functions: data parsing, image preprocessing, data transformation, and data feeding APIs.
We use `dataset.Dataset` to represent one set of data; `COCO`, for example, contains three sets, used for training, validation and testing respectively. The raw data is stored in files and loaded into memory by `dataset.source`, then processed and transformed by `dataset.transform`; finally, the batched data for training, validation and testing is obtained through the `dataset.Reader` interface.
Introduction to the sub-functions:
1. Data parsing
Data parsing produces a `dataset.Dataset`; the logic lives in `dataset.source`, which can parse datasets of different formats. Supported data sources include:
- COCO data source
This dataset is currently split into COCO2014 and COCO2017 and mainly consists of json files and image files, organized as follows:
```
data/coco/
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
| ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
| ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
| ...
```
- Pascal VOC data source
This dataset is currently split into VOC2007 and VOC2012 and mainly consists of xml files and image files, organized as follows:
```
data/pascalvoc/
├──Annotations
│ ├── i000050.jpg
│ ├── 003876.xml
| ...
├── ImageSets
│ ├──Main
└── train.txt
└── val.txt
└── test.txt
└── dog_train.txt
└── dog_trainval.txt
└── dog_val.txt
└── dog_test.txt
└── ...
│ ├──Layout
└──...
│ ├── Segmentation
└──...
├── JPEGImages
│ ├── 000050.jpg
│ ├── 003876.jpg
| ...
```
- Roidb data source
This data source consists of pickle files converted mainly from the COCO and Pascal VOC datasets. Each file contains a dict, which holds a list named 'records' (and possibly a dict named 'cname2cid'); the content looks like this:
```python
(records, catname2clsid)
# 'records' is a list, and each of its elements has the following structure:
{
'im_file': im_fname,   # image file name
'im_id': im_id,        # image id
'h': im_h,             # image height
'w': im_w,             # image width
'is_crowd': is_crowd,  # crowd flag
'gt_class': gt_class,  # ground truth classes
'gt_bbox': gt_bbox,    # ground truth bounding boxes
'gt_poly': gt_poly,    # ground truth polygons (segmentation)
}
# 'cname2id' is a dict that maps category names to class ids
```
We provide a script under `./tools/` for generating roidb data sources; it can be invoked with the following command.
```sh
# --type: the type of the original dataset (xml or json only)
# --annotation: the path of a file containing the names of the annotation files
# --save-dir: the save path
# --samples: the number of samples (default is -1, meaning all samples in the dataset)
python ./tools/generate_data_for_training.py
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
2. Image preprocessing
Image preprocessing covers operations such as image decoding, resizing and cropping. These are implemented uniformly as `dataset.transform.operator` operators, which makes them easy to extend. Multiple operators can also be combined into complex processing pipelines and used by the transformers in `dataset.transformer`, for example to run a complex preprocessing pipeline with multiple threads.
3. Data transformer
A data transformer converts a `dataset.Dataset` into a new `dataset.Dataset`. The various `dataset.transform.transformer`s are implemented with the decorator pattern, for example the `dataset.transform.parallel_map` transformer used for multi-process preprocessing.
4. Data feeding APIs
To make data access during training convenient, multiple `dataset.Dataset`s are combined into a `dataset.Reader` that serves data to the user; users only need to call `Reader.[train|eval|infer]` to get the corresponding data stream. The `Reader` supports configuring the data location, the preprocessing pipeline, the acceleration mode, etc. through yaml files.
The main APIs are as follows:
1. Data parsing
- `source/coco_loader.py`: parses COCO datasets. [See the code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`: parses Pascal VOC datasets. [See the code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/voc_loader.py)
[Note] When a VOC dataset is used without the default label list, first generate a `label_list.txt` with `tools/generate_data_for_training.py` (used the same way as for the roidb dataset above), or place a provided `label_list.txt` under `data/pascalvoc/ImageSets/Main`; also set the `use_default_label` parameter to `false` in the configuration file.
- `source/loader.py`: parses roidb datasets. [See the code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/loader.py)
2. Operators
`transform/operators.py`: contains a variety of data augmentation methods, mainly including:
- `RandomFlipImage`: horizontal flip.
- `RandomDistort`: randomly distort image brightness, contrast, saturation and hue.
- `ResizeImage`: resize the image with a specified interpolation method.
- `RandomInterpImage`: resize the image with a randomly chosen interpolation method.
- `CropImage`: generate candidate crops from scale and aspect-ratio parameters, then select crops whose IoU with the annotated boxes meets the requirement.
- `ExpandImage`: place the original image in a larger canvas filled with the pixel mean (subtracted later in the mean-subtraction step), then crop, resize and flip it.
- `DecodeImage`: read images in RGB format.
- `Permute`: rearrange the image channels and convert the image to BGR format.
- `NormalizeImage`: normalize image pixel values.
- `NormalizeBox`: normalize the bounding boxes.
- `MixupImage`: overlay two images with a given ratio.
[Note] For the Mixup operation, please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf).
`transform/arrange_sample.py`: arranges the data samples that are fed to the network.
3. Transformers
`transform/post_map.py`: performs preprocessing on whole batches, mainly including:
- randomly adjusting the image size of the batch data
- multi-scale image resizing
- padding operations
`transform/transformer.py`: filters out useless data and returns batched data.
`transform/parallel_map.py`: implements acceleration.
4. Reader
`reader.py`: combines sources and transformers and returns batch data according to `max_iter`.
`data_feed.py`: configures the default parameters required by `reader.py`.
### Usage
#### Typical usage
The module works according to the settings in the yaml configuration files; see the configuration section for how to use yaml files.
- Loading data for training:
``` python
ccfg = load_cfg('./config.yml')
coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
```
#### How to use a custom dataset?
- Option 1: Convert the dataset to VOC or COCO format.
```sh
# labelme2coco.py under ./tools/ converts a Labelme-annotated dataset into a COCO dataset
python ./tools/labelme2coco.py --json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --json_input_dir: the folder containing the json files annotated with Labelme
# --image_input_dir: the folder containing the image files
# --output_dir: where to store the converted COCO format dataset
# --train_proportion: the proportion of the annotated data used for training
# --val_proportion: the proportion of the annotated data used for validation
# --test_proportion: the proportion of the annotated data used for inference
```
- Option 2:
1. Following the examples of `./source/coco_loader.py` and `./source/voc_loader.py`, add a `./source/XX_loader.py` and implement the `load` function.
2. Add an entry that uses `./source/XX_loader.py` to the `load` function in `./source/loader.py`.
3. Modify `./source/__init__.py`:
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# Replace the above code with the following code:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. In the configuration file, set the `type` under `dataset` to `XXSource`.
#### How to add data pre-processing?
- To add an augmentation for a single image, refer to the classes in `transform/operators.py` and create a new class implementing the new augmentation; also add the new step to the configuration file.
- To add preprocessing for a whole batch, refer to the inner functions of `build_post_map` in `transform/post_map.py` and add a new inner function implementing the new batch preprocessing; also add it to the configuration file.
# Getting Started
For setting up the test environment, please refer to [installation
instructions](INSTALL.md).
## Training
#### Single-GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### Multi-GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
- Datasets are stored in `dataset/coco` by default (configurable).
- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- To check out the hyper-parameters used, please refer to the config file.
Alternating between training epochs and evaluation runs is possible; simply pass
in `--eval=True` to do so (tested with the `SSD` detector on Pascal VOC, not
recommended for two-stage models or training sessions on the COCO dataset).
## Evaluation
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```
- Checkpoints are loaded from `output` by default (configurable).
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
  moment, but it is a planned feature.
## Inference
- Run inference on a single image:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- Batch inference:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
The visualization files are saved in `output` by default; to specify a different
path, simply add a `--save_file=` flag.
## FAQ
Q: Why do I get `NaN` loss values during single-GPU training?
A: The default learning rate is tuned for multi-GPU training (8 GPUs); it must
be adapted for single-GPU training accordingly (e.g., divided by 8, so a `base_lr` of 0.01 becomes 0.00125).
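As a quick arithmetic sketch of that scaling rule (plain Python; the base value matches the R-CNN configs above):
```python
# Scale the config's base_lr from the default 8-GPU setup to the GPU count in use.
default_gpus = 8
base_lr = 0.01                    # base_lr used by the R-CNN configs above
num_gpus = 1
adapted_lr = base_lr * num_gpus / default_gpus
print(adapted_lr)                 # 0.00125
```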
# Installation
---
## Table of Contents
- [Introduction](#introduction)
- [PaddlePaddle](#paddlepaddle)
- [Other Dependencies](#other-dependencies)
- [PaddleDetection](#paddledetection)
- [Datasets](#datasets)
## Introduction
This document covers how to install PaddleDetection, its dependencies
(including PaddlePaddle), together with the COCO and PASCAL VOC datasets.
For general information about PaddleDetection, please see [README.md](../README.md).
## PaddlePaddle
Running PaddleDetection requires PaddlePaddle Fluid v1.5 or later. Please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/en/1.4/beginners_guide/install/index_en.html).
Please make sure your PaddlePaddle installation was successful and the version
of your PaddlePaddle is not lower than required. Verify with the following commands.
```
# To check if PaddlePaddle installation was successful
python -c "import paddle.fluid as fluid; fluid.install_check.run_check()"
# To check PaddlePaddle version
python -c "import paddle; print(paddle.__version__)"
```
### Requirements:
- Python2 or Python3
- CUDA >= 8.0
- cuDNN >= 5.0
- nccl >= 2.1.2
## Other Dependencies
[COCO-API](https://github.com/cocodataset/cocoapi):
COCO-API is needed for training. Installation is as follows:
```
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python setup.py install --user
```
## PaddleDetection
**Clone the Paddle models repository:**
You can clone the Paddle models repository and change the working directory to
PaddleDetection with the following commands:
```
cd <path/to/clone/models>
git clone https://github.com/PaddlePaddle/models
cd models/PaddleCV/object_detection
```
**Install Python dependencies:**
Required python packages are specified in [requirements.txt](./requirements.txt), and can be installed with:
```
pip install -r requirements.txt
```
**Make sure the tests pass:**
```
export PYTHONPATH=`pwd`:$PYTHONPATH
python ppdet/modeling/tests/test_architectures.py
```
## Datasets
PaddleDetection includes support for [MSCOCO](http://cocodataset.org) and [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) by default; please follow these instructions to set up the datasets.
**Create symlinks for local datasets:**
Default dataset path in config files is `data/coco` and `data/voc`, if the
datasets are already available on disk, you can simply create symlinks to
their directories:
```
ln -sf <path/to/coco> <path/to/paddle_detection>/data/coco
ln -sf <path/to/voc> <path/to/paddle_detection>/data/voc
```
**Download datasets manually:**
On the other hand, to download the datasets, run the following commands:
- MS-COCO
```
cd dataset/coco
./download.sh
```
- PASCAL VOC
```
cd dataset/voc
./download.sh
```
**Download datasets automatically:**
If a training session is started but the dataset is not set up properly (e.g.,
not found in `data/coco` or `data/voc`), PaddleDetection can automatically
download them from [MSCOCO-2017](http://images.cocodataset.org) and
[VOC2012](http://host.robots.ox.ac.uk/pascal/VOC); the decompressed datasets
will be cached in `~/.cache/paddle/dataset/` and can be discovered automatically
thereafter.
**NOTE:** For further information on the datasets, please see [DATA.md](DATA.md)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import ppdet.modeling
import ppdet.optimizer
import ppdet.data
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.