Commit 3e57b4c3 authored by qingqing01, committed by GitHub

Rename object_detection to PaddleDetection. (#2601)

* Rename object_detection to PaddleDetection
* Small fix for doc
# Virtualenv
/.venv/
/venv/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
# json file
*.json
# Distribution / packaging
/bin/
/build/
/develop-eggs/
/dist/
/eggs/
/lib/
/lib64/
/output/
/parts/
/sdist/
/var/
/*.egg-info/
/.installed.cfg
/*.egg
/.eggs
# AUTHORS and ChangeLog will be generated while packaging
/AUTHORS
/ChangeLog
# BCloud / BuildSubmitter
/build_submitter.*
/logger_client_log
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
.tox/
.coverage
.cache
.pytest_cache
nosetests.xml
coverage.xml
# Translations
*.mo
# Sphinx documentation
/docs/_build/
[style]
based_on_style = pep8
column_limit = 80
# PaddleDetection
The goal of PaddleDetection is to provide easy access to a wide range of object
detection models in both industry and research settings. We design
PaddleDetection to be not only performant and production-ready, but also highly
flexible, catering to research needs.
<div align="center">
<img src="demo/output/000000570688.jpg" />
</div>
## Introduction
Design Principles:
- Production Ready:
Key operations are implemented in C++ and CUDA; together with PaddlePaddle's
highly efficient inference engine, this enables easy deployment in server environments.
- Highly Flexible:
Components are designed to be modular. Model architectures, as well as data
preprocessing pipelines, can be easily customized with simple configuration
changes (see the sketch after this list).
- Performance Optimized:
With the help of the underlying PaddlePaddle framework, faster training and
reduced GPU memory footprint are achieved. Notably, Yolo V3 training is
much faster than in other frameworks. As another example, with Mask-RCNN
(ResNet50) we managed to fit up to 5 images per GPU (V100, 16GB) during
training.
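
As a minimal sketch of what such a configuration change looks like (the keys
below come from the configs bundled in this commit; the fragment itself is
hypothetical), deepening the backbone of a Faster R-CNN model only touches the
corresponding entries:

```yaml
# Hypothetical fragment: swap in a deeper backbone for an existing model.
# Only the affected entries are shown; all other components are resolved
# by name and stay unchanged.
FasterRCNN:
  backbone: ResNet

ResNet:
  depth: 101          # was 50
  norm_type: affine_channel
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
```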
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt | SENet | MobileNet | DarkNet |
|--------------------|:------:|:-----------------------------:|:-------:|:-----:|:---------:|:-------:|
| Faster R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Cascade R-CNN | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Yolov3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
<a name="vd">[1]</a> ResNet-vd models offer much improved accuracy with negligible performance cost.
Advanced Features:
- [x] **Synchronized Batch Norm**: currently used by Yolo V3.
- [x] **Group Norm**: pretrained models to be released.
- [x] **Modulated Deformable Convolution**: pretrained models to be released.
- [x] **Deformable PSRoI Pooling**: pretrained models to be released.
## Model zoo
Pretrained models are available in the PaddlePaddle [detection model zoo](docs/MODEL_ZOO.md).
## Installation
Please follow the [installation guide](docs/INSTALL.md).
## Get Started
For inference, simply run the following command and the visualized result will
be saved in `output/`.
```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
python tools/infer.py -c configs/mask_rcnn_r50_1x.yml \
    -o weights=https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar \
    --infer_img=demo/000000570688.jpg
```
For detailed training and evaluation workflow, please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md).
We also recommend users take a look at the [IPython Notebook demo](demo/mask_rcnn_demo.ipynb).
Further information can be found in these documents:
- [Introduction to the configuration workflow.](docs/CONFIG.md)
- [Guide to custom dataset and preprocess pipeline.](docs/DATA.md)
## Todo List
Please note that this is a work in progress; substantial changes may come in
the near future.
Some of the planned features include:
- [ ] Mixed precision training.
- [ ] Distributed training.
- [ ] Inference in 8-bit mode.
- [ ] User defined operations.
- [ ] Larger model zoo.
## Updates
#### Initial release (7/3/2019)
- Initial release of PaddleDetection and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, Yolo v3, and SSD.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!
architecture: CascadeRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/cascade_rcnn_r50_fpn_1x/model_final
metric: COCO
CascadeRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
ResNet:
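# affine_channel: batch-norm statistics are frozen and folded into a fixed
# per-channel scale and bias, a common setting when fine-tuning detection
# backbones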
norm_type: affine_channel
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
variant: b
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
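# each spatial_scale entry is 1/stride of an input feature map:
# 1/32, 1/16, 1/8 and 1/4 (strides 32, 16, 8 and 4)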
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
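# i.e. sample 256 anchors per image, at most half of them foreground;
# anchors with IoU >= 0.7 against ground truth count as positive,
# those with IoU <= 0.3 as negative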
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_lo: [0.0, 0.0, 0.0]
bg_thresh_hi: [0.5, 0.6, 0.7]
fg_thresh: [0.5, 0.6, 0.7]
fg_fraction: 0.25
num_classes: 81
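# the list-valued fields hold one entry per cascade stage: the fg/bg IoU
# cutoffs tighten from 0.5 to 0.7 and bbox_reg_weights scale up in step;
# num_classes is 80 COCO categories plus background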
CascadeBBoxHead:
head: FC6FC7Head
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
FC6FC7Head:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
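# resulting schedule: the learning rate ramps linearly from
# base_lr * 1/3 (about 0.00667) to 0.02 over the first 500 iterations,
# then is multiplied by gamma = 0.1 at iterations 60000 and 80000
# (0.02 -> 0.002 -> 0.0002)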
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
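# pad every image in a batch to a common size divisible by 32, matching
# the coarsest FPN stride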
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r101_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 101
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
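# single-level RoIAlign over the C4 feature map (stride 16, hence
# spatial_scale = 1/16)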
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
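# in this C4 variant the res5 stage of the backbone doubles as the box head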
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/faster_rcnn_r101_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 360000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/faster_rcnn_r101_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar
weights: output/faster_rcnn_r101_vd_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 360000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar
weights: output/faster_rcnn_r101_vd_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 360000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_2x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/fpn/faster_rcnn_r50_fpn_1x/model_final
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_lo: 0.0
bg_thresh_hi: 0.5
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
use_gpu: true
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_fpn_2x/model_final
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_lo: 0.0
bg_thresh_hi: 0.5
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
use_gpu: true
max_iters: 180000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
metric: COCO
weights: output/faster_rcnn_r50_vd_1x/model_final
FasterRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: 4
freeze_at: 2
variant: d
ResNetC5:
norm_type: affine_channel
variant: d
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
use_random: true
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
weights: output/faster_rcnn_r50_vd_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 2
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: 4
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
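# groups: 64 with group_width: 4 is the 64x4d cardinality setting used by
# SENet-154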
SENetC5:
depth: 152
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 12000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 6000
RoIAlign:
resolution: 7
sampling_ratio: 0
spatial_scale: 0.0625
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: SENetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 260000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/faster_rcnn_se154_fpn_s1x/model_final
metric: COCO
FasterRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [200000, 240000]
- !LinearWarmup
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar
weights: output/faster_rcnn_x101_64x4d_fpn_1x/model_final
metric: COCO
FasterRCNN:
backbone: ResNeXt
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNeXt:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: FasterRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 180000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar
weights: output/faster_rcnn_x101_64x4d_fpn_2x/model_final
metric: COCO
FasterRCNN:
backbone: ResNeXt
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNeXt:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r101_fpn_1x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
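# the box branch pools 7x7 RoI features and the mask branch 14x14; the
# mask head below upsamples the latter to its 28x28 output resolution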
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r101_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 101
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_1x/model_final
MaskRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
spatial_scale: 0.0625
sampling_ratio: 0
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
normalized: false
score_threshold: 0.05
num_classes: 81
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
resolution: 14
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
num_classes: 81
resolution: 14
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_2x/model_final/
MaskRCNN:
backbone: ResNet
rpn_head: RPNHead
roi_extractor: RoIAlign
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: 4
freeze_at: 2
ResNetC5:
norm_type: affine_channel
RPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 12000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 6000
post_nms_top_n: 1000
RoIAlign:
resolution: 14
spatial_scale: 0.0625
sampling_ratio: 0
BBoxHead:
head: ResNetC5
nms:
keep_top_k: 100
nms_threshold: 0.5
normalized: false
score_threshold: 0.05
num_classes: 81
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
resolution: 14
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
num_classes: 81
resolution: 14
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
# start the warm-up from base_lr * start_factor
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 180000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_fpn_1x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
sampling_ratio: 2
box_resolution: 7
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
use_gpu: true
max_iters: 360000
snapshot_iter: 10000
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar
metric: COCO
weights: output/mask_rcnn_r50_vd_fpn_2x/model_final/
MaskRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 320000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: MaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
max_iters: 260000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SE154_vd_pretrained.tar
weights: output/mask_rcnn_se154_vd_fpn_s1x/model_final/
metric: COCO
MaskRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: BBoxHead
bbox_assigner: BBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: affine_channel
variant: d
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
num_chan_reduced: 256
num_classes: 81
num_convs: 4
resolution: 28
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
MaskAssigner:
resolution: 28
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
num_classes: 81
TwoFCHead:
num_chan: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [200000, 240000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: RetinaNet
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/retinanet_r101_fpn_1x/model_final
log_smooth_window: 20
snapshot_iter: 10000
metric: COCO
save_dir: output
RetinaNet:
backbone: ResNet
fpn: FPN
retina_head: RetinaHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 101
feature_maps: [3, 4, 5]
freeze_at: 2
FPN:
max_level: 7
min_level: 3
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
has_extra_convs: true
RetinaHead:
num_convs_per_octave: 4
num_chan: 256
max_level: 7
min_level: 3
prior_prob: 0.01
base_scale: 4
num_scales_per_octave: 3
num_classes: 81
anchor_generator:
aspect_ratios: [1.0, 2.0, 0.5]
variance: [1.0, 1.0, 1.0, 1.0]
target_assign:
positive_overlap: 0.5
negative_overlap: 0.4
gamma: 2.0
alpha: 0.25
sigma: 3.0151134457776365
output_decoder:
score_thresh: 0.05
nms_thresh: 0.5
pre_nms_top_n: 1000
detections_per_im: 100
nms_eta: 1.0
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
dataset_dir: data/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 2
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
dataset_dir: data/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
batch_transforms:
- !PadBatch
pad_to_stride: 128
dataset:
annotation: annotations/instances_val2017.json
num_workers: 2
architecture: RetinaNet
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/retinanet_r50_fpn_1x/model_final
log_smooth_window: 20
snapshot_iter: 10000
metric: COCO
save_dir: output
RetinaNet:
backbone: ResNet
fpn: FPN
retina_head: RetinaHead
ResNet:
norm_type: affine_channel
norm_decay: 0.
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
FPN:
max_level: 7
min_level: 3
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
has_extra_convs: true
RetinaHead:
num_convs_per_octave: 4
num_chan: 256
max_level: 7
min_level: 3
prior_prob: 0.01
base_scale: 4
num_scales_per_octave: 3
num_classes: 81
anchor_generator:
aspect_ratios: [1.0, 2.0, 0.5]
variance: [1.0, 1.0, 1.0, 1.0]
target_assign:
positive_overlap: 0.5
negative_overlap: 0.4
gamma: 2.0
alpha: 0.25
sigma: 3.0151134457776365
output_decoder:
score_thresh: 0.05
nms_thresh: 0.5
pre_nms_top_n: 1000
detections_per_im: 100
nms_eta: 1.0
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 128
num_workers: 2
architecture: SSD
max_iters: 28000
train_feed: SSDTrainFeed
eval_feed: SSDEvalFeed
test_feed: SSDTestFeed
pretrain_weights: ./ssd3/
use_gpu: true
snapshot_iter: 2000
log_smooth_window: 1
metric: VOC
save_dir: output
weights: output/ssd_mobilenet_v1_voc/model_final/
SSD:
backbone: MobileNet
multi_box_head: MultiBoxHead
num_classes: 21
metric:
ap_version: 11point
evaluate_difficult: false
overlap_threshold: 0.5
output_decoder:
background_label: 0
keep_top_k: 200
nms_eta: 1.0
nms_threshold: 0.45
nms_top_k: 400
score_threshold: 0.01
MobileNet:
norm_decay: 0.
conv_group_scale: 1
extra_block_filters: [[256, 512], [128, 256], [128, 256], [64, 128]]
with_extra_blocks: true
MultiBoxHead:
aspect_ratios: [[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]]
base_size: 300
flip: true
max_ratio: 90
max_sizes: [[], 150.0, 195.0, 240.0, 285.0, 300.0]
min_ratio: 20
min_sizes: [60.0, 105.0, 150.0, 195.0, 240.0, 285.0]
offset: 0.5
LearningRate:
schedulers:
- !PiecewiseDecay
milestones: [10000, 15000, 20000, 25000]
values: [0.001, 0.0005, 0.00025, 0.0001, 0.00001]
OptimizerBuilder:
optimizer:
momentum: 0.0
type: RMSPropOptimizer
regularizer:
factor: 0.00005
type: L2
SSDTrainFeed:
batch_size: 32
use_process: true
dataset:
dataset_dir: dataset/voc
annotation: VOCdevkit/VOC_all/ImageSets/Main/train.txt
image_dir: VOCdevkit/VOC_all/JPEGImages
use_default_label: true
SSDEvalFeed:
batch_size: 64
use_process: true
dataset:
dataset_dir: dataset/voc
annotation: VOCdevkit/VOC_all/ImageSets/Main/val.txt
image_dir: VOCdevkit/VOC_all/JPEGImages
use_default_label: true
drop_last: false
SSDTestFeed:
batch_size: 1
dataset:
use_default_label: true
drop_last: false
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/yolo/darknet53.tar.gz
weights: https://paddlemodels.bj.bcebos.com/yolo/yolov3.tar.gz
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/yolo/yolo_mobilenet1.0.tar.gz
YOLOv3:
backbone: MobileNet
yolo_head: YOLOv3Head
MobileNet:
norm_type: sync_bn
norm_decay: 0.
conv_group_scale: 1
with_extra_blocks: false
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 500200
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/yolo/yolo_resnet34.tar.gz
YOLOv3:
backbone: ResNet
yolo_head: YOLOv3Head
ResNet:
norm_type: sync_bn
freeze_at: 0
freeze_norm: false
norm_decay: 0.
depth: 34
feature_maps: [3, 4, 5]
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
num_classes: 80
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
dataset:
annotation: annotations/instances_val2017.json
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# Extract the data.
echo "Extracting..."
unzip train2014.zip
unzip val2014.zip
unzip train2017.zip
unzip val2017.zip
unzip annotations_trainval2014.zip
unzip annotations_trainval2017.zip
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
echo "Extracting..."
tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
echo "Creating data lists..."
python -c 'from ppdet.utils.voc_utils import merge_and_create_list; merge_and_create_list("VOCdevkit", ["2007", "2012"], "VOCdevkit/VOC_all")'
# Introduction
PaddleDetection takes a rather principled approach to configuration management. We aim to automate the configuration workflow and to reduce configuration errors.
# Rationale
Presently, configuration in mainstream frameworks is usually dictionary-based: the global config is simply a giant, loosely defined Python dictionary.
This approach is error-prone, e.g., misspelled or misplaced keys may lead to serious errors during training, causing loss of time and wasted resources.
To avoid the common pitfalls, with automation and static analysis in mind, we propose a configuration design that is user friendly, easy to maintain and extensible.
# Design
The design utilizes some of Python's reflection mechanisms to extract configuration schematics from Python class definitions.
Specifically, it extracts information from class constructor arguments, including names, docstrings, default values, and data types (if type hints are available).
This approach advocates modular and testable design, leading to a unified and extensible code base.
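As a minimal sketch of what this extraction yields, consider the toy class below (purely illustrative); `extract_schema` lives in `ppdet.core.config.schema` and is reproduced later in this document:
```python
from ppdet.core.config.schema import extract_schema

class MyHead(object):
    """
    A toy module
    Args:
        num_chan (int): number of output channels
    """
    def __init__(self, num_chan=256):
        self.num_chan = num_chan

schema = extract_schema(MyHead)
# the class name, docstring summary and per-argument metadata
# are now available to the config tooling
assert schema.name == 'MyHead'
assert schema.schema['num_chan'].default == 256
```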
## API
Most of the functionality is exposed in `ppdet.core.workspace` module.
- `register`: This decorator registers a class as a configurable module; it understands several special annotations in the class definition.
  - `__category__`: For better organization, modules are classified into categories.
  - `__inject__`: A list of constructor arguments intended to take module instances as input; the instances are created at runtime and injected. The corresponding configuration value can be a class-name string, a serialized object, a config key pointing to a serialized object, or a dict (in which case the constructor needs to handle it; see the example below).
  - `__op__`: A shortcut for wrapping PaddlePaddle operators into callable objects; together with `__append_doc__` (which extracts the docstring from the target PaddlePaddle operator automatically), this can be a real time saver.
- `serializable`: This decorator makes a class directly serializable in a YAML config file, by taking advantage of [pyyaml](https://pyyaml.org/wiki/PyYAMLDocumentation)'s serialization mechanism.
- `create`: Constructs a module instance according to the global configuration.
- `load_config` and `merge_config`: Load a YAML file and merge config settings from the command line.
## Example
Take the `RPNHead` module for example: it is composed of several PaddlePaddle operators. We first wrap those operators into classes, then pass in instances of these classes when instantiating the `RPNHead` module.
```python
# excerpt from `ppdet/modeling/ops.py`
import paddle.fluid as fluid
from ppdet.core.workspace import register, serializable
# ... more operators
@register
@serializable
class GenerateProposals(object):
# NOTE this class simply wraps a PaddlePaddle operator
__op__ = fluid.layers.generate_proposals
# NOTE docstring for args are extracted from PaddlePaddle OP
__append_doc__ = True
def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n
self.nms_thresh = nms_thresh
self.min_size = min_size
self.eta = eta
# ... more operators
# excerpt from `ppdet/modeling/anchor_heads/rpn_head.py`
from ppdet.core.workspace import register
from ppdet.modeling.ops import AnchorGenerator, RPNTargetAssign, GenerateProposals
@register
class RPNHead(object):
"""
RPN Head
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__):
super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign
self.train_proposal = train_proposal
self.test_proposal = test_proposal
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(rpn_target_assign, dict):
self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign)
if isinstance(train_proposal, dict):
self.train_proposal = GenerateProposals(**train_proposal)
if isinstance(test_proposal, dict):
self.test_proposal = GenerateProposals(**test_proposal)
```
The corresponding (generated) YAML snippet is as follows. Note that this is the configuration in **FULL**; all default values can be omitted. In the case of the above example, all arguments have default values, meaning nothing is required in the config file.
```yaml
RPNHead:
  test_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 1000
pre_nms_top_n: 6000
  train_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 2000
pre_nms_top_n: 12000
anchor_generator:
# ...
rpn_target_assign:
# ...
```
An example snippet that makes use of the `RPNHead` module:
```python
from ppdet.core.workspace import load_config, merge_config, create
load_config('some_config_file.yml')
merge_config(more_config_options_from_command_line)
rpn_head = create('RPNHead')
# ... code that uses the created module!
```
Configuration files can also contain serialized objects, denoted with `!`, for example:
```yaml
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
```
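Under the hood, these `!` tags are plain PyYAML constructors and representers registered by the `serializable` decorator (see `ppdet/core/config/yaml_helpers.py` later in this document). A minimal sketch of the round trip, using a toy stand-in for the real scheduler class:
```python
import yaml
from ppdet.core.workspace import serializable

@serializable
class PiecewiseDecay(object):  # toy stand-in; the real one lives in ppdet.optimizer
    def __init__(self, gamma=0.1, milestones=[60000, 80000]):
        self.gamma = gamma
        self.milestones = milestones

obj = yaml.load("!PiecewiseDecay {gamma: 0.1, milestones: [60000, 80000]}",
                Loader=yaml.Loader)
assert isinstance(obj, PiecewiseDecay) and obj.gamma == 0.1
```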
# Requirements
Two Python packages are used; both are optional.
- [typeguard](https://github.com/agronholm/typeguard) is used for type checking in Python 3.
- [docstring\_parser](https://github.com/rr-/docstring_parser) is needed for docstring parsing.
To install them, simply run:
```shell
pip install typeguard http://github.com/willthefrog/docstring_parser/tarball/master
```
# Tooling
A small utility (`tools/configure.py`) is included to simplify the configuration process. It provides four commands to walk users through it:
1. `list`: List currently registered modules by category; one can also specify which category to list with the `--category` flag.
2. `help`: Get help information for a module, including its description, options, configuration template, and example command-line flags.
3. `analyze`: Check a configuration file for missing/extraneous options, options with mismatched types (if type hints are given), and missing dependencies; it also highlights user-provided values (overridden defaults).
4. `generate`: Generate a configuration template for a given list of modules. By default it generates a complete configuration file, which can be quite verbose; if the `--minimal` flag is given, it generates a template that only contains non-optional settings. For example, to generate a configuration for the Faster R-CNN architecture with a `ResNet` backbone and `FPN`, run:
```shell
python tools/configure.py generate FasterRCNN ResNet RPNHead RoIAlign BBoxAssigner BBoxHead FasterRCNNTrainFeed FasterRCNNTestFeed LearningRate OptimizerBuilder
```
For a minimal version, run:
```shell
python tools/configure.py --minimal generate FasterRCNN BBoxHead
```
## Introduction
The data pipeline is responsible for loading and converting data. Each
resulting data sample is a tuple of np.ndarrays.
For example, Faster R-CNN training uses samples of this format: `[(im,
im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation
The data pipeline consists of four sub-systems: data parsing, image
pre-processing, data conversion and data feeding APIs.
Data samples are collected to form `dataset.Dataset`s; usually three sets are
needed, for training, validation, and testing respectively.
First, `dataset.source` loads the data files into memory, then
`dataset.transform` processes them, and lastly, the batched samples
are fetched by `dataset.Reader`.
Sub-systems details:
1. Data parsing
Parses various data sources and creates `dataset.Dataset` instances. Currently, the
following data sources are supported:
- COCO data source
Loads `COCO`-style datasets with a directory structure like this:
```
data/coco/
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
| ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
| ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
| ...
```
- Pascal VOC data source
Loads `Pascal VOC`-like datasets with a directory structure like this:
```
data/pascalvoc/
├──Annotations
│ ├── i000050.jpg
│ ├── 003876.xml
| ...
├── ImageSets
│ ├──Main
└── train.txt
└── val.txt
└── test.txt
└── dog_train.txt
└── dog_trainval.txt
└── dog_val.txt
└── dog_test.txt
└── ...
│ ├──Layout
└──...
│ ├── Segmentation
└──...
├── JPEGImages
│ ├── 000050.jpg
│ ├── 003876.jpg
| ...
```
- Roidb data source
A generalized data source serialized as pickle files, which have the following
structure:
```python
(records, cname2id)
# `cname2id` is a `dict` which maps category name to class IDs
# and `records` is a list of dict of this structure:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image ID
'h': im_h, # height of image
'w': im_w, # width of image
'is_crowd': is_crowd, # crowd marker
'gt_class': gt_class, # ground truth class
'gt_bbox': gt_bbox, # ground truth bounding box
'gt_poly': gt_poly, # ground truth segmentation
}
```
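Such a file can be inspected directly; a minimal sketch, assuming a file produced by the conversion tool below (the file name and extension here are hypothetical):
```python
import pickle

# hypothetical output file of tools/generate_data_for_training.py
with open('./roidb/instances_val2017.roidb', 'rb') as f:
    records, cname2id = pickle.load(f)
print('{} samples, {} classes'.format(len(records), len(cname2id)))
```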
We provide a tool to generate roidb data sources. To convert a `COCO`- or `VOC`-like
dataset, run this command:
```sh
# --type: the type of the original data (xml or json)
# --annotation: the path of a file that lists the annotation files
# --save-dir: the save path
# --samples: the number of samples (default is -1, which means all samples in the dataset)
python ./tools/generate_data_for_training.py
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
2. Image preprocessing
The `dataset.transform.operator` module provides operations such as image
decoding, expanding, cropping, etc. Multiple operators are combined to form
larger processing pipelines.
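A hedged sketch of such a pipeline follows; the operator names come from `transform/operators.py` (listed below), but the constructor arguments and the plain-loop driver are illustrative assumptions rather than the exact API:
```python
from ppdet.data.transform.operators import (DecodeImage, RandomFlipImage,
                                            NormalizeImage)

# illustrative arguments; consult the operator docstrings for real signatures
sample_transforms = [
    DecodeImage(to_rgb=True),       # image bytes -> RGB ndarray
    RandomFlipImage(prob=0.5),      # horizontal flip half of the time
    NormalizeImage(is_scale=True),  # scale pixel values
]

def preprocess(sample):
    # each operator maps a sample dict to a (possibly modified) sample dict
    for op in sample_transforms:
        sample = op(sample)
    return sample
```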
3. Data transformer
Transforms a `dataset.Dataset` to achieve various desired effects. Notably, the
`dataset.transform.parallel_map` transformer accelerates image processing with
multiple threads or processes. More transformers can be found in
`dataset.transform.transformer`.
4. Data feeding APIs
To facilitate data pipeline building, we combine multiple `dataset.Dataset`s to
form a `dataset.Reader`, which can provide data for training, validation, and
testing respectively. Users can simply call `Reader.[train|eval|infer]` to get
the corresponding data stream. Many aspects of the `Reader`, such as storage
location, preprocessing pipeline, and acceleration mode, can be configured with
YAML files.
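A minimal usage sketch, mirroring the example given in the Chinese version of this document (imports are omitted there as well; `load_cfg` and `Reader` come from this data module):
```python
ccfg = load_cfg('./config.yml')
coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
# Reader.[train|eval|infer] return the corresponding data streams
for batch in coco.train():
    pass  # feed `batch` to the training program
```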
The main APIs are as follows:
1. Data parsing
- `source/coco_loader.py`: COCO dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`: Pascal VOC dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/voc_loader.py)
[Note] To use a non-default label list for VOC datasets, a `label_list.txt`
file is needed; one can use the provided label list
(`data/pascalvoc/ImageSets/Main/label_list.txt`) or generate a custom one (with
`tools/generate_data_for_training.py`). Also, the `use_default_label` option
should be set to `false` in the configuration file.
- `source/loader.py`: Roidb dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/loader.py)
2. Operator
`transform/operators.py`: Contains a variety of data augmentation methods, including:
- `DecodeImage`: Read images in RGB format.
- `RandomFlipImage`: Horizontal flip.
- `RandomDistort`: Distort brightness, contrast, saturation, and hue.
- `ResizeImage`: Resize image with interpolation.
- `RandomInterpImage`: Use a random interpolation method to resize the image.
- `CropImage`: Crop image with respect to different scale, aspect ratio, and overlap.
- `ExpandImage`: Pad image to a larger size, padding filled with mean image value.
- `NormalizeImage`: Normalize image pixel values.
- `NormalizeBox`: Normalize the bounding box.
- `Permute`: Arrange the channels of the image and optionally convert image to BGR format.
- `MixupImage`: Mixup two images with a given fraction<sup>[1](#mix)</sup>.
<a name="mix">[1]</a> Please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf)
`transform/arrange_sample.py`: Assemble the data samples needed by different models.
3. Transformer
`transform/post_map.py`: Transformations that operate on whole batches, mainly for:
- Padding the whole batch to given stride values
- Resizing images to multiple scales
- Randomly adjusting the image size of the batch data
`transform/transformer.py`: Data filtering and batching.
`transform/parallel_map.py`: Accelerates data processing with multiple threads/processes.
4. Reader
`reader.py`: Combines sources and transforms, returning batch data according to `max_iter`.
`data_feed.py`: Configures default parameters for `reader.py`.
### Usage
#### Canned Datasets
Presets for common datasets, e.g., `MS-COCO` and `Pascal VOC`, are included. In
most cases, users can simply use these canned datasets as is. Moreover, the
whole data pipeline is fully customizable through the YAML configuration files.
#### Custom Datasets
- Option 1: Convert the dataset to COCO or VOC format.
```sh
# a small utility (`tools/labelme2coco.py`) is provided to convert
# Labelme-annotated dataset to COCO format.
python ./tools/labelme2coco.py --json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --json_input_dir: the path of the json files annotated by Labelme
# --image_input_dir: the path of the images
# --output_dir: the path of the converted COCO dataset
# --train_proportion: the proportion of annotated data used for training
# --val_proportion: the proportion of annotated data used for validation
# --test_proportion: the proportion of annotated data used for inference
```
- Option 2:
1. Add `source/XX_loader.py` and implement the `load` function, following the
example of `source/coco_loader.py` and `source/voc_loader.py`.
2. Modify the `load` function in `source/loader.py` to make use of the newly
added data loader.
3. Modify `/source/__init__.py` accordingly.
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# Replace the above code with the following code:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. In the configuration file, set the `type` of `dataset` to `XXSource`.
#### How to add data pre-processing?
- To add a pre-processing operation for a single image, refer to the classes in
`transform/operators.py`, and implement the desired transformation with a new
class; a hedged sketch follows this list.
- To add pre-processing for a batch, one needs to modify the `build_post_map`
function in `transform/post_map.py`.
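A hedged sketch of a new single-image operator; `BaseOperator` and the sample-dict layout follow the implementation notes at the end of this document, while the class itself and its argument are hypothetical:
```python
import numpy as np
from ppdet.data.transform.operators import BaseOperator

class RandomGrayscale(BaseOperator):  # hypothetical new operator
    def __init__(self, prob=0.1):
        super(RandomGrayscale, self).__init__()
        self.prob = prob

    def __call__(self, sample, context=None):
        # assumption: the decoded image is stored under sample['image']
        if np.random.uniform() < self.prob:
            gray = sample['image'].mean(axis=2, keepdims=True)
            sample['image'] = np.repeat(gray, 3, axis=2).astype(
                sample['image'].dtype)
        return sample
```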
## Introduction
This Python module loads data and converts it into the format required for detection model training, validation, and testing: a list of tuples made up of multiple np.ndarrays. For example, the training data format for the Faster R-CNN model is `[(im, im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation
Internally, the module is divided into four sub-functions: data parsing, image preprocessing, data conversion, and data feeding APIs.
We use `dataset.Dataset` to represent a dataset. For instance, the `COCO` data comprises three datasets, used for training, validation, and testing respectively. The raw data is stored in files; it is loaded into memory via `dataset.source`, processed and converted with `dataset.transform`, and finally the batch data for training, validation, and testing is obtained through the `dataset.Reader` interface.
Sub-function details:
1. Data parsing
Data parsing produces a `dataset.Dataset`; the logic lives in `dataset.source`, which can parse datasets of different formats. Supported data sources include:
- COCO data source
This dataset comes in COCO2014 and COCO2017 releases, consisting mainly of json files and image files, organized as follows:
```
data/coco/
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
| ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
| ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
| ...
```
- Pascal VOC data source
This dataset comes in VOC2007 and VOC2012 releases, consisting mainly of xml files and image files, organized as follows:
```
data/pascalvoc/
├──Annotations
│ ├── i000050.jpg
│ ├── 003876.xml
| ...
├── ImageSets
│ ├──Main
└── train.txt
└── val.txt
└── test.txt
└── dog_train.txt
└── dog_trainval.txt
└── dog_val.txt
└── dog_test.txt
└── ...
│ ├──Layout
└──...
│ ├── Segmentation
└──...
├── JPEGImages
│ ├── 000050.jpg
│ ├── 003876.jpg
| ...
```
- Roidb data source
This data source consists of pickle files, mostly converted from the COCO and Pascal VOC datasets. Each file stores a list named `records` (and possibly a dict named `cname2cid` mapping category names to class IDs), with the following content:
```python
(records, catname2clsid)
# `records` is a list whose entries have the following structure:
{
    'im_file': im_fname,   # image file name
    'im_id': im_id,        # image ID
    'h': im_h,             # image height
    'w': im_w,             # image width
    'is_crowd': is_crowd,  # crowd flag
    'gt_class': gt_class,  # ground-truth class
    'gt_bbox': gt_bbox,    # ground-truth bounding box
    'gt_poly': gt_poly,    # ground-truth polygon (segmentation)
}
# `cname2id` is a dict mapping category names to class IDs
```
We provide a script in `./tools/` to generate roidb datasets; this can be done with the following command:
```sh
# --type: the type of the original dataset (only xml or json)
# --annotation: the path of a file that lists the required annotation files
# --save-dir: the save path
# --samples: the number of samples (default is -1, meaning all samples)
python ./tools/generate_data_for_training.py
--type=json \
--annotation=./annotations/instances_val2017.json \
--save-dir=./roidb \
--samples=-1
```
2. Image preprocessing
Image preprocessing includes operations such as image decoding, resizing, and cropping. These are implemented uniformly as `dataset.transform.operator` operators, which makes them easy to extend. Multiple operators can also be combined into complex processing pipelines, which are then used by the transformers in `dataset.transformer`, e.g., to run a complex preprocessing flow across multiple threads.
3. Data transformer
A data transformer converts one `dataset.Dataset` into a new `dataset.Dataset`. The various `dataset.transform.transformer`s are implemented with the decorator pattern, e.g., the `dataset.transform.parallel_map` transformer for multi-process preprocessing.
4. Data feeding APIs
To make data retrieval convenient during training, multiple `dataset.Dataset`s are combined into a `dataset.Reader` that serves data to the user; calling `Reader.[train|eval|infer]` yields the corresponding data stream. The `Reader` supports configuring data locations, the preprocessing flow, acceleration modes, etc. via YAML files.
The main APIs are as follows:
1. Data parsing
- `source/coco_loader.py`: parses COCO datasets. [See code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/coco_loader.py)
- `source/voc_loader.py`: parses Pascal VOC datasets. [See code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/voc_loader.py)
[Note] When using a VOC dataset without the default label list, first generate a `label_list.txt` with `tools/generate_data_for_training.py` (used in the same way as for the roidb dataset in the data parsing section), or place a provided `label_list.txt` under `data/pascalvoc/ImageSets/Main`; also set the `use_default_label` option to `false` in the configuration file.
- `source/loader.py`: parses roidb datasets. [See code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/loader.py)
2. Operators
`transform/operators.py`: contains a variety of data augmentation methods, mainly including:
- `RandomFlipImage`: horizontal flip.
- `RandomDistort`: randomly perturb the image's brightness, contrast, saturation, and hue.
- `ResizeImage`: resize the image with a specific interpolation method.
- `RandomInterpImage`: resize the image with a randomly chosen interpolation method.
- `CropImage`: generate candidate crops from scale and aspect-ratio parameters, then select the ones whose IoU with the annotated boxes meets the requirement.
- `ExpandImage`: place the original image into a larger canvas filled with the pixel mean (subtracted again later in the mean-subtraction step), then crop, resize, and flip the result.
- `DecodeImage`: read images in RGB format.
- `Permute`: rearrange the image channels and convert to BGR format.
- `NormalizeImage`: normalize image pixel values.
- `NormalizeBox`: normalize the bounding boxes.
- `MixupImage`: overlay two images with a given ratio.
[Note]: For the Mixup operation, please refer to this [paper](https://arxiv.org/pdf/1710.09412.pdf).
`transform/arrange_sample.py`: arranges the data fed into the network.
3. Transformers
`transform/post_map.py`: batch-level preprocessing, mainly including:
- randomly adjusting the image size within a batch
- multi-scale image resizing
- padding
`transform/transformer.py`: filters out useless data and returns batch data.
`transform/parallel_map.py`: implements acceleration.
4. Reader
`reader.py`: combines source and transformer operations and returns batch data according to `max_iter`.
`data_feed.py`: configures the default parameters needed by `reader.py`.
### Usage
#### Typical usage
The module's functionality is driven by the settings in the YAML configuration file; see the configuration-file section for details on YAML usage.
- Load data for training:
``` python
ccfg = load_cfg('./config.yml')
coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
```
#### How to use a custom dataset?
- Option 1: convert the dataset to VOC format or COCO format.
```sh
# labelme2coco.py in ./tools/ converts Labelme-annotated datasets to the COCO format
python ./tools/labelme2coco.py --json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --json_input_dir: the directory of the json files annotated by Labelme
# --image_input_dir: the directory of the images
# --output_dir: where to store the converted COCO-format dataset
# --train_proportion: the proportion of annotated data used for training
# --val_proportion: the proportion of annotated data used for validation
# --test_proportion: the proportion of annotated data used for inference
```
- Option 2:
1. Following the examples of `./source/coco_loader.py` and `./source/voc_loader.py`, add a `./source/XX_loader.py` and implement its `load` function.
2. Add an entry point for `./source/XX_loader.py` to the `load` function in `./source/loader.py`.
3. Modify `./source/__init__.py`:
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
# Replace the above code with the following:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
source_type = 'RoiDbSource'
```
4. In the configuration file, set the `type` under `dataset` to `XXSource`.
#### How to add data preprocessing?
- To add augmentation preprocessing for a single image, refer to the classes in `transform/operators.py` and create a new class implementing the new augmentation; also register this preprocessing step in the configuration file.
- To add preprocessing for a batch of images, refer to the inner functions of `build_post_map` in `transform/post_map.py` and create a new inner function implementing the new batch preprocessing; also register this preprocessing step in the configuration file.
# Getting Started
For setting up the test environment, please refer to [installation
instructions](INSTALL.md).
## Training
#### Single-GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### Multi-GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
- Datasets are stored in `dataset/coco` by default (configurable).
- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- To check the hyperparameters used, please refer to the config file.
Alternating between training epochs and evaluation runs is possible: simply pass
in `--eval=True` to do so (tested with the `SSD` detector on Pascal VOC, not
recommended for two-stage models or training sessions on the COCO dataset)
## Evaluation
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```
- Checkpoints are loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature
## Inference
- Run inference on a single image:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- Batch inference:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
The visualization files are saved in `output` by default; to specify a different
path, simply add a `--save_file=` flag.
## FAQ
Q: Why do I get `NaN` loss values during single-GPU training?
A: The default learning rate is tuned for multi-GPU training (8 GPUs); it must
be scaled down accordingly for single-GPU training (e.g., divide `base_lr` by 8,
so 0.01 becomes 0.00125).
# Installation
---
## Table of Contents
- [Introduction](#introduction)
- [PaddlePaddle](#paddlepaddle)
- [Other Dependencies](#other-dependencies)
- [PaddleDetection](#paddledetection)
- [Datasets](#datasets)
## Introduction
This document covers how to install PaddleDetection, its dependencies
(including PaddlePaddle), together with the COCO and PASCAL VOC datasets.
For general information about PaddleDetection, please see [README.md](../README.md).
## PaddlePaddle
Running PaddleDetection requires PaddlePaddle Fluid v1.5 or later. Please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/en/1.4/beginners_guide/install/index_en.html).
Please make sure your PaddlePaddle installation was successful and that the installed
version is not lower than required. Verify with the following commands.
```
# To check if PaddlePaddle installation was successful
python -c "import paddle.fluid as fluid; fluid.install_check.run_check()"
# To check PaddlePaddle version
python -c "import paddle; print(paddle.__version__)"
```
### Requirements:
- Python2 or Python3
- CUDA >= 8.0
- cuDNN >= 5.0
- nccl >= 2.1.2
## Other Dependencies
[COCO-API](https://github.com/cocodataset/cocoapi):
COCO-API is needed for training. Installation is as follows:
```
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python setup.py install --user
```
## PaddleDetection
**Clone the Paddle models repository:**
You can clone the Paddle models repository and switch the working directory to
PaddleDetection with the following commands:
```
cd <path/to/clone/models>
git clone https://github.com/PaddlePaddle/models
cd models/PaddleCV/object_detection
```
**Install Python dependencies:**
Required Python packages are specified in [requirements.txt](./requirements.txt) and can be installed with:
```
pip install -r requirements.txt
```
**Make sure the tests pass:**
```
export PYTHONPATH=`pwd`:$PYTHONPATH
python ppdet/modeling/tests/test_architectures.py
```
## Datasets
PaddleDetection includes support for [MSCOCO](http://cocodataset.org) and [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) by default; please follow these instructions to set up the datasets.
**Create symlinks for local datasets:**
Default dataset paths in the config files are `data/coco` and `data/voc`; if the
datasets are already available on disk, you can simply create symlinks to
their directories:
```
ln -sf <path/to/coco> <path/to/paddle_detection>/data/coco
ln -sf <path/to/voc> <path/to/paddle_detection>/data/voc
```
**Download datasets manually:**
Alternatively, to download the datasets, run the following commands:
- MS-COCO
```
cd dataset/coco
./download.sh
```
- PASCAL VOC
```
cd dataset/voc
./download.sh
```
**Download datasets automatically:**
If a training session is started but the dataset is not set up properly (e.g.,
not found in `data/coco` or `data/voc`), PaddleDetection can automatically
download them from [MSCOCO-2017](http://images.cocodataset.org) and
[VOC2012](http://host.robots.ox.ac.uk/pascal/VOC); the decompressed datasets
will be cached in `~/.cache/paddle/dataset/` and can be discovered automatically
subsequently.
**NOTE:** For further information on the datasets, please see [DATA.md](DATA.md)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import ppdet.modeling
import ppdet.optimizer
import ppdet.data
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import inspect
import importlib
import re
try:
from docstring_parser import parse as doc_parse
except Exception:
def doc_parse(*args):
if not doc_parse.__warning_sent__:
from ppdet.utils.cli import ColorTTY
color_tty = ColorTTY()
message = "docstring_parser is not installed, " \
+ "argument description is not available"
print(color_tty.yellow(message))
doc_parse.__warning_sent__ = True
doc_parse.__warning_sent__ = False
try:
from typeguard import check_type
except Exception:
def check_type(*args):
if not check_type.__warning_sent__:
from ppdet.utils.cli import ColorTTY
color_tty = ColorTTY()
message = "typeguard is not installed, type checking is not available"
print(color_tty.yellow(message))
check_type.__warning_sent__ = True
check_type.__warning_sent__ = False
__all__ = ['SchemaValue', 'SchemaDict', 'extract_schema']
class SchemaValue(object):
def __init__(self, name, doc='', type=None):
super(SchemaValue, self).__init__()
self.name = name
self.doc = doc
self.type = type
def set_default(self, value):
self.default = value
def has_default(self):
return hasattr(self, 'default')
class SchemaDict(dict):
def __init__(self, **kwargs):
super(SchemaDict, self).__init__()
self.schema = {}
self.strict = False
self.doc = ""
self.update(kwargs)
def __setitem__(self, key, value):
# XXX also update regular dict to SchemaDict??
if isinstance(value, dict) and key in self and isinstance(self[key],
SchemaDict):
self[key].update(value)
else:
super(SchemaDict, self).__setitem__(key, value)
def __missing__(self, key):
if self.has_default(key):
return self.schema[key].default
elif key in self.schema:
return self.schema[key]
else:
raise KeyError(key)
def copy(self):
newone = SchemaDict()
newone.__dict__.update(self.__dict__)
newone.update(self)
return newone
def set_schema(self, key, value):
assert isinstance(value, SchemaValue)
self.schema[key] = value
def set_strict(self, strict):
self.strict = strict
def has_default(self, key):
return key in self.schema and self.schema[key].has_default()
def is_default(self, key):
if not self.has_default(key):
return False
if hasattr(self[key], '__dict__'):
return True
else:
return key not in self or self[key] == self.schema[key].default
def find_default_keys(self):
return [
k for k in list(self.keys()) + list(self.schema.keys())
if self.is_default(k)
]
def mandatory(self):
return any([k for k in self.schema.keys() if not self.has_default(k)])
def find_missing_keys(self):
missing = [
k for k in self.schema.keys()
if k not in self and not self.has_default(k)
]
placeholders = [k for k in self if self[k] in ('<missing>', '<value>')]
return missing + placeholders
def find_extra_keys(self):
return list(set(self.keys()) - set(self.schema.keys()))
def find_mismatch_keys(self):
mismatch_keys = []
for arg in self.schema.values():
if arg.type is not None:
try:
check_type("{}.{}".format(self.name, arg.name),
self[arg.name], arg.type)
except Exception:
mismatch_keys.append(arg.name)
return mismatch_keys
def validate(self):
missing_keys = self.find_missing_keys()
if missing_keys:
raise ValueError("Missing param for class<{}>: {}".format(
self.name, ", ".join(missing_keys)))
extra_keys = self.find_extra_keys()
if extra_keys and self.strict:
raise ValueError("Extraneous param for class<{}>: {}".format(
self.name, ", ".join(extra_keys)))
mismatch_keys = self.find_mismatch_keys()
if mismatch_keys:
raise TypeError("Wrong param type for class<{}>: {}".format(
self.name, ", ".join(mismatch_keys)))
def extract_schema(cls):
"""
Extract schema from a given class
Args:
cls (type): Class from which to extract.
Returns:
schema (SchemaDict): Extracted schema.
"""
ctor = cls.__init__
# python 2 compatibility
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec(ctor)
annotations = argspec.annotations
has_kwargs = argspec.varkw is not None
else:
argspec = inspect.getargspec(ctor)
# python 2 type hinting workaround, see pep-3107
# however, since `typeguard` does not support python 2, type checking
# is still python 3 only for now
annotations = getattr(ctor, '__annotations__', {})
has_kwargs = argspec.keywords is not None
names = [arg for arg in argspec.args if arg != 'self']
defaults = argspec.defaults
num_defaults = argspec.defaults is not None and len(argspec.defaults) or 0
num_required = len(names) - num_defaults
docs = cls.__doc__
if docs is None and getattr(cls, '__category__', None) == 'op':
docs = cls.__call__.__doc__
docstring = doc_parse(docs)
if docstring is None:
comments = {}
else:
comments = {}
for p in docstring.params:
match_obj = re.match('^([a-zA-Z_]+[a-zA-Z_0-9]*).*', p.arg_name)
if match_obj is not None:
comments[match_obj.group(1)] = p.description
schema = SchemaDict()
schema.name = cls.__name__
schema.doc = ""
if docs is not None:
start_pos = docs[0] == '\n' and 1 or 0
schema.doc = docs[start_pos:].split("\n")[0].strip()
# XXX handle paddle's weird doc convention
if '**' == schema.doc[:2] and '**' == schema.doc[-2:]:
schema.doc = schema.doc[2:-2].strip()
schema.category = hasattr(cls, '__category__') and getattr(
cls, '__category__') or 'module'
schema.strict = not has_kwargs
schema.pymodule = importlib.import_module(cls.__module__)
schema.inject = getattr(cls, '__inject__', [])
for idx, name in enumerate(names):
comment = name in comments and comments[name] or name
if name in schema.inject:
type_ = None
else:
type_ = name in annotations and annotations[name] or None
value_schema = SchemaValue(name, comment, type_)
if idx >= num_required:
value_schema.set_default(defaults[idx - num_required])
schema.set_schema(name, value_schema)
return schema
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib
import inspect
import yaml
__all__ = ['serializable', 'Callable']
def _make_python_constructor(cls):
def python_constructor(loader, node):
if isinstance(node, yaml.SequenceNode):
args = loader.construct_sequence(node, deep=True)
return cls(*args)
else:
kwargs = loader.construct_mapping(node, deep=True)
try:
return cls(**kwargs)
except Exception as ex:
print("Error when construct {} instance from yaml config".
format(cls.__name__))
raise ex
return python_constructor
def _make_python_representer(cls):
# python 2 compatibility
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec(cls)
else:
argspec = inspect.getargspec(cls.__init__)
argnames = [arg for arg in argspec.args if arg != 'self']
def python_representer(dumper, obj):
if argnames:
data = {name: getattr(obj, name) for name in argnames}
else:
data = obj.__dict__
if '_id' in data:
del data['_id']
return dumper.represent_mapping(u'!{}'.format(cls.__name__), data)
return python_representer
def serializable(cls):
"""
Add loader and dumper for given class, which must be "trivially serializable"
Args:
cls: class to be serialized
Returns: cls
"""
yaml.add_constructor(u'!{}'.format(cls.__name__),
_make_python_constructor(cls))
yaml.add_representer(cls, _make_python_representer(cls))
return cls
@serializable
class Callable(object):
"""
Helper to be used in Yaml for creating arbitrary class objects
Args:
full_type (str): the full module path to target function
"""
def __init__(self, full_type, args=[], kwargs={}):
super(Callable, self).__init__()
self.full_type = full_type
self.args = args
self.kwargs = kwargs
def __call__(self):
if '.' in self.full_type:
idx = self.full_type.rfind('.')
module = importlib.import_module(self.full_type[:idx])
func_name = self.full_type[idx + 1:]
else:
try:
module = importlib.import_module('builtins')
except Exception:
module = importlib.import_module('__builtin__')
func_name = self.full_type
func = getattr(module, func_name)
return func(*self.args, **self.kwargs)
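# Usage sketch (illustrative note, not part of the original file): `Callable`
# resolves the dotted path at call time, so
#     Callable('numpy.zeros', args=[[2, 2]])()
# imports `numpy` and returns the same result as numpy.zeros([2, 2]).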
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import importlib
import os
import sys
import yaml
from .config.schema import SchemaDict, extract_schema
from .config.yaml_helpers import serializable
__all__ = [
'global_config', 'load_config', 'merge_config', 'get_registered_modules',
'create', 'register', 'serializable'
]
class AttrDict(dict):
"""Single level attribute dict, NOT recursive"""
def __init__(self, **kwargs):
super(AttrDict, self).__init__()
super(AttrDict, self).update(kwargs)
def __getattr__(self, key):
if key in self:
return self[key]
raise AttributeError("object has no attribute '{}'".format(key))
global_config = AttrDict()
def load_config(file_path):
"""
Load config from file.
Args:
file_path (str): Path of the config file to be loaded.
Returns: global config
"""
_, ext = os.path.splitext(file_path)
assert ext in ['.yml', '.yaml'], "only support yaml files for now"
merge_config(yaml.load(open(file_path), Loader=yaml.Loader))
return global_config
def merge_config(config):
"""
Merge config into global config.
Args:
config (dict): Config to be merged.
Returns: global config
"""
for key, value in config.items():
if isinstance(value, dict) and key in global_config:
global_config[key].update(value)
else:
global_config[key] = value
def get_registered_modules():
return {k: v for k, v in global_config.items() if isinstance(v, SchemaDict)}
def make_partial(cls):
op_module = importlib.import_module(cls.__op__.__module__)
op = getattr(op_module, cls.__op__.__name__)
cls.__category__ = getattr(cls, '__category__', None) or 'op'
def partial_apply(self, *args, **kwargs):
kwargs_ = self.__dict__.copy()
kwargs_.update(kwargs)
return op(*args, **kwargs_)
if getattr(cls, '__append_doc__', True): # XXX should default to True?
if sys.version_info[0] > 2:
cls.__doc__ = "Wrapper for `{}` OP".format(op.__name__)
cls.__init__.__doc__ = op.__doc__
cls.__call__ = partial_apply
cls.__call__.__doc__ = op.__doc__
else:
# XXX work around for python 2
partial_apply.__doc__ = op.__doc__
cls.__call__ = partial_apply
return cls
def register(cls):
"""
Register a given module class.
Args:
cls (type): Module class to be registered.
Returns: cls
"""
if cls.__name__ in global_config:
raise ValueError("Module class already registered: {}".format(
cls.__name__))
if hasattr(cls, '__op__'):
cls = make_partial(cls)
global_config[cls.__name__] = extract_schema(cls)
return cls
def create(cls_or_name, **kwargs):
"""
Create an instance of given module class.
Args:
cls_or_name (type or str): Class of which to create instance.
Returns: instance of type `cls_or_name`
"""
    assert type(cls_or_name) in [type, str], \
        "should be a class or name of a class"
    name = cls_or_name if isinstance(cls_or_name, str) else cls_or_name.__name__
assert name in global_config and isinstance(global_config[name], SchemaDict), \
"the module {} is not registered".format(name)
config = global_config[name]
config.update(kwargs)
config.validate()
cls = getattr(config.pymodule, name)
kwargs = {}
kwargs.update(global_config[name])
if getattr(config, 'inject', None):
for k in config.inject:
target_key = global_config[name][k]
# optional dependency
if target_key is None:
continue
# also accept dictionaries and serialized objects
if isinstance(target_key, dict) or hasattr(target_key, '__dict__'):
continue
elif isinstance(target_key, str):
if target_key not in global_config:
raise ValueError("Missing injection config:", target_key)
target = global_config[target_key]
if isinstance(target, SchemaDict):
kwargs[k] = create(target_key)
elif hasattr(target, '__dict__'): # serialized object
kwargs[k] = target
else:
raise ValueError("Unsupported injection type:", target_key)
return cls(**kwargs)
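# ---------------------------------------------------------------------------
# Merge semantics sketch (illustration only, with made-up config keys):
# nested dicts whose key already exists in `global_config` are updated in
# place, while new keys are inserted wholesale.
#
#     merge_config({'LearningRate': {'base_lr': 0.01}})
#     merge_config({'LearningRate': {'base_lr': 0.02}, 'max_iters': 90000})
#     assert global_config['LearningRate'] == {'base_lr': 0.02}
#     assert global_config.max_iters == 90000
#
# The typical flow for registered modules is: `load_config` a YAML file,
# optionally `merge_config` command line overrides, then `create` by name
# (assuming the named module class was decorated with @register on import):
#
#     load_config('configs/mask_rcnn_r50_1x.yml')
#     merge_config({'max_iters': 90000})
#     model = create('MaskRCNN')
# ---------------------------------------------------------------------------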
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# module to prepare data for detection model training
#
# implementation notes:
# - Dateset
# basic interface to accessing data samples in stream mode
#
# - xxxSource (RoiDbSource)
# * subclass of 'Dataset'
# * load data from local files and other source data
#
# - xxxOperator (DecodeImage)
# * subclass of 'BaseOperator'
# * each op can transform a sample, eg: decode/resize/crop image
# * each op must obey basic rules defined in transform.operator.base
#
# - transformer
# * subclass of 'Dataset'
# * 'MappedDataset' accept a 'xxxSource' and a list of 'xxxOperator'
# to build a transformed 'Dataset'
from __future__ import absolute_import
from .dataset import Dataset
from .reader import Reader
from .data_feed import create_reader
__all__ = ['Dataset', 'Reader', 'create_reader']
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import os
import inspect
from ppdet.core.workspace import register, serializable
from ppdet.utils.download import get_dataset_path
from ppdet.data.reader import Reader
# XXX these are for triggering the decorator
from ppdet.data.transform.operators import (
DecodeImage, MixupImage, NormalizeBox, NormalizeImage, RandomDistort,
RandomFlipImage, RandomInterpImage, ResizeImage, ExpandImage, CropImage,
Permute)
from ppdet.data.transform.arrange_sample import (ArrangeRCNN, ArrangeTestRCNN,
ArrangeSSD, ArrangeTestSSD,
ArrangeYOLO, ArrangeTestYOLO)
__all__ = [
'PadBatch', 'MultiScale', 'RandomShape', 'DataSet', 'CocoDataSet',
'DataFeed', 'TrainFeed', 'EvalFeed', 'FasterRCNNTrainFeed',
'MaskRCNNTrainFeed', 'FasterRCNNTestFeed', 'MaskRCNNTestFeed',
'SSDTrainFeed', 'SSDEvalFeed', 'SSDTestFeed', 'YoloTrainFeed',
'YoloEvalFeed', 'YoloTestFeed', 'create_reader'
]
def create_reader(feed, max_iter=0):
"""
Return iterable data reader.
Args:
max_iter (int): number of iterations.
"""
    # if `DATASET_DIR` does not exist, search ~/.paddle/dataset for a directory
    # named `DATASET_DIR` (e.g., coco, pascal), if not present either, download
if feed.dataset.dataset_dir:
dataset_dir = get_dataset_path(feed.dataset.dataset_dir)
feed.dataset.annotation = os.path.join(dataset_dir,
feed.dataset.annotation)
feed.dataset.image_dir = os.path.join(dataset_dir,
feed.dataset.image_dir)
mixup_epoch = -1
if getattr(feed, 'mixup_epoch', None) is not None:
mixup_epoch = feed.mixup_epoch
bufsize = 10
use_process = False
if getattr(feed, 'bufsize', None) is not None:
bufsize = feed.bufsize
if getattr(feed, 'use_process', None) is not None:
use_process = feed.use_process
mode = feed.mode
data_config = {
mode: {
'ANNO_FILE': feed.dataset.annotation,
'IMAGE_DIR': feed.dataset.image_dir,
'USE_DEFAULT_LABEL': feed.dataset.use_default_label,
'IS_SHUFFLE': feed.shuffle,
'SAMPLES': feed.samples,
'WITH_BACKGROUND': feed.with_background,
'MIXUP_EPOCH': mixup_epoch,
'TYPE': type(feed.dataset).__source__
}
}
if len(getattr(feed.dataset, 'images', [])) > 0:
data_config[mode]['IMAGES'] = feed.dataset.images
transform_config = {
'WORKER_CONF': {
'bufsize': bufsize,
'worker_num': feed.num_workers,
'use_process': use_process
},
'BATCH_SIZE': feed.batch_size,
'DROP_LAST': feed.drop_last,
'USE_PADDED_IM_INFO': feed.use_padded_im_info,
}
batch_transforms = feed.batch_transforms
pad = [t for t in batch_transforms if isinstance(t, PadBatch)]
rand_shape = [t for t in batch_transforms if isinstance(t, RandomShape)]
multi_scale = [t for t in batch_transforms if isinstance(t, MultiScale)]
if any(pad):
transform_config['IS_PADDING'] = True
if pad[0].pad_to_stride != 0:
transform_config['COARSEST_STRIDE'] = pad[0].pad_to_stride
if any(rand_shape):
transform_config['RANDOM_SHAPES'] = rand_shape[0].sizes
if any(multi_scale):
transform_config['MULTI_SCALES'] = multi_scale[0].scales
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec
else:
argspec = inspect.getargspec
ops = []
for op in feed.sample_transforms:
op_dict = op.__dict__.copy()
argnames = [
arg for arg in argspec(type(op).__init__).args if arg != 'self'
]
op_dict = {k: v for k, v in op_dict.items() if k in argnames}
op_dict['op'] = op.__class__.__name__
ops.append(op_dict)
transform_config['OPS'] = ops
reader = Reader(data_config, {mode: transform_config}, max_iter)
return reader._make_reader(mode)
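# ---------------------------------------------------------------------------
# Call-flow sketch (illustration only): a feed preset defined below is
# instantiated, its sample transforms are serialized back into the legacy
# dict configs consumed by `Reader`, and the result is a plain generator:
#
#     feed = FasterRCNNTrainFeed()            # preset defined below
#     train_reader = create_reader(feed, max_iter=100)
#     for batch in train_reader():            # yields at most 100 batches
#         ...
# ---------------------------------------------------------------------------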
# XXX batch transforms are only stubs for now, actually handled by `post_map`
@serializable
class PadBatch(object):
"""
Pad a batch of samples to same dimensions
Args:
pad_to_stride (int): pad to multiple of strides, e.g., 32
"""
def __init__(self, pad_to_stride=0):
super(PadBatch, self).__init__()
self.pad_to_stride = pad_to_stride
@serializable
class MultiScale(object):
"""
Randomly resize image by scale
Args:
scales (list): list of int, randomly resize to one of these scales
"""
def __init__(self, scales=[]):
super(MultiScale, self).__init__()
self.scales = scales
@serializable
class RandomShape(object):
"""
Randomly reshape a batch
Args:
sizes (list): list of int, random choose a size from these
"""
def __init__(self, sizes=[]):
super(RandomShape, self).__init__()
self.sizes = sizes
@serializable
class DataSet(object):
"""
Dataset, e.g., coco, pascal voc
Args:
annotation (str): annotation file path
image_dir (str): directory where image files are stored
num_classes (int): number of classes
shuffle (bool): shuffle samples
"""
__source__ = 'RoiDbSource'
def __init__(self,
annotation,
image_dir,
dataset_dir=None,
use_default_label=None):
super(DataSet, self).__init__()
self.dataset_dir = dataset_dir
self.annotation = annotation
self.image_dir = image_dir
self.use_default_label = use_default_label
COCO_DATASET_DIR = 'coco'
COCO_TRAIN_ANNOTATION = 'annotations/instances_train2017.json'
COCO_TRAIN_IMAGE_DIR = 'train2017'
COCO_VAL_ANNOTATION = 'annotations/instances_val2017.json'
COCO_VAL_IMAGE_DIR = 'val2017'
@serializable
class CocoDataSet(DataSet):
def __init__(self,
dataset_dir=COCO_DATASET_DIR,
annotation=COCO_TRAIN_ANNOTATION,
image_dir=COCO_TRAIN_IMAGE_DIR):
super(CocoDataSet, self).__init__(
dataset_dir=dataset_dir, annotation=annotation, image_dir=image_dir)
VOC_DATASET_DIR = 'pascalvoc'
VOC_TRAIN_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/train.txt'
VOC_VAL_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/val.txt'
VOC_TEST_ANNOTATION = 'VOCdevkit/VOC_all/ImageSets/Main/test.txt'
VOC_IMAGE_DIR = 'VOCdevkit/VOC_all/JPEGImages'
VOC_USE_DEFAULT_LABEL = None
@serializable
class VocDataSet(DataSet):
__source__ = 'VOCSource'
def __init__(self,
dataset_dir=VOC_DATASET_DIR,
annotation=VOC_TRAIN_ANNOTATION,
image_dir=VOC_IMAGE_DIR,
use_default_label=VOC_USE_DEFAULT_LABEL):
super(VocDataSet, self).__init__(
dataset_dir=dataset_dir,
annotation=annotation,
image_dir=image_dir,
use_default_label=use_default_label)
@serializable
class SimpleDataSet(DataSet):
__source__ = 'SimpleSource'
def __init__(self,
dataset_dir=None,
annotation=None,
image_dir=None,
use_default_label=None):
super(SimpleDataSet, self).__init__(
dataset_dir=dataset_dir, annotation=annotation, image_dir=image_dir)
self.images = []
def add_images(self, images):
self.images.extend(images)
@serializable
class DataFeed(object):
"""
DataFeed encompasses all data loading related settings
Args:
dataset (object): a `Dataset` instance
fields (list): list of data fields needed
image_shape (list): list of image dims (C, MAX_DIM, MIN_DIM)
sample_transforms (list): list of sample transformations to use
batch_transforms (list): list of batch transformations to use
batch_size (int): number of images per device
shuffle (bool): if samples should be shuffled
drop_last (bool): drop last batch if size is uneven
        num_workers (int): number of worker processes (or threads)
"""
__category__ = 'data'
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=None,
batch_transforms=None,
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
with_background=True,
num_workers=2,
bufsize=10,
use_process=False,
use_padded_im_info=False):
super(DataFeed, self).__init__()
self.fields = fields
self.image_shape = image_shape
self.sample_transforms = sample_transforms
self.batch_transforms = batch_transforms
self.batch_size = batch_size
self.shuffle = shuffle
self.samples = samples
self.drop_last = drop_last
self.with_background = with_background
self.num_workers = num_workers
self.bufsize = bufsize
self.use_process = use_process
self.dataset = dataset
self.use_padded_im_info = use_padded_im_info
if isinstance(dataset, dict):
self.dataset = DataSet(**dataset)
# for custom (i.e., Non-preset) datasets
@register
class TrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=True,
samples=-1,
drop_last=False,
with_background=True,
num_workers=2,
bufsize=10,
use_process=True):
super(TrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
bufsize=bufsize,
use_process=use_process, )
@register
class EvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
with_background=True,
num_workers=2):
super(EvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers)
@register
class TestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset,
fields,
image_shape,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=False,
with_background=True,
num_workers=2):
super(TestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers)
@register
class FasterRCNNTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet().__dict__,
fields=[
'image', 'im_info', 'im_id', 'gt_box', 'gt_label',
'is_crowd'
],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
RandomFlipImage(prob=0.5),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800, max_size=1333, interp=1),
Permute(to_bgr=False)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=True,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False):
# XXX this should be handled by the data loader, since `fields` is
# given, just collect them
sample_transforms.append(ArrangeRCNN())
super(FasterRCNNTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
# XXX these modes should be unified
self.mode = 'TRAIN'
@register
class FasterRCNNEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800, max_size=1333, interp=1),
Permute(to_bgr=False)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
super(FasterRCNNEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_padded_im_info=use_padded_im_info)
self.mode = 'VAL'
@register
class FasterRCNNTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(FasterRCNNTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_padded_im_info=use_padded_im_info)
self.mode = 'TEST'
# XXX two presets are currently used; in the future, these should be combined
# into a single `RCNNTrainFeed`. Mask (and keypoint) should be processed
# automatically if `gt_mask` (or `gt_keypoints`) is in the required fields
@register
class MaskRCNNTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet().__dict__,
fields=[
'image', 'im_info', 'im_id', 'gt_box', 'gt_label',
'is_crowd', 'gt_mask'
],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
RandomFlipImage(prob=0.5, is_mask_flip=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800,
max_size=1333,
interp=1,
use_cv2=True),
Permute(to_bgr=False, channel_first=True)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=True,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False,
use_padded_im_info=False):
sample_transforms.append(ArrangeRCNN(is_mask=True))
super(MaskRCNNTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
self.mode = 'TRAIN'
@register
class MaskRCNNEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
ResizeImage(target_size=800,
max_size=1333,
interp=1,
use_cv2=True),
Permute(to_bgr=False, channel_first=True)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
super(MaskRCNNEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process,
use_padded_im_info=use_padded_im_info)
self.mode = 'VAL'
@register
class MaskRCNNTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_info', 'im_id', 'im_shape'],
image_shape=[3, 1333, 800],
sample_transforms=[
DecodeImage(to_rgb=True),
NormalizeImage(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False, channel_first=True)
],
batch_transforms=[PadBatch()],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=2,
use_process=False,
use_padded_im_info=True):
sample_transforms.append(ArrangeTestRCNN())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(MaskRCNNTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process,
use_padded_im_info=use_padded_im_info)
self.mode = 'TEST'
@register
class SSDTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=VocDataSet().__dict__,
fields=['image', 'gt_box', 'gt_label', 'is_difficult'],
image_shape=[3, 300, 300],
sample_transforms=[
DecodeImage(to_rgb=True, with_mixup=False),
NormalizeBox(),
RandomDistort(brightness_lower=0.875,
brightness_upper=1.125,
is_order=True),
ExpandImage(max_ratio=4, prob=0.5),
CropImage(batch_sampler=[[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]],
satisfy_all=False, avoid_no_bbox=False),
ResizeImage(target_size=300, use_cv2=False, interp=1),
RandomFlipImage(is_normalized=True),
Permute(),
NormalizeImage(mean=[127.5, 127.5, 127.5],
std=[127.502231, 127.502231, 127.502231],
is_scale=False)
],
batch_transforms=[],
batch_size=32,
shuffle=True,
samples=-1,
drop_last=True,
num_workers=8,
bufsize=10,
use_process=True):
sample_transforms.append(ArrangeSSD())
if isinstance(dataset, dict):
dataset = VocDataSet(**dataset)
super(SSDTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
self.mode = 'TRAIN'
@register
class SSDEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(
self,
dataset=VocDataSet(VOC_VAL_ANNOTATION).__dict__,
fields=['image', 'gt_box', 'gt_label', 'is_difficult'],
image_shape=[3, 300, 300],
sample_transforms=[
DecodeImage(to_rgb=True, with_mixup=False),
NormalizeBox(),
ResizeImage(target_size=300, use_cv2=False, interp=1),
RandomFlipImage(is_normalized=True),
Permute(),
NormalizeImage(
mean=[127.5, 127.5, 127.5],
std=[127.502231, 127.502231, 127.502231],
is_scale=False)
],
batch_transforms=[],
batch_size=64,
shuffle=False,
samples=-1,
drop_last=True,
num_workers=8,
bufsize=10,
use_process=False):
sample_transforms.append(ArrangeSSD())
if isinstance(dataset, dict):
dataset = VocDataSet(**dataset)
super(SSDEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers,
use_process=use_process)
self.mode = 'VAL'
@register
class SSDTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(VOC_TEST_ANNOTATION).__dict__,
fields=['image', 'im_id'],
image_shape=[3, 300, 300],
sample_transforms=[
DecodeImage(to_rgb=True),
ResizeImage(target_size=300, use_cv2=False, interp=1),
Permute(),
NormalizeImage(
mean=[127.5, 127.5, 127.5],
std=[127.502231, 127.502231, 127.502231],
is_scale=False)
],
batch_transforms=[],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
num_workers=8,
bufsize=10,
use_process=False):
sample_transforms.append(ArrangeTestSSD())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(SSDTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
num_workers=num_workers)
self.mode = 'TEST'
@register
class YoloTrainFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet().__dict__,
fields=['image', 'gt_box', 'gt_label', 'gt_score'],
image_shape=[3, 608, 608],
sample_transforms=[
DecodeImage(to_rgb=True, with_mixup=True),
MixupImage(alpha=1.5, beta=1.5),
NormalizeBox(),
RandomDistort(),
ExpandImage(max_ratio=4., prob=.5,
mean=[123.675, 116.28, 103.53]),
CropImage([[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]),
RandomInterpImage(target_size=608),
RandomFlipImage(is_normalized=True),
NormalizeImage(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False),
],
batch_transforms=[
RandomShape(sizes=[
320, 352, 384, 416, 448, 480, 512, 544, 576, 608
])
],
batch_size=8,
shuffle=True,
samples=-1,
drop_last=True,
with_background=False,
num_workers=8,
bufsize=128,
use_process=True,
num_max_boxes=50,
mixup_epoch=250):
sample_transforms.append(ArrangeYOLO())
super(YoloTrainFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
bufsize=bufsize,
use_process=use_process)
self.num_max_boxes = num_max_boxes
self.mixup_epoch = mixup_epoch
self.mode = 'TRAIN'
@register
class YoloEvalFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=CocoDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_shape', 'im_id'],
image_shape=[3, 608, 608],
sample_transforms=[
DecodeImage(to_rgb=True),
ResizeImage(target_size=608, interp=2),
NormalizeImage(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False),
],
batch_transforms=[],
batch_size=8,
shuffle=False,
samples=-1,
drop_last=False,
with_background=False,
num_workers=8,
num_max_boxes=50,
use_process=False):
sample_transforms.append(ArrangeTestYOLO())
super(YoloEvalFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
use_process=use_process)
self.num_max_boxes = num_max_boxes
self.mode = 'VAL'
self.bufsize = 128
@register
class YoloTestFeed(DataFeed):
__doc__ = DataFeed.__doc__
def __init__(self,
dataset=SimpleDataSet(COCO_VAL_ANNOTATION,
COCO_VAL_IMAGE_DIR).__dict__,
fields=['image', 'im_shape', 'im_id'],
image_shape=[3, 608, 608],
sample_transforms=[
DecodeImage(to_rgb=True),
ResizeImage(target_size=608, interp=2),
NormalizeImage(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225],
is_scale=True,
is_channel_first=False),
Permute(to_bgr=False),
],
batch_transforms=[],
batch_size=1,
shuffle=False,
samples=-1,
drop_last=False,
with_background=False,
num_workers=8,
num_max_boxes=50,
use_process=False):
sample_transforms.append(ArrangeTestYOLO())
if isinstance(dataset, dict):
dataset = SimpleDataSet(**dataset)
super(YoloTestFeed, self).__init__(
dataset,
fields,
image_shape,
sample_transforms,
batch_transforms,
batch_size=batch_size,
shuffle=shuffle,
samples=samples,
drop_last=drop_last,
with_background=with_background,
num_workers=num_workers,
use_process=use_process)
self.num_max_boxes = num_max_boxes
self.mode = 'TEST'
self.bufsize = 128
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# interface for accessing data samples in stream
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
class Dataset(object):
"""interface to access a stream of data samples"""
def __init__(self):
self._epoch = -1
def __next__(self):
return self.next()
def __iter__(self):
return self
def __str__(self):
return "{}(fname:{}, epoch:{:d}, size:{:d}, pos:{:d})".format(
type(self).__name__, self._fname, self._epoch,
self.size(), self._pos)
def next(self):
"""get next sample"""
raise NotImplementedError('%s.next not available' %
(self.__class__.__name__))
def reset(self):
"""reset to initial status and begins a new epoch"""
raise NotImplementedError('%s.reset not available' %
(self.__class__.__name__))
def size(self):
"""get number of samples in this dataset"""
raise NotImplementedError('%s.size not available' %
(self.__class__.__name__))
def drained(self):
"""whether all sampled has been readed out for this epoch"""
raise NotImplementedError('%s.drained not available' %
(self.__class__.__name__))
def epoch_id(self):
"""return epoch id for latest sample"""
raise NotImplementedError('%s.epoch_id not available' %
(self.__class__.__name__))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# Interface to build readers for detection data like COCO or VOC
#
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from numbers import Integral
import logging
from .source import build_source
from .transform import build_mapper, map, batch, batch_map
logger = logging.getLogger(__name__)
class Reader(object):
"""Interface to make readers for training or evaluation"""
def __init__(self, data_cf, trans_conf, maxiter=-1):
self._data_cf = data_cf
self._trans_conf = trans_conf
self._maxiter = maxiter
self._cname2cid = None
assert isinstance(self._maxiter, Integral), "maxiter should be int"
def _make_reader(self, mode):
"""Build reader for training or validation"""
file_conf = self._data_cf[mode]
# 1, Build data source
sc_conf = {'data_cf': file_conf, 'cname2cid': self._cname2cid}
sc = build_source(sc_conf)
        # 2, Build a transformed dataset
ops = self._trans_conf[mode]['OPS']
batchsize = self._trans_conf[mode]['BATCH_SIZE']
        drop_last = self._trans_conf[mode].get('DROP_LAST', False)
mapper = build_mapper(ops, {'is_train': mode == 'TRAIN'})
worker_args = None
if 'WORKER_CONF' in self._trans_conf[mode]:
worker_args = self._trans_conf[mode]['WORKER_CONF']
worker_args = {k.lower(): v for k, v in worker_args.items()}
mapped_ds = map(sc, mapper, worker_args)
batched_ds = batch(mapped_ds, batchsize, drop_last)
trans_conf = {k.lower(): v for k, v in self._trans_conf[mode].items()}
need_keys = {
'is_padding',
'coarsest_stride',
'random_shapes',
'multi_scales',
'use_padded_im_info',
}
bm_config = {
key: value
for key, value in trans_conf.items() if key in need_keys
}
batched_ds = batch_map(batched_ds, bm_config)
batched_ds.reset()
if mode.lower() == 'train':
if self._cname2cid is not None:
                logger.warning('cname2cid already set, it will be overridden')
self._cname2cid = sc.cname2cid
# 3, Build a reader
maxit = -1 if self._maxiter <= 0 else self._maxiter
def _reader():
n = 0
while True:
for _batch in batched_ds:
yield _batch
n += 1
if maxit > 0 and n == maxit:
return
batched_ds.reset()
if maxit <= 0:
return
if hasattr(sc, 'get_imid2path'):
_reader.imid2path = sc.get_imid2path()
return _reader
def train(self):
"""Build reader for training"""
return self._make_reader('TRAIN')
def val(self):
"""Build reader for validation"""
return self._make_reader('VAL')
def test(self):
"""Build reader for inference"""
return self._make_reader('TEST')
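# ---------------------------------------------------------------------------
# Usage sketch (illustration only; paths and ops are placeholders, and the
# keys mirror the dicts built by `create_reader` elsewhere in this repo):
#
#     data_cf = {'TRAIN': {'ANNO_FILE': 'train.roidb',
#                          'IMAGE_DIR': 'images',
#                          'IS_SHUFFLE': True}}
#     trans_cf = {'TRAIN': {'OPS': [{'op': 'DecodeImage', 'to_rgb': True}],
#                           'BATCH_SIZE': 2}}
#     train_reader = Reader(data_cf, trans_cf, maxiter=10).train()
#     for batch in train_reader():   # generator yielding at most 10 batches
#         ...
# ---------------------------------------------------------------------------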
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
from .roidb_source import RoiDbSource
from .simple_source import SimpleSource
def build_source(config):
"""
Build dataset from source data, default source type is 'RoiDbSource'
Args:
config (dict): should have following structure:
{
data_cf (dict):
anno_file (str): label file or image list file path
image_dir (str): root directory for images
samples (int): number of samples to load, -1 means all
is_shuffle (bool): should samples be shuffled
load_img (bool): should images be loaded
mixup_epoch (int): parse mixup in first n epoch
with_background (bool): whether load background as a class
cname2cid (dict): the label name to id dictionary
}
"""
if 'data_cf' in config:
data_cf = {k.lower(): v for k, v in config['data_cf'].items()}
data_cf['cname2cid'] = config['cname2cid']
else:
data_cf = config
args = copy.deepcopy(data_cf)
    # default type is 'RoiDbSource'
source_type = 'RoiDbSource'
if 'type' in data_cf:
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource']:
source_type = 'RoiDbSource'
else:
source_type = data_cf['type']
del args['type']
if source_type == 'RoiDbSource':
return RoiDbSource(**args)
elif source_type == 'SimpleSource':
return SimpleSource(**args)
else:
raise ValueError('source type not supported: ' + source_type)
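# ---------------------------------------------------------------------------
# Usage sketch (illustration only; paths are placeholders). Without a 'type'
# key the default 'RoiDbSource' is used; 'VOCSource' and 'COCOSource' are
# treated as aliases for it, 'SimpleSource' selects SimpleSource, and any
# other type raises ValueError:
#
#     source = build_source({
#         'anno_file': 'annotations/instances_val2017.json',
#         'image_dir': 'val2017',
#         'samples': -1,
#         'is_shuffle': False,
#     })
# ---------------------------------------------------------------------------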
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
from pycocotools.coco import COCO
import logging
logger = logging.getLogger(__name__)
def load(anno_path, sample_num=-1, with_background=True):
"""
Load COCO records with annotations in json file 'anno_path'
Args:
anno_path (str): json file path
sample_num (int): number of samples to load, -1 means all
with_background (bool): whether load background as a class.
if True, total class number will
be 81. default True
Returns:
(records, cname2cid)
'records' is list of dict whose structure is:
{
'im_file': im_fname, # image file name
'im_id': img_id, # image id
'h': im_h, # height of image
'w': im_w, # width
'is_crowd': is_crowd,
'gt_score': gt_score,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_poly': gt_poly,
}
'cname2cid' is a dict used to map category name to class id
"""
assert anno_path.endswith('.json'), 'invalid coco annotation file: ' \
+ anno_path
coco = COCO(anno_path)
img_ids = coco.getImgIds()
cat_ids = coco.getCatIds()
records = []
ct = 0
# when with_background = True, mapping category to classid, like:
# background:0, first_class:1, second_class:2, ...
catid2clsid = dict(
{catid: i + int(with_background)
for i, catid in enumerate(cat_ids)})
cname2cid = dict({
coco.loadCats(catid)[0]['name']: clsid
for catid, clsid in catid2clsid.items()
})
for img_id in img_ids:
img_anno = coco.loadImgs(img_id)[0]
im_fname = img_anno['file_name']
im_w = img_anno['width']
im_h = img_anno['height']
ins_anno_ids = coco.getAnnIds(imgIds=img_id, iscrowd=False)
instances = coco.loadAnns(ins_anno_ids)
bboxes = []
for inst in instances:
x, y, box_w, box_h = inst['bbox']
x1 = max(0, x)
y1 = max(0, y)
x2 = min(im_w - 1, x1 + max(0, box_w - 1))
y2 = min(im_h - 1, y1 + max(0, box_h - 1))
if inst['area'] > 0 and x2 >= x1 and y2 >= y1:
inst['clean_bbox'] = [x1, y1, x2, y2]
bboxes.append(inst)
num_bbox = len(bboxes)
gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
gt_score = np.ones((num_bbox, 1), dtype=np.float32)
is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
difficult = np.zeros((num_bbox, 1), dtype=np.int32)
gt_poly = [None] * num_bbox
for i, box in enumerate(bboxes):
catid = box['category_id']
gt_class[i][0] = catid2clsid[catid]
gt_bbox[i, :] = box['clean_bbox']
is_crowd[i][0] = box['iscrowd']
gt_poly[i] = box['segmentation']
coco_rec = {
'im_file': im_fname,
'im_id': np.array([img_id]),
'h': im_h,
'w': im_w,
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_score': gt_score,
'gt_poly': gt_poly,
'difficult': difficult
}
logger.debug('Load file: {}, im_id: {}, h: {}, w: {}.'.format(
im_fname, img_id, im_h, im_w))
records.append(coco_rec)
ct += 1
if sample_num > 0 and ct >= sample_num:
break
assert len(records) > 0, 'not found any coco record in %s' % (anno_path)
logger.info('{} samples in file {}'.format(ct, anno_path))
return records, cname2cid
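# ---------------------------------------------------------------------------
# Usage sketch (illustration only; requires a local COCO annotation file):
#
#     records, cname2cid = load('annotations/instances_val2017.json',
#                               sample_num=10, with_background=True)
#     records[0]['gt_bbox']   # (num_bbox, 4) float32, [x1, y1, x2, y2] rows
#     cname2cid['person']     # -> 1, since class id 0 is the background
# ---------------------------------------------------------------------------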
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   load data records from local files (maybe in COCO or VOC data formats)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import numpy as np
import logging
import pickle as pkl
logger = logging.getLogger(__name__)
def check_records(records):
""" check the fields of 'records' must contains some keys
"""
needed_fields = [
'im_file', 'im_id', 'h', 'w', 'is_crowd', 'gt_class', 'gt_bbox',
'gt_poly'
]
for i, rec in enumerate(records):
for k in needed_fields:
assert k in rec, 'not found field[%s] in record[%d]' % (k, i)
def load_roidb(anno_file, sample_num=-1):
""" load normalized data records from file
'anno_file' which is a pickled file.
And the records should has a structure:
{
'im_file': str, # image file name
'im_id': int, # image id
'h': int, # height of image
'w': int, # width of image
'is_crowd': bool,
'gt_class': list of np.ndarray, # classids info
'gt_bbox': list of np.ndarray, # bounding box info
'gt_poly': list of int, # poly info
}
Args:
        anno_file (str): file path of the pickled records
sample_num (int): number of samples to load
Returns:
list of records for detection model training
"""
assert anno_file.endswith('.roidb'), 'invalid roidb file[%s]' % (anno_file)
with open(anno_file, 'rb') as f:
roidb = f.read()
    # support both python3 and python2 pickles; python2's pickle.loads does
    # not accept the 'encoding' keyword
    try:
        records, cname2cid = pkl.loads(roidb, encoding='bytes')
    except TypeError:
        records, cname2cid = pkl.loads(roidb)
assert type(records) is list, 'invalid data type from roidb'
if sample_num > 0 and sample_num < len(records):
records = records[:sample_num]
return records, cname2cid
def load(fname,
samples=-1,
with_background=True,
with_cat2id=False,
use_default_label=None,
cname2cid=None):
""" Load data records from 'fnames'
Args:
fnames (str): file name for data record, eg:
instances_val2017.json or COCO17_val2017.roidb
samples (int): number of samples to load, default to all
with_background (bool): whether load background as a class.
default True.
with_cat2id (bool): whether return cname2cid info out
use_default_label (bool): whether use the default mapping of label to id
cname2cid (dict): the mapping of category name to id
Returns:
list of loaded records whose structure is:
{
'im_file': str, # image file name
'im_id': int, # image id
'h': int, # height of image
'w': int, # width of image
'is_crowd': bool,
'gt_class': list of np.ndarray, # classids info
'gt_bbox': list of np.ndarray, # bounding box info
'gt_poly': list of int, # poly info
}
"""
if fname.endswith('.roidb'):
records, cname2cid = load_roidb(fname, samples)
elif fname.endswith('.json'):
from . import coco_loader
records, cname2cid = coco_loader.load(fname, samples, with_background)
elif os.path.isfile(fname):
from . import voc_loader
if use_default_label is None or cname2cid is not None:
records, cname2cid = voc_loader.get_roidb(fname, samples, cname2cid,
with_background=with_background)
else:
records, cname2cid = voc_loader.load(fname, samples,
use_default_label,
with_background=with_background)
else:
raise ValueError('invalid file type when load data from file[%s]' %
(fname))
check_records(records)
if with_cat2id:
return records, cname2cid
else:
return records
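# ---------------------------------------------------------------------------
# Dispatch summary (illustration only): `load` picks a backend from the file
# name.
#
#     load('COCO17_val2017.roidb')        # pickled records -> load_roidb
#     load('instances_val2017.json')      # COCO json       -> coco_loader
#     load('ImageSets/Main/train.txt')    # VOC image list  -> voc_loader
# ---------------------------------------------------------------------------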
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#function:
# interface to load data from local files and parse it for samples,
# eg: roidb data in pickled files
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import random
import copy
import pickle as pkl
from ..dataset import Dataset
class RoiDbSource(Dataset):
""" interface to load roidb data from files
"""
def __init__(self,
anno_file,
image_dir=None,
samples=-1,
is_shuffle=True,
load_img=False,
cname2cid=None,
use_default_label=None,
mixup_epoch=-1,
with_background=True):
""" Init
Args:
fname (str): label file path
image_dir (str): root dir for images
samples (int): samples to load, -1 means all
is_shuffle (bool): whether to shuffle samples
load_img (bool): whether load data in this class
cname2cid (dict): the label name to id dictionary
use_default_label (bool):whether use the default mapping of label to id
mixup_epoch (int): parse mixup in first n epoch
with_background (bool): whether load background
as a class
"""
super(RoiDbSource, self).__init__()
self._epoch = -1
assert os.path.isfile(anno_file) or os.path.isdir(
anno_file), 'invalid file[%s] for RoiDbSource' % (anno_file)
self._fname = anno_file
self._image_dir = image_dir
if image_dir is not None:
assert os.path.isdir(image_dir), 'invalid image directory[%s]' % (
image_dir)
self._roidb = None
self._pos = -1
self._drained = False
self._samples = samples
self._is_shuffle = is_shuffle
self._load_img = load_img
self.use_default_label = use_default_label
self._mixup_epoch = mixup_epoch
self._with_background = with_background
self.cname2cid = cname2cid
def __str__(self):
return 'RoiDbSource(fname:%s,epoch:%d,size:%d,pos:%d)' \
% (self._fname, self._epoch, self.size(), self._pos)
def next(self):
""" load next sample
"""
if self._epoch < 0:
self.reset()
if self._pos >= self._samples:
self._drained = True
raise StopIteration('%s no more data' % (str(self)))
sample = copy.deepcopy(self._roidb[self._pos])
if self._load_img:
sample['image'] = self._load_image(sample['im_file'])
else:
sample['im_file'] = os.path.join(self._image_dir, sample['im_file'])
if self._epoch < self._mixup_epoch:
mix_idx = random.randint(1, self._samples - 1)
mix_pos = (mix_idx + self._pos) % self._samples
sample['mixup'] = copy.deepcopy(self._roidb[mix_pos])
            if self._load_img:
sample['mixup']['image'] = \
self._load_image(sample['mixup']['im_file'])
else:
sample['mixup']['im_file'] = \
os.path.join(self._image_dir, sample['mixup']['im_file'])
self._pos += 1
return sample
def _load(self):
""" load data from file
"""
from . import loader
records, cname2cid = loader.load(self._fname, self._samples,
self._with_background, True,
self.use_default_label, self.cname2cid)
self.cname2cid = cname2cid
return records
def _load_image(self, where):
fn = os.path.join(self._image_dir, where)
with open(fn, 'rb') as f:
return f.read()
def reset(self):
""" implementation of Dataset.reset
"""
if self._roidb is None:
self._roidb = self._load()
self._samples = len(self._roidb)
if self._is_shuffle:
random.shuffle(self._roidb)
if self._epoch < 0:
self._epoch = 0
else:
self._epoch += 1
self._pos = 0
self._drained = False
def size(self):
""" implementation of Dataset.size
"""
return len(self._roidb)
def drained(self):
""" implementation of Dataset.drained
"""
        assert self._epoch >= 0, 'The first epoch has not begun!'
return self._pos >= self.size()
def epoch_id(self):
""" return epoch id for latest sample
"""
return self._epoch
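# ---------------------------------------------------------------------------
# Usage sketch (illustration only; paths are placeholders):
#
#     source = RoiDbSource('data/coco/instances_val2017.json',
#                          image_dir='data/coco/val2017',
#                          is_shuffle=False)
#     for sample in source:   # Dataset protocol, one pass over the epoch
#         print(sample['im_file'], sample['h'], sample['w'])
# ---------------------------------------------------------------------------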
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# interface to load data from txt file.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import numpy as np
import copy
from ..dataset import Dataset
class SimpleSource(Dataset):
"""
Load image files for testing purpose
Args:
images (list): list of path of images
samples (int): number of samples to load, -1 means all
load_img (bool): should images be loaded
"""
def __init__(self,
images=[],
samples=-1,
load_img=True,
**kwargs):
super(SimpleSource, self).__init__()
self._epoch = -1
for image in images:
assert image != '' and os.path.isfile(image), \
"Image {} not found".format(image)
self._images = images
self._fname = None
self._simple = None
self._pos = -1
self._drained = False
self._samples = samples
self._load_img = load_img
self._imid2path = {}
def next(self):
if self._epoch < 0:
self.reset()
if self._pos >= self.size():
self._drained = True
raise StopIteration("no more data in " + str(self))
else:
sample = copy.deepcopy(self._simple[self._pos])
if self._load_img:
sample['image'] = self._load_image(sample['im_file'])
self._pos += 1
return sample
def _load(self):
ct = 0
records = []
for image in self._images:
if self._samples > 0 and ct >= self._samples:
break
rec = {'im_id': np.array([ct]), 'im_file': image}
self._imid2path[ct] = image
ct += 1
records.append(rec)
assert len(records) > 0, "no image file found"
return records
def _load_image(self, where):
with open(where, 'rb') as f:
return f.read()
def reset(self):
if self._simple is None:
self._simple = self._load()
if self._epoch < 0:
self._epoch = 0
else:
self._epoch += 1
self._pos = 0
self._drained = False
def size(self):
return len(self._simple)
def drained(self):
assert self._epoch >= 0, "the first epoch has not started yet"
return self._pos >= self.size()
def epoch_id(self):
return self._epoch
def get_imid2path(self):
"""return image id to image path map"""
return self._imid2path
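# ---------------------------------------------------------------------------
# Usage sketch (illustration only): `SimpleSource` serves a plain image list,
# e.g. for inference; with load_img=True, `sample['image']` holds the raw
# file bytes.
#
#     source = SimpleSource(images=['demo/000000570688.jpg'])
#     sample = next(iter(source))
#     sample['im_id']            # -> np.array([0])
#     source.get_imid2path()     # -> {0: 'demo/000000570688.jpg'}
# ---------------------------------------------------------------------------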
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import numpy as np
import xml.etree.ElementTree as ET
def get_roidb(anno_path,
sample_num=-1,
cname2cid=None,
with_background=True):
"""
Load VOC records with annotations in xml directory 'anno_path'
Notes:
        ${anno_path}/ImageSets/Main/train.txt must contain xml file names for annotations
${anno_path}/Annotations/xxx.xml must contain annotation info for one record
Args:
anno_path (str): root directory for voc annotation data
sample_num (int): number of samples to load, -1 means all
cname2cid (dict): the label name to id dictionary
        with_background (bool): whether to load background as a class.
                                if True, class id 0 is reserved for
                                background. default True
Returns:
        (records, cname2cid)
'records' is list of dict whose structure is:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image id
'h': im_h, # height of image
'w': im_w, # width
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_poly': gt_poly,
}
        'cname2cid' is a dict to map category name to class id
"""
txt_file = anno_path
part = txt_file.split('ImageSets')
xml_path = os.path.join(part[0], 'Annotations')
assert os.path.isfile(txt_file) and \
os.path.isdir(xml_path), 'invalid xml path'
records = []
ct = 0
    existence = cname2cid is not None
if cname2cid is None:
cname2cid = {}
# mapping category name to class id
# background:0, first_class:1, second_class:2, ...
with open(txt_file, 'r') as fr:
while True:
line = fr.readline()
if not line:
break
fname = line.strip() + '.xml'
xml_file = os.path.join(xml_path, fname)
if not os.path.isfile(xml_file):
continue
tree = ET.parse(xml_file)
im_fname = tree.find('filename').text
if tree.find('id') is None:
im_id = np.array([ct])
else:
im_id = np.array([int(tree.find('id').text)])
objs = tree.findall('object')
im_w = float(tree.find('size').find('width').text)
im_h = float(tree.find('size').find('height').text)
gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
gt_class = np.zeros((len(objs), 1), dtype=np.int32)
gt_score = np.ones((len(objs), 1), dtype=np.float32)
is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
difficult = np.zeros((len(objs), 1), dtype=np.int32)
for i, obj in enumerate(objs):
cname = obj.find('name').text
if not existence and cname not in cname2cid:
                # offset by 1 when with_background is True, since class id 0
                # is reserved for background
cname2cid[cname] = len(cname2cid) + int(with_background)
elif existence and cname not in cname2cid:
raise KeyError(
'Not found cname[%s] in cname2cid when map it to cid.' %
(cname))
gt_class[i][0] = cname2cid[cname]
_difficult = int(obj.find('difficult').text)
x1 = float(obj.find('bndbox').find('xmin').text)
y1 = float(obj.find('bndbox').find('ymin').text)
x2 = float(obj.find('bndbox').find('xmax').text)
y2 = float(obj.find('bndbox').find('ymax').text)
x1 = max(0, x1)
y1 = max(0, y1)
x2 = min(im_w - 1, x2)
y2 = min(im_h - 1, y2)
gt_bbox[i] = [x1, y1, x2, y2]
is_crowd[i][0] = 0
difficult[i][0] = _difficult
voc_rec = {
'im_file': im_fname,
'im_id': im_id,
'h': im_h,
'w': im_w,
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_score': gt_score,
'gt_bbox': gt_bbox,
'gt_poly': [],
'difficult': difficult
}
if len(objs) != 0:
records.append(voc_rec)
ct += 1
if sample_num > 0 and ct >= sample_num:
break
assert len(records) > 0, 'not found any voc record in %s' % (anno_path)
return [records, cname2cid]
def load(anno_path,
sample_num=-1,
use_default_label=True,
with_background=True):
"""
Load VOC records with annotations in
xml directory 'anno_path'
Notes:
        ${anno_path}/ImageSets/Main/train.txt must contain xml file names for annotations
${anno_path}/Annotations/xxx.xml must contain annotation info for one record
Args:
        anno_path (str): root directory for voc annotation data
        sample_num (int): number of samples to load, -1 means all
        use_default_label (bool): whether to use the default mapping of label to id
        with_background (bool): whether to load background as a class.
                            if True, class id 0 is reserved for
                            background. default True
Returns:
        (records, cname2cid)
'records' is list of dict whose structure is:
{
'im_file': im_fname, # image file name
'im_id': im_id, # image id
'h': im_h, # height of image
'w': im_w, # width
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_poly': gt_poly,
}
        'cname2cid' is a dict to map category name to class id
"""
txt_file = anno_path
part = txt_file.split('ImageSets')
xml_path = os.path.join(part[0], 'Annotations')
assert os.path.isfile(txt_file) and \
os.path.isdir(xml_path), 'invalid xml path'
# mapping category name to class id
# if with_background is True:
# background:0, first_class:1, second_class:2, ...
# if with_background is False:
# first_class:0, second_class:1, ...
records = []
ct = 0
cname2cid = {}
if not use_default_label:
label_path = os.path.join(part[0], 'ImageSets/Main/label_list.txt')
with open(label_path, 'r') as fr:
label_id = int(with_background)
for line in fr.readlines():
cname2cid[line.strip()] = label_id
label_id += 1
else:
cname2cid = pascalvoc_label(with_background)
with open(txt_file, 'r') as fr:
while True:
line = fr.readline()
if not line:
break
fname = line.strip() + '.xml'
xml_file = os.path.join(xml_path, fname)
if not os.path.isfile(xml_file):
continue
tree = ET.parse(xml_file)
im_fname = tree.find('filename').text
if tree.find('id') is None:
im_id = np.array([ct])
else:
im_id = np.array([int(tree.find('id').text)])
objs = tree.findall('object')
im_w = float(tree.find('size').find('width').text)
im_h = float(tree.find('size').find('height').text)
gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
gt_class = np.zeros((len(objs), 1), dtype=np.int32)
gt_score = np.ones((len(objs), 1), dtype=np.float32)
is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
difficult = np.zeros((len(objs), 1), dtype=np.int32)
for i, obj in enumerate(objs):
cname = obj.find('name').text
gt_class[i][0] = cname2cid[cname]
_difficult = int(obj.find('difficult').text)
x1 = float(obj.find('bndbox').find('xmin').text)
y1 = float(obj.find('bndbox').find('ymin').text)
x2 = float(obj.find('bndbox').find('xmax').text)
y2 = float(obj.find('bndbox').find('ymax').text)
x1 = max(0, x1)
y1 = max(0, y1)
x2 = min(im_w - 1, x2)
y2 = min(im_h - 1, y2)
gt_bbox[i] = [x1, y1, x2, y2]
is_crowd[i][0] = 0
difficult[i][0] = _difficult
voc_rec = {
'im_file': im_fname,
'im_id': im_id,
'h': im_h,
'w': im_w,
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_score': gt_score,
'gt_bbox': gt_bbox,
'gt_poly': [],
'difficult': difficult
}
if len(objs) != 0:
records.append(voc_rec)
ct += 1
if sample_num > 0 and ct >= sample_num:
break
assert len(records) > 0, 'not found any voc record in %s' % (anno_path)
return [records, cname2cid]
def pascalvoc_label(with_background=True):
labels_map = {
'aeroplane': 1,
'bicycle': 2,
'bird': 3,
'boat': 4,
'bottle': 5,
'bus': 6,
'car': 7,
'cat': 8,
'chair': 9,
'cow': 10,
'diningtable': 11,
'dog': 12,
'horse': 13,
'motorbike': 14,
'person': 15,
'pottedplant': 16,
'sheep': 17,
'sofa': 18,
'train': 19,
'tvmonitor': 20
}
if not with_background:
labels_map = {k: v - 1 for k, v in labels_map.items()}
return labels_map
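# ---------------------------------------------------------------------------
# Usage sketch (illustration only; the path follows the VOC layout assumed
# above):
#
#     records, cname2cid = load('VOCdevkit/VOC_all/ImageSets/Main/train.txt',
#                               use_default_label=True, with_background=True)
#     cname2cid['aeroplane']     # -> 1, id 0 is reserved for background
# ---------------------------------------------------------------------------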
DATA:
TRAIN:
ANNO_FILE: data/coco.test/train2017.roidb
IMAGE_DIR: data/coco.test/train2017
SAMPLES: 10
TYPE: RoiDbSource
VAL:
ANNO_FILE: data/coco.test/val2017.roidb
IMAGE_DIR: data/coco.test/val2017
SAMPLES: 10
TYPE: RoiDbSource
TRANSFORM:
TRAIN:
OPS:
- OP: DecodeImage
TO_RGB: False
- OP: RandomFlipImage
PROB: 0.5
- OP: NormalizeImage
MEAN: [102.9801, 115.9465, 122.7717]
IS_SCALE: False
IS_CHANNEL_FIRST: False
- OP: ResizeImage
TARGET_SIZE: 800
MAX_SIZE: 1333
- OP: Rgb2Bgr
TO_BGR: False
- OP: ArrangeRCNN
BATCH_SIZE: 1
IS_PADDING: True
DROP_LAST: False
VAL:
OPS:
- OP: DecodeImage
TO_RGB: True
- OP: ResizeImage
TARGET_SIZE: 224
- OP: ArrangeSSD
BATCH_SIZE: 1
WORKER_CONF:
BUFSIZE: 200
WORKER_NUM: 8
USE_PROCESS: False
#!/bin/bash
#function:
# prepare coco data for testing
root=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")
cwd=`pwd`
if [[ $cwd != $root ]];then
pushd $root 2>&1 1>/dev/null
fi
test_coco_python2_url="http://filecenter.matrix.baidu.com/api/v1/file/wanglong03/coco.test.python2.zip/20190603095315/download"
test_coco_python3_url="http://filecenter.matrix.baidu.com/api/v1/file/wanglong03/coco.test.python3.zip/20190603095447/download"
if [[ $1 = "python2" ]];then
test_coco_data_url=${test_coco_python2_url}
coco_zip_file="coco.test.python2.zip"
else
test_coco_data_url=${test_coco_python3_url}
coco_zip_file="coco.test.python3.zip"
fi
echo "download testing coco from url[${test_coco_data_url}]"
coco_root_dir=${coco_zip_file/.zip/}
# clear already exist file or directory
rm -rf ${coco_root_dir} ${coco_zip_file}
wget ${test_coco_data_url} -O ${coco_zip_file}
if [ -e $coco_zip_file ];then
echo "succeed to download ${coco_zip_file}, so unzip it"
unzip ${coco_zip_file} >/dev/null 2>&1
fi
if [ -e ${coco_root_dir} ];then
rm -rf coco.test
ln -s ${coco_root_dir} coco.test
echo "succeed to generate coco data in[${coco_root_dir}] for testing"
exit 0
else
echo "failed to generate coco data"
exit 1
fi
DATA:
TRAIN:
ANNO_FILE: data/coco.test/train2017.roidb
IMAGE_DIR: data/coco.test/train2017
SAMPLES: 10
IS_SHUFFLE: True
TYPE: RoiDbSource
TRANSFORM:
TRAIN:
OPS:
- OP: DecodeImage
TO_RGB: False
- OP: RandomFlipImage
PROB: 0.5
- OP: NormalizeImage
MEAN: [102.9801, 115.9465, 122.7717]
IS_SCALE: False
IS_CHANNEL_FIRST: False
- OP: ResizeImage
TARGET_SIZE: 800
MAX_SIZE: 1333
- OP: Rgb2Bgr
TO_BGR: False
- OP: ArrangeRCNN
BATCH_SIZE: 1
IS_PADDING: True
DROP_LAST: False
WORKER_CONF:
BUFSIZE: 10
WORKER_NUM: 2
#!/usr/bin/python
#-*-coding:utf-8-*-
"""Run all tests
"""
import unittest
import test_loader
import test_operator
import test_roidb_source
import test_transformer
import test_reader
if __name__ == '__main__':
alltests = unittest.TestSuite([
unittest.TestLoader().loadTestsFromTestCase(t) \
for t in [
test_loader.TestLoader,
test_operator.TestBase,
test_roidb_source.TestRoiDbSource,
test_transformer.TestTransformer,
test_reader.TestReader,
]
])
was_succ = unittest\
.TextTestRunner(verbosity=2)\
.run(alltests)\
.wasSuccessful()
exit(0 if was_succ else 1)
import sys
import os
import six
import logging
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../../')
if path not in sys.path:
sys.path.insert(0, path)
prefix = os.path.dirname(os.path.abspath(__file__))
#coco data for testing
if six.PY3:
version = 'python3'
else:
version = 'python2'
data_root = os.path.join(prefix, 'data/coco.test.%s' % (version))
# coco data for testing
coco_data = {
'TRAIN': {
'ANNO_FILE': os.path.join(data_root, 'train2017.roidb'),
'IMAGE_DIR': os.path.join(data_root, 'train2017')
},
'VAL': {
'ANNO_FILE': os.path.join(data_root, 'val2017.roidb'),
'IMAGE_DIR': os.path.join(data_root, 'val2017')
}
}
script = os.path.join(os.path.dirname(__file__), 'data/prepare_data.sh')
if not os.path.exists(data_root):
ret = os.system('bash %s %s' % (script, version))
if ret != 0:
        logging.error('file [%s] not found; you should prepare your data '
                      'manually using "data/prepare_data.sh"' % (data_root))
sys.exit(1)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import unittest
import sys
import logging
import numpy as np
import set_env
class TestLoader(unittest.TestCase):
"""Test cases for dataset.source.loader
"""
@classmethod
def setUpClass(cls):
""" setup
"""
cls.prefix = os.path.dirname(os.path.abspath(__file__))
# json data
cls.anno_path = os.path.join(cls.prefix,
'data/coco/instances_val2017.json')
cls.image_dir = os.path.join(cls.prefix, 'data/coco/val2017')
cls.anno_path1 = os.path.join(cls.prefix,
"data/voc/ImageSets/Main/train.txt")
cls.image_dir1 = os.path.join(cls.prefix, "data/voc/JPEGImages")
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_load_coco_in_json(self):
""" test loading COCO data in json file
"""
from data.source.coco_loader import load
if not os.path.exists(self.anno_path):
logging.warn('not found %s, so skip this test' % (self.anno_path))
return
samples = 10
records, cname2id = load(self.anno_path, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2id), 0)
def test_load_coco_in_roidb(self):
""" test loading COCO data in pickled records
"""
anno_path = os.path.join(self.prefix,
'data/roidbs/instances_val2017.roidb')
if not os.path.exists(anno_path):
logging.warn('not found %s, so skip this test' % (anno_path))
return
samples = 10
from data.source.loader import load_roidb
records, cname2cid = load_roidb(anno_path, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2cid), 0)
def test_load_voc_in_xml(self):
""" test loading VOC data in xml files
"""
from data.source.voc_loader import load
if not os.path.exists(self.anno_path1):
logging.warn('not found %s, so skip this test' % (self.anno_path1))
return
samples = 3
records, cname2cid = load(self.anno_path1, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2cid), 0)
def test_load_voc_in_roidb(self):
""" test loading VOC data in pickled records
"""
anno_path = os.path.join(self.prefix, 'data/roidbs/train.roidb')
if not os.path.exists(anno_path):
logging.warn('not found %s, so skip this test' % (anno_path))
return
samples = 3
        from data.source.loader import load_roidb
records, cname2cid = load_roidb(anno_path, samples)
self.assertEqual(len(records), samples)
self.assertGreater(len(cname2cid), 0)
if __name__ == '__main__':
unittest.main()
import os
import unittest
import logging
import numpy as np
import set_env
from data import transform as tf
logging.basicConfig(level=logging.INFO)
class TestBase(unittest.TestCase):
"""Test cases for dataset.transform.operator
"""
@classmethod
def setUpClass(cls, with_mixup=False):
""" setup
"""
roidb_fname = set_env.coco_data['TRAIN']['ANNO_FILE']
image_dir = set_env.coco_data['TRAIN']['IMAGE_DIR']
import pickle as pkl
with open(roidb_fname, 'rb') as f:
roidb = f.read()
roidb = pkl.loads(roidb)
fn = os.path.join(image_dir, roidb[0][0]['im_file'])
with open(fn, 'rb') as f:
roidb[0][0]['image'] = f.read()
if with_mixup:
mixup_fn = os.path.join(image_dir, roidb[0][1]['im_file'])
roidb[0][0]['mixup'] = roidb[0][1]
            with open(mixup_fn, 'rb') as f:
roidb[0][0]['mixup']['image'] = f.read()
cls.sample = roidb[0][0]
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_ops_all(self):
""" test operators
"""
# ResizeImage
ops_conf = [{
'op': 'DecodeImage'
}, {
'op': 'ResizeImage',
'target_size': 300,
'max_size': 1333
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result0 = mapper(data)
self.assertIsNotNone(result0['image'])
self.assertEqual(len(result0['image'].shape), 3)
# RandFlipImage
ops_conf = [{'op': 'RandomFlipImage'}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
result1 = mapper(result0)
self.assertEqual(result1['image'].shape, result0['image'].shape)
self.assertEqual(result1['gt_bbox'].shape, result0['gt_bbox'].shape)
# NormalizeImage
ops_conf = [{'op': 'NormalizeImage', 'is_channel_first': False}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
result2 = mapper(result1)
im1 = result1['image']
        count = len(np.where(im1 <= 1)[0])
        if im1.dtype == 'float32':
            # after scaling and mean subtraction every value should be <= 1
            self.assertEqual(count,
                             im1.shape[0] * im1.shape[1] * im1.shape[2])
# ArrangeSample
ops_conf = [{'op': 'ArrangeRCNN'}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
result3 = mapper(result2)
self.assertEqual(type(result3), tuple)
def test_ops_part1(self):
"""test Crop and Resize
"""
ops_conf = [{
'op': 'DecodeImage'
}, {
'op': 'NormalizeBox'
}, {
'op': 'CropImage',
'batch_sampler': [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result = mapper(data)
self.assertEqual(len(result['image'].shape), 3)
def test_ops_part2(self):
"""test Expand and RandomDistort
"""
ops_conf = [{
'op': 'DecodeImage'
}, {
'op': 'NormalizeBox'
}, {
'op': 'ExpandImage',
'max_ratio': 1.5,
'prob': 1
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result = mapper(data)
self.assertEqual(len(result['image'].shape), 3)
self.assertGreater(result['gt_bbox'].shape[0], 0)
def test_ops_part3(self):
"""test Mixup and RandomInterp
"""
ops_conf = [{
'op': 'DecodeImage',
'with_mixup': True,
}, {
'op': 'MixupImage',
}, {
'op': 'RandomInterpImage',
'target_size': 608
}]
mapper = tf.build(ops_conf)
self.assertTrue(mapper is not None)
data = self.sample.copy()
result = mapper(data)
self.assertEqual(len(result['image'].shape), 3)
self.assertGreater(result['gt_bbox'].shape[0], 0)
#self.assertGreater(result['gt_score'].shape[0], 0)
if __name__ == '__main__':
unittest.main()
import os
import time
import unittest
import sys
import logging
import numpy as np
import yaml
import set_env
from data import Reader
class TestReader(unittest.TestCase):
"""Test cases for dataset.reader
"""
@classmethod
def setUpClass(cls):
""" setup
"""
prefix = os.path.dirname(os.path.abspath(__file__))
coco_yml = os.path.join(prefix, 'coco.yml')
with open(coco_yml, 'rb') as f:
cls.coco_conf = yaml.load(f.read())
cls.coco_conf['DATA']['TRAIN'] = set_env.coco_data['TRAIN']
cls.coco_conf['DATA']['VAL'] = set_env.coco_data['VAL']
rcnn_yml = os.path.join(prefix, 'rcnn_dataset.yml')
with open(rcnn_yml, 'rb') as f:
cls.rcnn_conf = yaml.load(f.read())
cls.rcnn_conf['DATA']['TRAIN'] = set_env.coco_data['TRAIN']
cls.rcnn_conf['DATA']['VAL'] = set_env.coco_data['VAL']
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_train(self):
""" Test reader for training
"""
coco = Reader(
self.coco_conf['DATA'], self.coco_conf['TRANSFORM'], maxiter=1000)
train_rd = coco.train()
self.assertTrue(train_rd is not None)
ct = 0
total = 0
bytes = 0
prev_ts = None
for sample in train_rd():
if prev_ts is None:
start_ts = time.time()
prev_ts = start_ts
ct += 1
bytes += 4 * sample[0][0].size * len(sample[0])
self.assertTrue(sample is not None)
cost = time.time() - prev_ts
if cost >= 1.0:
total += ct
qps = total / (time.time() - start_ts)
bps = bytes / (time.time() - start_ts)
logging.info('got %d/%d samples in %.3fsec with qps:%d bps:%d' %
(ct, total, cost, qps, bps))
bytes = 0
ct = 0
prev_ts = time.time()
total += ct
self.assertEqual(total, coco._maxiter)
def test_val(self):
""" Test reader for validation
"""
coco = Reader(self.coco_conf['DATA'], self.coco_conf['TRANSFORM'], 10)
val_rd = coco.val()
self.assertTrue(val_rd is not None)
        # test 3 epochs
for _ in range(3):
ct = 0
for sample in val_rd():
ct += 1
self.assertTrue(sample is not None)
self.assertGreaterEqual(ct, coco._maxiter)
def test_rcnn_train(self):
""" Test reader for training
"""
anno = self.rcnn_conf['DATA']['TRAIN']['ANNO_FILE']
if not os.path.exists(anno):
            logging.error('skip test_rcnn_train: file [%s] not found' % (anno))
return
rcnn = Reader(self.rcnn_conf['DATA'], self.rcnn_conf['TRANSFORM'], 10)
rcnn_rd = rcnn.train()
self.assertTrue(rcnn_rd is not None)
ct = 0
out = None
for sample in rcnn_rd():
out = sample
ct += 1
self.assertTrue(sample is not None)
self.assertEqual(out[0][0].shape[0], 3)
self.assertEqual(out[0][1].shape[0], 3)
self.assertEqual(out[0][3].shape[1], 4)
self.assertEqual(out[0][4].shape[1], 1)
self.assertEqual(out[0][5].shape[1], 1)
self.assertGreaterEqual(ct, rcnn._maxiter)
if __name__ == '__main__':
unittest.main()
import os
import time
import unittest
import sys
import logging
import set_env
from data import build_source
class TestRoiDbSource(unittest.TestCase):
"""Test cases for dataset.source.roidb_source
"""
@classmethod
def setUpClass(cls):
""" setup
"""
anno_path = set_env.coco_data['TRAIN']['ANNO_FILE']
image_dir = set_env.coco_data['TRAIN']['IMAGE_DIR']
cls.config = {
'data_cf': {
'anno_file': anno_path,
'image_dir': image_dir,
'samples': 100,
'load_img': True
},
'cname2cid': None
}
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_basic(self):
""" test basic apis 'next/size/drained'
"""
roi_source = build_source(self.config)
for i, sample in enumerate(roi_source):
self.assertTrue('image' in sample)
self.assertGreater(len(sample['image']), 0)
self.assertTrue(roi_source.drained())
self.assertEqual(i + 1, roi_source.size())
def test_reset(self):
""" test functions 'reset/epoch_id'
"""
roi_source = build_source(self.config)
self.assertTrue(roi_source.next() is not None)
self.assertEqual(roi_source.epoch_id(), 0)
roi_source.reset()
self.assertEqual(roi_source.epoch_id(), 1)
self.assertTrue(roi_source.next() is not None)
if __name__ == '__main__':
unittest.main()
import os
import time
import unittest
import sys
import logging
import numpy as np
import set_env
from data import build_source
from data import transform as tf
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
class TestTransformer(unittest.TestCase):
"""Test cases for dataset.transform.transformer
"""
@classmethod
def setUpClass(cls):
""" setup
"""
prefix = os.path.dirname(os.path.abspath(__file__))
# json data
anno_path = set_env.coco_data['TRAIN']['ANNO_FILE']
image_dir = set_env.coco_data['TRAIN']['IMAGE_DIR']
cls.sc_config = {
'anno_file': anno_path,
'image_dir': image_dir,
'samples': 200
}
cls.ops = [{
'op': 'DecodeImage',
'to_rgb': True
}, {
'op': 'ResizeImage',
'target_size': 800,
'max_size': 1333
}, {
'op': 'ArrangeRCNN',
'is_mask': False
}]
@classmethod
def tearDownClass(cls):
""" tearDownClass """
pass
def test_map(self):
""" test transformer.map
"""
mapper = tf.build(self.ops)
ds = build_source(self.sc_config)
mapped_ds = tf.map(ds, mapper)
ct = 0
for sample in mapped_ds:
self.assertTrue(type(sample[0]) is np.ndarray)
ct += 1
self.assertEqual(ct, mapped_ds.size())
def test_parallel_map(self):
""" test transformer.map with concurrent workers
"""
mapper = tf.build(self.ops)
ds = build_source(self.sc_config)
worker_conf = {'WORKER_NUM': 2, 'use_process': True}
mapped_ds = tf.map(ds, mapper, worker_conf)
ct = 0
for sample in mapped_ds:
self.assertTrue(type(sample[0]) is np.ndarray)
ct += 1
self.assertTrue(mapped_ds.drained())
self.assertEqual(ct, mapped_ds.size())
mapped_ds.reset()
ct = 0
for sample in mapped_ds:
self.assertTrue(type(sample[0]) is np.ndarray)
ct += 1
self.assertEqual(ct, mapped_ds.size())
def test_batch(self):
""" test batched dataset
"""
batchsize = 2
mapper = tf.build(self.ops)
ds = build_source(self.sc_config)
mapped_ds = tf.map(ds, mapper)
batched_ds = tf.batch(mapped_ds, batchsize, True)
for sample in batched_ds:
out = sample
self.assertEqual(len(out), batchsize)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   a tool to convert COCO or VOC data to a pickled file in which
#   every sample follows the same schema.
#
# notes:
#   the original COCO or VOC data format can also be used directly
#   by 'PPdetection' for training.
#   this tool just converts the data to a unified schema,
#   which is useful when debugging with a small dataset.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import os
import sys
import logging
import pickle as pkl
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../../')
if path not in sys.path:
sys.path.insert(0, path)
from data.source import loader
def parse_args():
""" parse arguments
"""
parser = argparse.ArgumentParser(
description='Generate Standard Dataset for PPdetection')
parser.add_argument(
'--type',
type=str,
default='json',
help='file format of label file, eg: json for COCO and xml for VOC')
parser.add_argument(
'--annotation',
type=str,
help='label file name for COCO or VOC dataset, '
'eg: instances_val2017.json or train.txt')
parser.add_argument(
'--save-dir',
type=str,
default='roidb',
help='directory to save roidb file which contains pickled samples')
parser.add_argument(
'--samples',
type=int,
default=-1,
help='number of samples to dump, default to all')
args = parser.parse_args()
return args
def dump_coco_as_pickle(args):
""" Load COCO data, and then save it as pickled file.
Notes:
label file of COCO contains a json which consists
of label info for each sample
"""
samples = args.samples
save_dir = args.save_dir
if not os.path.exists(save_dir):
os.makedirs(save_dir)
anno_path = args.annotation
roidb, cat2id = loader.load(anno_path, samples, with_cat2id=True)
samples = len(roidb)
    dsname = os.path.splitext(os.path.basename(anno_path))[0]
roidb_fname = save_dir + "/%s.roidb" % (dsname)
with open(roidb_fname, "wb") as fout:
pkl.dump((roidb, cat2id), fout)
#for rec in roidb:
# sys.stderr.write('%s\n' % (rec['im_file']))
logging.info('dumped %d samples to file[%s]' % (samples, roidb_fname))
def dump_voc_as_pickle(args):
""" Load VOC data, and then save it as pickled file.
    Notes:
        we assume the label file of VOC contains lines,
        each of which corresponds to an xml file
        that holds the label info of one sample
"""
samples = args.samples
save_dir = args.save_dir
if not os.path.exists(save_dir):
os.makedirs(save_dir)
save_dir = args.save_dir
anno_path = os.path.expanduser(args.annotation)
roidb, cat2id = loader.load(
anno_path, samples, with_cat2id=True, use_default_label=None)
samples = len(roidb)
part = anno_path.split('/')
dsname = part[-4]
roidb_fname = save_dir + "/%s.roidb" % (dsname)
with open(roidb_fname, "wb") as fout:
pkl.dump((roidb, cat2id), fout)
anno_path = os.path.join(anno_path.split('/train.txt')[0], 'label_list.txt')
with open(anno_path, 'w') as fw:
for key in cat2id.keys():
fw.write(key + '\n')
logging.info('dumped %d samples to file[%s]' % (samples, roidb_fname))
if __name__ == "__main__":
""" Make sure you have already downloaded original COCO or VOC data,
then you can convert it using this tool.
Usage:
python generate_data_for_training.py --type=json
--annotation=./annotations/instances_val2017.json
--save-dir=./roidb --samples=100
"""
args = parse_args()
# VOC data are organized in xml files
if args.type == 'xml':
dump_voc_as_pickle(args)
# COCO data are organized in json file
elif args.type == 'json':
dump_coco_as_pickle(args)
    else:
        raise TypeError('Can\'t deal with {} type. '
                        'Only xml or json file formats are supported'.format(
                            args.type))
#!/usr/bin/env python
# coding: utf-8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import glob
import json
import os
import os.path as osp
import sys
import shutil
import numpy as np
import PIL.Image
import PIL.ImageDraw
class MyEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, np.integer):
return int(obj)
elif isinstance(obj, np.floating):
return float(obj)
elif isinstance(obj, np.ndarray):
return obj.tolist()
else:
return super(MyEncoder, self).default(obj)
def getbbox(height, width, points):
    # legacy helper kept for compatibility; delegates to get_bbox below,
    # which rasterizes the polygon and returns [x, y, w, h]
    return get_bbox(height, width, points)
def images(data, num):
image = {}
image['height'] = data['imageHeight']
image['width'] = data['imageWidth']
image['id'] = num + 1
image['file_name'] = data['imagePath'].split('/')[-1]
return image
def categories(label, labels_list):
category = {}
category['supercategory'] = 'component'
category['id'] = len(labels_list) + 1
category['name'] = label
return category
def annotations_rectangle(points, label, num, label_to_num):
annotation = {}
seg_points = np.asarray(points).copy()
seg_points[1, :] = np.asarray(points)[2, :]
seg_points[2, :] = np.asarray(points)[1, :]
annotation['segmentation'] = [list(seg_points.flatten())]
annotation['iscrowd'] = 0
annotation['image_id'] = num + 1
annotation['bbox'] = list(
map(float, [
points[0][0], points[0][1], points[1][0] - points[0][0], points[1][
1] - points[0][1]
]))
annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
annotation['category_id'] = label_to_num[label]
annotation['id'] = num + 1
return annotation
def annotations_polygon(height, width, points, label, num, label_to_num):
annotation = {}
annotation['segmentation'] = [list(np.asarray(points).flatten())]
annotation['iscrowd'] = 0
annotation['image_id'] = num + 1
annotation['bbox'] = list(map(float, get_bbox(height, width, points)))
annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
annotation['category_id'] = label_to_num[label]
annotation['id'] = num + 1
return annotation
def get_bbox(height, width, points):
polygons = points
mask = np.zeros([height, width], dtype=np.uint8)
mask = PIL.Image.fromarray(mask)
xy = list(map(tuple, polygons))
PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
mask = np.array(mask, dtype=bool)
index = np.argwhere(mask == 1)
    rows = index[:, 0]
    cols = index[:, 1]
    left_top_r = np.min(rows)
    left_top_c = np.min(cols)
    right_bottom_r = np.max(rows)
    right_bottom_c = np.max(cols)
return [
left_top_c, left_top_r, right_bottom_c - left_top_c,
right_bottom_r - left_top_r
]
def deal_json(img_path, json_path):
data_coco = {}
label_to_num = {}
images_list = []
categories_list = []
annotations_list = []
labels_list = []
num = -1
for img_file in os.listdir(img_path):
img_label = img_file.split('.')[0]
label_file = osp.join(json_path, img_label + '.json')
print('Generating dataset from:', label_file)
num = num + 1
with open(label_file) as f:
data = json.load(f)
images_list.append(images(data, num))
for shapes in data['shapes']:
label = shapes['label']
if label not in labels_list:
categories_list.append(categories(label, labels_list))
labels_list.append(label)
label_to_num[label] = len(labels_list)
points = shapes['points']
p_type = shapes['shape_type']
if p_type == 'polygon':
annotations_list.append(
annotations_polygon(data['imageHeight'], data[
'imageWidth'], points, label, num, label_to_num))
if p_type == 'rectangle':
points.append([points[0][0], points[1][1]])
points.append([points[1][0], points[0][1]])
annotations_list.append(
annotations_rectangle(points, label, num, label_to_num))
data_coco['images'] = images_list
data_coco['categories'] = categories_list
data_coco['annotations'] = annotations_list
return data_coco
def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--json_input_dir', help='input annotated directory')
parser.add_argument('--image_input_dir', help='image directory')
parser.add_argument(
'--output_dir', help='output dataset directory', default='../../../')
parser.add_argument(
'--train_proportion',
help='the proportion of train dataset',
type=float,
default=1.0)
parser.add_argument(
'--val_proportion',
help='the proportion of validation dataset',
type=float,
default=0.0)
parser.add_argument(
'--test_proportion',
help='the proportion of test dataset',
type=float,
default=0.0)
args = parser.parse_args()
    if not os.path.exists(args.json_input_dir):
        print('The json folder does not exist!')
        sys.exit(1)
    if not os.path.exists(args.image_input_dir):
        print('The image folder does not exist!')
        sys.exit(1)
    if args.train_proportion + args.val_proportion \
            + args.test_proportion != 1.0:
        print('The proportions of the training, validation and test '
              'datasets must sum to 1!')
        sys.exit(1)
# Allocate the dataset.
total_num = len(glob.glob(osp.join(args.json_input_dir, '*.json')))
if args.train_proportion != 0:
train_num = int(total_num * args.train_proportion)
os.makedirs(args.output_dir + '/train')
else:
train_num = 0
if args.val_proportion == 0.0:
val_num = 0
test_num = total_num - train_num
if args.test_proportion != 0.0:
os.makedirs(args.output_dir + '/test')
else:
val_num = int(total_num * args.val_proportion)
test_num = total_num - train_num - val_num
os.makedirs(args.output_dir + '/val')
if args.test_proportion != 0.0:
os.makedirs(args.output_dir + '/test')
count = 1
for img_name in os.listdir(args.image_input_dir):
if count <= train_num:
shutil.copyfile(
osp.join(args.image_input_dir, img_name),
osp.join(args.output_dir + '/train/', img_name))
else:
if count <= train_num + val_num:
shutil.copyfile(
osp.join(args.image_input_dir, img_name),
osp.join(args.output_dir + '/val/', img_name))
else:
shutil.copyfile(
osp.join(args.image_input_dir, img_name),
osp.join(args.output_dir + '/test/', img_name))
count = count + 1
# Deal with the json files.
if not os.path.exists(args.output_dir + '/annotations'):
os.makedirs(args.output_dir + '/annotations')
if args.train_proportion != 0:
train_data_coco = deal_json(args.output_dir + '/train',
args.json_input_dir)
train_json_path = osp.join(args.output_dir + '/annotations',
'instance_train.json')
json.dump(
train_data_coco,
open(train_json_path, 'w'),
indent=4,
cls=MyEncoder)
if args.val_proportion != 0:
val_data_coco = deal_json(args.output_dir + '/val', args.json_input_dir)
val_json_path = osp.join(args.output_dir + '/annotations',
'instance_val.json')
json.dump(
val_data_coco, open(val_json_path, 'w'), indent=4, cls=MyEncoder)
if args.test_proportion != 0:
test_data_coco = deal_json(args.output_dir + '/test',
args.json_input_dir)
test_json_path = osp.join(args.output_dir + '/annotations',
'instance_test.json')
json.dump(
test_data_coco, open(test_json_path, 'w'), indent=4, cls=MyEncoder)
if __name__ == '__main__':
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
import copy
import logging
from .transformer import MappedDataset, BatchedDataset
from .post_map import build_post_map
from .parallel_map import ParallelMappedDataset
from .operators import BaseOperator, registered_ops
__all__ = ['build_mapper', 'map', 'batch', 'batch_map']
logger = logging.getLogger(__name__)
def build_mapper(ops, context=None):
"""
Build a mapper for operators in 'ops'
Args:
        ops (list of operator.BaseOperator or list of op dict):
            configs for operators, eg:
            [{'name': 'DecodeImage', 'params': {'to_rgb': True}}, {xxx}]
context (dict): a context object for mapper
Returns:
a mapper function which accept one argument 'sample' and
return the processed result
"""
new_ops = []
for _dict in ops:
new_dict = {}
for i, j in _dict.items():
new_dict[i.lower()] = j
new_ops.append(new_dict)
ops = new_ops
op_funcs = []
op_repr = []
for op in ops:
if type(op) is dict and 'op' in op:
op_func = getattr(BaseOperator, op['op'])
params = copy.deepcopy(op)
del params['op']
o = op_func(**params)
elif not isinstance(op, BaseOperator):
op_func = getattr(BaseOperator, op['name'])
params = {} if 'params' not in op else op['params']
o = op_func(**params)
else:
assert isinstance(op, BaseOperator), \
"invalid operator when build ops"
o = op
op_funcs.append(o)
        op_repr.append('{{{}}}'.format(str(o)))
op_repr = '[{}]'.format(','.join(op_repr))
    def _mapper(sample):
        ctx = {} if context is None else copy.deepcopy(context)
        for f in op_funcs:
            try:
                sample = f(sample, ctx)
            except Exception as e:
                logger.warn("fail to map op [{}] with error: {}".format(f, e))
                # re-raise so a failing op is not silently swallowed
                raise e
        return sample
_mapper.ops = op_repr
return _mapper
def map(ds, mapper, worker_args=None):
"""
Apply 'mapper' to 'ds'
Args:
ds (instance of Dataset): dataset to be mapped
mapper (function): action to be executed for every data sample
worker_args (dict): configs for concurrent mapper
Returns:
a mapped dataset
"""
if worker_args is not None:
return ParallelMappedDataset(ds, mapper, worker_args)
else:
return MappedDataset(ds, mapper)
def batch(ds, batchsize, drop_last=False):
"""
Batch data samples to batches
Args:
batchsize (int): number of samples for a batch
drop_last (bool): drop last few samples if not enough for a batch
Returns:
a batched dataset
"""
return BatchedDataset(ds, batchsize, drop_last=drop_last)
def batch_map(ds, config):
"""
Post process the batches.
Args:
ds (instance of Dataset): dataset to be mapped
mapper (function): action to be executed for every batch
Returns:
a batched dataset which is processed
"""
mapper = build_post_map(**config)
return MappedDataset(ds, mapper)
for nm in registered_ops:
op = getattr(BaseOperator, nm)
locals()[nm] = op
__all__ += registered_ops
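# A minimal usage sketch for the helpers above, assuming `ds` is a dataset
# instance (e.g. built by data.build_source) whose samples are dicts with an
# 'im_file' field; the sample values here are illustrative only:
#
#   ops = [{'op': 'DecodeImage', 'to_rgb': True},
#          {'op': 'ResizeImage', 'target_size': 800, 'max_size': 1333}]
#   mapper = build_mapper(ops)
#   mapped = map(ds, mapper)                 # serial mapping
#   # mapped = map(ds, mapper, worker_args)  # parallel, given a worker config
#   batched = batch(mapped, 2, drop_last=True)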
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   operators to process samples,
#   e.g. decode/resize/crop image
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import numpy as np
from .operators import BaseOperator, register_op
logger = logging.getLogger(__name__)
@register_op
class ArrangeRCNN(BaseOperator):
"""
Transform dict to tuple format needed for training.
Args:
        is_mask (bool): whether to include mask data
"""
def __init__(self, is_mask=False):
super(ArrangeRCNN, self).__init__()
self.is_mask = is_mask
assert isinstance(self.is_mask, bool), "wrong type for is_mask"
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
sample: a tuple containing following items
(image, im_info, im_id, gt_bbox, gt_class, is_crowd, gt_masks)
"""
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
keys = list(sample.keys())
if 'is_crowd' in keys:
is_crowd = sample['is_crowd']
else:
raise KeyError("The dataset doesn't have 'is_crowd' key.")
if 'im_info' in keys:
im_info = sample['im_info']
else:
raise KeyError("The dataset doesn't have 'im_info' key.")
im_id = sample['im_id']
outs = (im, im_info, im_id, gt_bbox, gt_class, is_crowd)
gt_masks = []
if self.is_mask and len(sample['gt_poly']) != 0 \
and 'is_crowd' in keys:
valid = True
segms = sample['gt_poly']
assert len(segms) == is_crowd.shape[0]
for i in range(len(sample['gt_poly'])):
segm, iscrowd = segms[i], is_crowd[i]
gt_segm = []
if iscrowd:
gt_segm.append([[0, 0]])
else:
for poly in segm:
if len(poly) == 0:
valid = False
break
gt_segm.append(np.array(poly).reshape(-1, 2))
if (not valid) or len(gt_segm) == 0:
break
gt_masks.append(gt_segm)
outs = outs + (gt_masks, )
return outs
@register_op
class ArrangeTestRCNN(BaseOperator):
"""
    Transform dict to the tuple format needed for evaluation and inference.
"""
def __init__(self):
super(ArrangeTestRCNN, self).__init__()
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
            sample: a tuple containing the following items:
                (image, im_info, im_id, im_shape)
"""
im = sample['image']
keys = list(sample.keys())
if 'im_info' in keys:
im_info = sample['im_info']
else:
raise KeyError("The dataset doesn't have 'im_info' key.")
im_id = sample['im_id']
h = sample['h']
w = sample['w']
# For rcnn models in eval and infer stage, original image size
# is needed to clip the bounding boxes. And box clip op in
# bbox prediction needs im_info as input in format of [N, 3],
# so im_shape is appended by 1 to match dimension.
im_shape = np.array((h, w, 1), dtype=np.float32)
outs = (im, im_info, im_id, im_shape)
return outs
@register_op
class ArrangeSSD(BaseOperator):
"""
Transform dict to tuple format needed for training.
Args:
        is_mask (bool): whether to include mask data
"""
def __init__(self, is_mask=False):
super(ArrangeSSD, self).__init__()
self.is_mask = is_mask
assert isinstance(self.is_mask, bool), "wrong type for is_mask"
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
sample: a tuple containing the following items:
(image, gt_bbox, gt_class, difficult)
"""
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
difficult = sample['difficult']
outs = (im, gt_bbox, gt_class, difficult)
return outs
@register_op
class ArrangeTestSSD(BaseOperator):
"""
    Transform dict to the tuple format needed for inference.
    Args:
        is_mask (bool): whether to include mask data
"""
def __init__(self, is_mask=False):
super(ArrangeTestSSD, self).__init__()
self.is_mask = is_mask
assert isinstance(self.is_mask, bool), "wrong type for is_mask"
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
            sample: a tuple containing the following items: (image, im_id)
"""
im = sample['image']
im_id = sample['im_id']
outs = (im, im_id)
return outs
@register_op
class ArrangeYOLO(BaseOperator):
"""
Transform dict to the tuple format needed for training.
"""
def __init__(self):
super(ArrangeYOLO, self).__init__()
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
Returns:
            sample: a tuple containing the following items:
                (image, gt_bbox, gt_class, gt_score)
"""
im = sample['image']
if len(sample['gt_bbox']) != len(sample['gt_class']):
raise ValueError("gt num mismatch: bbox and class.")
if len(sample['gt_bbox']) != len(sample['gt_score']):
raise ValueError("gt num mismatch: bbox and score.")
gt_bbox = np.zeros((50, 4), dtype=im.dtype)
gt_class = np.zeros((50, ), dtype=np.int32)
gt_score = np.zeros((50, ), dtype=im.dtype)
gt_num = min(50, len(sample['gt_bbox']))
if gt_num > 0:
gt_bbox[:gt_num, :] = sample['gt_bbox'][:gt_num, :]
gt_class[:gt_num] = sample['gt_class'][:gt_num, 0]
gt_score[:gt_num] = sample['gt_score'][:gt_num, 0]
        # convert [x1, y1, x2, y2] to [center_x, center_y, w, h]
gt_bbox[:, 2:4] = gt_bbox[:, 2:4] - gt_bbox[:, :2]
gt_bbox[:, :2] = gt_bbox[:, :2] + gt_bbox[:, 2:4] / 2.
outs = (im, gt_bbox, gt_class, gt_score)
return outs
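# Worked example for the box conversion above (illustrative numbers): a
# corner box [x1, y1, x2, y2] = [10, 20, 50, 100] first becomes w = 40,
# h = 80, then the corner is shifted to the center: cx = 10 + 40 / 2 = 30,
# cy = 20 + 80 / 2 = 60, giving [cx, cy, w, h] = [30, 60, 40, 80].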
@register_op
class ArrangeTestYOLO(BaseOperator):
"""
Transform dict to the tuple format needed for training.
"""
def __init__(self):
super(ArrangeTestYOLO, self).__init__()
def __call__(self, sample, context=None):
"""
Args:
sample: a dict which contains image
info and annotation info.
context: a dict which contains additional info.
        Returns:
            sample: a tuple containing the following items:
                (image, im_shape, im_id)
"""
im = sample['image']
im_id = sample['im_id']
h = sample['h']
w = sample['w']
im_shape = np.array((h, w))
outs = (im, im_shape, im_id)
return outs
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# this file contains helper methods for BBOX processing
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def meet_emit_constraint(src_bbox, sample_bbox):
center_x = (src_bbox[2] + src_bbox[0]) / 2
center_y = (src_bbox[3] + src_bbox[1]) / 2
if center_x >= sample_bbox[0] and \
center_x <= sample_bbox[2] and \
center_y >= sample_bbox[1] and \
center_y <= sample_bbox[3]:
return True
return False
def clip_bbox(src_bbox):
src_bbox[0] = max(min(src_bbox[0], 1.0), 0.0)
src_bbox[1] = max(min(src_bbox[1], 1.0), 0.0)
src_bbox[2] = max(min(src_bbox[2], 1.0), 0.0)
src_bbox[3] = max(min(src_bbox[3], 1.0), 0.0)
return src_bbox
def bbox_area(src_bbox):
width = src_bbox[2] - src_bbox[0]
height = src_bbox[3] - src_bbox[1]
return width * height
def filter_and_process(sample_bbox, bboxes, labels, scores=None):
new_bboxes = []
new_labels = []
new_scores = []
for i in range(len(labels)):
new_bbox = [0, 0, 0, 0]
obj_bbox = [bboxes[i][0], bboxes[i][1], bboxes[i][2], bboxes[i][3]]
if not meet_emit_constraint(obj_bbox, sample_bbox):
continue
sample_width = sample_bbox[2] - sample_bbox[0]
sample_height = sample_bbox[3] - sample_bbox[1]
new_bbox[0] = (obj_bbox[0] - sample_bbox[0]) / sample_width
new_bbox[1] = (obj_bbox[1] - sample_bbox[1]) / sample_height
new_bbox[2] = (obj_bbox[2] - sample_bbox[0]) / sample_width
new_bbox[3] = (obj_bbox[3] - sample_bbox[1]) / sample_height
new_bbox = clip_bbox(new_bbox)
if bbox_area(new_bbox) > 0:
new_bboxes.append(new_bbox)
new_labels.append([labels[i][0]])
if scores is not None:
new_scores.append([scores[i][0]])
bboxes = np.array(new_bboxes)
labels = np.array(new_labels)
scores = np.array(new_scores)
return bboxes, labels, scores
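# Worked example (illustrative numbers, normalized coordinates): with
# sample_bbox = [0.25, 0.25, 0.75, 0.75] and a gt box [0.3, 0.3, 0.5, 0.5],
# the box center (0.4, 0.4) lies inside the sample, so it is kept and
# re-expressed in crop coordinates, ((0.3 - 0.25) / 0.5, ...), which
# yields [0.1, 0.1, 0.5, 0.5] after clipping.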
def generate_sample_bbox(sampler):
scale = np.random.uniform(sampler[2], sampler[3])
aspect_ratio = np.random.uniform(sampler[4], sampler[5])
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = np.random.uniform(0, xmin_bound)
ymin = np.random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = [xmin, ymin, xmax, ymax]
return sampled_bbox
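# Note on the clamping above: with aspect_ratio forced into
# [scale**2, 1 / scale**2], both bbox_width = scale * sqrt(aspect_ratio)
# and bbox_height = scale / sqrt(aspect_ratio) stay <= 1, so the sampled
# box always fits inside the normalized [0, 1] x [0, 1] image.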
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox[0] >= object_bbox[2] or \
sample_bbox[2] <= object_bbox[0] or \
sample_bbox[1] >= object_bbox[3] or \
sample_bbox[3] <= object_bbox[1]:
return 0
intersect_xmin = max(sample_bbox[0], object_bbox[0])
intersect_ymin = max(sample_bbox[1], object_bbox[1])
intersect_xmax = min(sample_bbox[2], object_bbox[2])
intersect_ymax = min(sample_bbox[3], object_bbox[3])
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
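# Worked example (illustrative numbers): for boxes [0, 0, 0.5, 0.5] and
# [0.25, 0.25, 0.75, 0.75], the intersection is 0.25 * 0.25 = 0.0625 and
# each box has area 0.25, so the overlap is
# 0.0625 / (0.25 + 0.25 - 0.0625) = 1 / 7 ~= 0.143.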
def satisfy_sample_constraint(sampler,
sample_bbox,
gt_bboxes,
satisfy_all=False):
if sampler[6] == 0 and sampler[7] == 0:
return True
satisfied = []
for i in range(len(gt_bboxes)):
object_bbox = [
gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
]
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler[6] != 0 and \
overlap < sampler[6]:
satisfied.append(False)
continue
if sampler[7] != 0 and \
overlap > sampler[7]:
satisfied.append(False)
continue
satisfied.append(True)
if not satisfy_all:
return True
if satisfy_all:
return np.all(satisfied)
else:
return False
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
#   operators to process samples,
#   e.g. decode/resize/crop image
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import uuid
import logging
import random
import math
import numpy as np
import cv2
from PIL import Image, ImageEnhance
from ppdet.core.workspace import serializable
from .op_helper import (satisfy_sample_constraint, filter_and_process,
generate_sample_bbox, clip_bbox)
logger = logging.getLogger(__name__)
registered_ops = []
def register_op(cls):
registered_ops.append(cls.__name__)
if not hasattr(BaseOperator, cls.__name__):
setattr(BaseOperator, cls.__name__, cls)
else:
raise KeyError("The {} class has been registered.".format(cls.__name__))
return serializable(cls)
class BboxError(ValueError):
pass
class ImageError(ValueError):
pass
class BaseOperator(object):
def __init__(self, name=None):
if name is None:
name = self.__class__.__name__
self._id = name + '_' + str(uuid.uuid4())[-6:]
def __call__(self, sample, context=None):
""" Process a sample.
Args:
sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx}
context (dict): info about this sample processing
Returns:
result (dict): a processed sample
"""
return sample
def __str__(self):
return str(self._id)
@register_op
class DecodeImage(BaseOperator):
def __init__(self, to_rgb=True, with_mixup=False):
""" Transform the image data to numpy format.
Args:
to_rgb (bool): whether to convert BGR to RGB
"""
super(DecodeImage, self).__init__()
self.to_rgb = to_rgb
self.with_mixup = with_mixup
if not isinstance(self.to_rgb, bool):
raise TypeError("{}: input type is invalid.".format(self))
if not isinstance(self.with_mixup, bool):
raise TypeError("{}: input type is invalid.".format(self))
def __call__(self, sample, context=None):
""" load image if 'im_file' field is not empty but 'image' is"""
if 'image' not in sample:
with open(sample['im_file'], 'rb') as f:
sample['image'] = f.read()
im = sample['image']
data = np.frombuffer(im, dtype='uint8')
        im = cv2.imdecode(data, 1)  # cv2 decodes in BGR mode
if self.to_rgb:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
sample['image'] = im
if 'h' not in sample:
sample['h'] = im.shape[0]
if 'w' not in sample:
sample['w'] = im.shape[1]
# make default im_info with [h, w, 1]
sample['im_info'] = np.array(
[im.shape[0], im.shape[1], 1.], dtype=np.float32)
# decode mixup image
if self.with_mixup and 'mixup' in sample:
self.__call__(sample['mixup'], context)
return sample
@register_op
class ResizeImage(BaseOperator):
def __init__(self,
target_size=0,
max_size=0,
interp=cv2.INTER_LINEAR,
use_cv2=True):
"""
Args:
            target_size (int): the target size of the image's short side
            max_size (int): the max size of the image
            interp (int): the interpolation method
            use_cv2 (bool): whether to use cv2 (rather than PIL) for interpolation
"""
super(ResizeImage, self).__init__()
self.target_size = int(target_size)
self.max_size = int(max_size)
self.interp = int(interp)
self.use_cv2 = use_cv2
if not (isinstance(self.target_size, int) and isinstance(
self.max_size, int) and isinstance(self.interp, int)):
raise TypeError("{}: input type is invalid.".format(self))
def __call__(self, sample, context=None):
""" Resise the image numpy.
"""
im = sample['image']
if not isinstance(im, np.ndarray):
raise TypeError("{}: image type is not numpy.".format(self))
if len(im.shape) != 3:
raise ImageError('{}: image is not 3-dimensional.'.format(self))
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
if float(im_size_min) == 0:
raise ZeroDivisionError('{}: min size of image is 0'.format(self))
if self.max_size != 0:
im_scale = float(self.target_size) / float(im_size_min)
# Prevent the biggest axis from being more than max_size
if np.round(im_scale * im_size_max) > self.max_size:
im_scale = float(self.max_size) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
sample['im_info'] = np.array(
[
np.round(im_shape[0] * im_scale),
np.round(im_shape[1] * im_scale), im_scale
],
dtype=np.float32)
else:
im_scale_x = float(self.target_size) / float(im_shape[1])
im_scale_y = float(self.target_size) / float(im_shape[0])
if self.use_cv2:
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
else:
im = Image.fromarray(im)
im = im.resize((self.target_size, self.target_size), self.interp)
im = np.array(im)
sample['image'] = im
return sample
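# Worked example for the short-side scaling above (illustrative numbers):
# a 480 x 640 image with target_size=800, max_size=1333 gives
# im_scale = 800 / 480 ~= 1.667; the long side becomes 640 * 1.667 ~= 1067,
# which is below max_size, so the output is roughly 800 x 1067. Had the
# scaled long side exceeded 1333, im_scale would be reduced to 1333 / 640.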
@register_op
class RandomFlipImage(BaseOperator):
def __init__(self, prob=0.5, is_normalized=False, is_mask_flip=False):
"""
Args:
prob (float): the probability of flipping image
is_normalized (bool): whether the bbox scale to [0,1]
is_mask_flip (bool): whether flip the segmentation
"""
super(RandomFlipImage, self).__init__()
self.prob = prob
self.is_normalized = is_normalized
self.is_mask_flip = is_mask_flip
if not (isinstance(self.prob, float) and
isinstance(self.is_normalized, bool) and
isinstance(self.is_mask_flip, bool)):
raise TypeError("{}: input type is invalid.".format(self))
def flip_segms(self, segms, height, width):
def _flip_poly(poly, width):
flipped_poly = np.array(poly)
flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
return flipped_poly.tolist()
        def _flip_rle(rle, height, width):
            # imported lazily so polygon-only datasets do not need pycocotools
            import pycocotools.mask as mask_util
            if 'counts' in rle and type(rle['counts']) == list:
                rle = mask_util.frPyObjects([rle], height, width)
            mask = mask_util.decode(rle)
            mask = mask[:, ::-1, :]
            rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
            return rle
def is_poly(segm):
assert isinstance(segm, (list, dict)), \
"Invalid segm type: {}".format(type(segm))
return isinstance(segm, list)
flipped_segms = []
for segm in segms:
if is_poly(segm):
# Polygon format
flipped_segms.append([_flip_poly(poly, width) for poly in segm])
            else:
                # RLE format
                flipped_segms.append(_flip_rle(segm, height, width))
return flipped_segms
def __call__(self, sample, context=None):
"""Filp the image and bounding box.
Operators:
1. Flip the image numpy.
2. Transform the bboxes' x coordinates.
(Must judge whether the coordinates are normalized!)
3. Transform the segmentations' x coordinates.
(Must judge whether the coordinates are normalized!)
Output:
sample: the image, bounding box and segmentation part
in sample are flipped.
"""
gt_bbox = sample['gt_bbox']
im = sample['image']
if not isinstance(im, np.ndarray):
raise TypeError("{}: image is not a numpy array.".format(self))
if len(im.shape) != 3:
raise ImageError("{}: image is not 3-dimensional.".format(self))
height, width, _ = im.shape
if np.random.uniform(0, 1) < self.prob:
im = im[:, ::-1, :]
if gt_bbox.shape[0] == 0:
return sample
oldx1 = gt_bbox[:, 0].copy()
oldx2 = gt_bbox[:, 2].copy()
if self.is_normalized:
gt_bbox[:, 0] = 1 - oldx2
gt_bbox[:, 2] = 1 - oldx1
else:
gt_bbox[:, 0] = width - oldx2 - 1
gt_bbox[:, 2] = width - oldx1 - 1
            if gt_bbox.shape[0] != 0 and (gt_bbox[:, 2] < gt_bbox[:, 0]).any():
m = "{}: invalid box, x2 should be greater than x1".format(self)
raise BboxError(m)
sample['gt_bbox'] = gt_bbox
if self.is_mask_flip and len(sample['gt_poly']) != 0:
sample['gt_poly'] = self.flip_segms(sample['gt_poly'], height,
width)
sample['flipped'] = True
sample['image'] = im
return sample
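# Worked example for the un-normalized branch above (illustrative numbers):
# in an image of width 100, a box with x1 = 10, x2 = 30 flips to
# x1 = 100 - 30 - 1 = 69 and x2 = 100 - 10 - 1 = 89, preserving x2 > x1.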
@register_op
class NormalizeImage(BaseOperator):
def __init__(self,
mean=[0.485, 0.456, 0.406],
std=[1, 1, 1],
is_scale=True,
is_channel_first=True):
"""
Args:
mean (list): the pixel mean
            std (list): the pixel standard deviation
"""
super(NormalizeImage, self).__init__()
self.mean = mean
self.std = std
self.is_scale = is_scale
self.is_channel_first = is_channel_first
if not (isinstance(self.mean, list) and isinstance(self.std, list) and
isinstance(self.is_scale, bool)):
raise TypeError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, sample, context=None):
"""Normalize the image.
Operators:
1.(optional) Scale the image to [0,1]
2. Each pixel minus mean and is divided by std
"""
im = sample['image']
im = im.astype(np.float32, copy=False)
if self.is_channel_first:
mean = np.array(self.mean)[:, np.newaxis, np.newaxis]
std = np.array(self.std)[:, np.newaxis, np.newaxis]
else:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
if self.is_scale:
im = im / 255.0
im -= mean
im /= std
sample['image'] = im
return sample
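# Worked example (illustrative numbers, default mean/std): with
# is_scale=True a pixel value 255 becomes 255 / 255.0 = 1.0, then the
# first-channel mean 0.485 is subtracted and the result is divided by
# std 1, giving 0.515.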
@register_op
class RandomDistort(BaseOperator):
def __init__(self,
brightness_lower=0.5,
brightness_upper=1.5,
contrast_lower=0.5,
contrast_upper=1.5,
saturation_lower=0.5,
saturation_upper=1.5,
hue_lower=-18,
hue_upper=18,
brightness_prob=0.5,
contrast_prob=0.5,
saturation_prob=0.5,
hue_prob=0.5,
count=4,
is_order=False):
"""
Args:
brightness_lower/ brightness_upper (float): the brightness
between brightness_lower and brightness_upper
            contrast_lower/ contrast_upper (float): the contrast between
                contrast_lower and contrast_upper
saturation_lower/ saturation_upper (float): the saturation
between saturation_lower and saturation_upper
hue_lower/ hue_upper (float): the hue between
hue_lower and hue_upper
brightness_prob (float): the probability of changing brightness
contrast_prob (float): the probability of changing contrast
saturation_prob (float): the probability of changing saturation
hue_prob (float): the probability of changing hue
            count (int): the number of distortion ops to apply
            is_order (bool): whether to apply the distortions in a fixed order
"""
super(RandomDistort, self).__init__()
self.brightness_lower = brightness_lower
self.brightness_upper = brightness_upper
self.contrast_lower = contrast_lower
self.contrast_upper = contrast_upper
self.saturation_lower = saturation_lower
self.saturation_upper = saturation_upper
self.hue_lower = hue_lower
self.hue_upper = hue_upper
self.brightness_prob = brightness_prob
self.contrast_prob = contrast_prob
self.saturation_prob = saturation_prob
self.hue_prob = hue_prob
self.count = count
self.is_order = is_order
def random_brightness(self, img):
brightness_delta = np.random.uniform(self.brightness_lower,
self.brightness_upper)
prob = np.random.uniform(0, 1)
if prob < self.brightness_prob:
img = ImageEnhance.Brightness(img).enhance(brightness_delta)
return img
def random_contrast(self, img):
contrast_delta = np.random.uniform(self.contrast_lower,
self.contrast_upper)
prob = np.random.uniform(0, 1)
if prob < self.contrast_prob:
img = ImageEnhance.Contrast(img).enhance(contrast_delta)
return img
def random_saturation(self, img):
saturation_delta = np.random.uniform(self.saturation_lower,
self.saturation_upper)
prob = np.random.uniform(0, 1)
if prob < self.saturation_prob:
img = ImageEnhance.Color(img).enhance(saturation_delta)
return img
def random_hue(self, img):
hue_delta = np.random.uniform(self.hue_lower, self.hue_upper)
prob = np.random.uniform(0, 1)
if prob < self.hue_prob:
img = np.array(img.convert('HSV'))
img[:, :, 0] = img[:, :, 0] + hue_delta
img = Image.fromarray(img, mode='HSV').convert('RGB')
return img
def __call__(self, sample, context):
"""random distort the image"""
ops = [
self.random_brightness, self.random_contrast,
self.random_saturation, self.random_hue
]
if self.is_order:
prob = np.random.uniform(0, 1)
if prob < 0.5:
ops = [
self.random_brightness,
self.random_saturation,
self.random_hue,
self.random_contrast,
]
else:
ops = random.sample(ops, self.count)
assert 'image' in sample, "image data not found"
im = sample['image']
im = Image.fromarray(im)
for id in range(self.count):
im = ops[id](im)
im = np.asarray(im)
sample['image'] = im
return sample
@register_op
class ExpandImage(BaseOperator):
def __init__(self, max_ratio, prob, mean=[127.5, 127.5, 127.5]):
"""
Args:
            max_ratio (float): the maximum ratio of expanding
            prob (float): the probability of expanding the image
mean (list): the pixel mean
"""
super(ExpandImage, self).__init__()
self.max_ratio = max_ratio
self.mean = mean
self.prob = prob
def __call__(self, sample, context):
"""
Expand the image and modify bounding box.
        Operators:
            1. Scale the image width and height.
            2. Construct a new image with the new height and width.
            3. Fill the new image with the mean.
            4. Paste the original image into the new image.
            5. Rescale the bounding boxes.
            6. Determine if the new bboxes are satisfied in the new image.
Returns:
sample: the image, bounding box are replaced.
"""
prob = np.random.uniform(0, 1)
assert 'image' in sample, 'not found image data'
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
im_width = sample['w']
im_height = sample['h']
if prob < self.prob:
if self.max_ratio - 1 >= 0.01:
expand_ratio = np.random.uniform(1, self.max_ratio)
height = int(im_height * expand_ratio)
width = int(im_width * expand_ratio)
h_off = math.floor(np.random.uniform(0, height - im_height))
w_off = math.floor(np.random.uniform(0, width - im_width))
expand_bbox = [
-w_off / im_width, -h_off / im_height,
(width - w_off) / im_width, (height - h_off) / im_height
]
expand_im = np.ones((height, width, 3))
expand_im = np.uint8(expand_im * np.squeeze(self.mean))
expand_im = Image.fromarray(expand_im)
im = Image.fromarray(im)
expand_im.paste(im, (int(w_off), int(h_off)))
expand_im = np.asarray(expand_im)
gt_bbox, gt_class, _ = filter_and_process(expand_bbox, gt_bbox,
gt_class)
sample['image'] = expand_im
sample['gt_bbox'] = gt_bbox
sample['gt_class'] = gt_class
sample['w'] = width
sample['h'] = height
return sample
@register_op
class CropImage(BaseOperator):
def __init__(self, batch_sampler, satisfy_all=False, avoid_no_bbox=True):
"""
Args:
batch_sampler (list): Multiple sets of different
parameters for cropping.
            satisfy_all (bool): whether all boxes must satisfy the constraints.
            avoid_no_bbox (bool): whether to avoid the
                situation where no box appears in the crop.
e.g.[[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
[max sample, max trial, min scale, max scale,
min aspect ratio, max aspect ratio,
min overlap, max overlap]
"""
super(CropImage, self).__init__()
self.batch_sampler = batch_sampler
self.satisfy_all = satisfy_all
self.avoid_no_bbox = avoid_no_bbox
def __call__(self, sample, context):
"""
Crop the image and modify bounding box.
        Operators:
            1. Scale the image width and height.
            2. Crop the image according to a random sample.
3. Rescale the bounding box.
4. Determine if the new bbox is satisfied in the new image.
Returns:
sample: the image, bounding box are replaced.
"""
assert 'image' in sample, "image data not found"
im = sample['image']
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
im_width = sample['w']
im_height = sample['h']
gt_score = sample['gt_score']
sampled_bbox = []
gt_bbox = gt_bbox.tolist()
for sampler in self.batch_sampler:
found = 0
for i in range(sampler[1]):
if found >= sampler[0]:
break
sample_bbox = generate_sample_bbox(sampler)
if satisfy_sample_constraint(sampler, sample_bbox, gt_bbox,
self.satisfy_all):
sampled_bbox.append(sample_bbox)
found = found + 1
im = np.array(im)
while sampled_bbox:
idx = int(np.random.uniform(0, len(sampled_bbox)))
sample_bbox = sampled_bbox.pop(idx)
sample_bbox = clip_bbox(sample_bbox)
crop_bbox, crop_class, crop_score = \
filter_and_process(sample_bbox, gt_bbox, gt_class, gt_score)
if self.avoid_no_bbox:
if len(crop_bbox) < 1:
continue
xmin = int(sample_bbox[0] * im_width)
xmax = int(sample_bbox[2] * im_width)
ymin = int(sample_bbox[1] * im_height)
ymax = int(sample_bbox[3] * im_height)
im = im[ymin:ymax, xmin:xmax]
sample['image'] = im
sample['gt_bbox'] = crop_bbox
sample['gt_class'] = crop_class
sample['gt_score'] = crop_score
return sample
return sample
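# Usage sketch (assumes boxes were normalized first, e.g. by NormalizeBox;
# sampler values are illustrative): a single sampler that keeps crops whose
# minimum Jaccard overlap with a gt box is at least 0.5:
#
#   crop = CropImage(batch_sampler=[[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0]])
#   sample = crop(sample, context=None)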
@register_op
class NormalizeBox(BaseOperator):
"""Transform the bounding box's coornidates to [0,1]."""
def __init__(self):
super(NormalizeBox, self).__init__()
def __call__(self, sample, context):
gt_bbox = sample['gt_bbox']
width = sample['w']
height = sample['h']
for i in range(gt_bbox.shape[0]):
gt_bbox[i][0] = gt_bbox[i][0] / width
gt_bbox[i][1] = gt_bbox[i][1] / height
gt_bbox[i][2] = gt_bbox[i][2] / width
gt_bbox[i][3] = gt_bbox[i][3] / height
sample['gt_bbox'] = gt_bbox
return sample
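# The per-box loop above could equivalently be written as a vectorized
# numpy expression (a sketch; assumes gt_bbox is a float ndarray [N, 4]):
#     gt_bbox[:, 0::2] /= width   # xmin, xmax
#     gt_bbox[:, 1::2] /= height  # ymin, ymax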
@register_op
class Permute(BaseOperator):
def __init__(self, to_bgr=True, channel_first=True):
"""
Change the channel layout and order.
Args:
to_bgr (bool): whether to convert the image from RGB to BGR
channel_first (bool): whether to convert the layout from HWC to CHW
"""
super(Permute, self).__init__()
self.to_bgr = to_bgr
self.channel_first = channel_first
if not (isinstance(self.to_bgr, bool) and
isinstance(self.channel_first, bool)):
raise TypeError("{}: input type is invalid.".format(self))
def __call__(self, sample, context=None):
assert 'image' in sample, "image data not found"
im = sample['image']
if self.channel_first:
im = np.swapaxes(im, 1, 2)
im = np.swapaxes(im, 1, 0)
if self.to_bgr:
im = im[[2, 1, 0], :, :]
sample['image'] = im
return sample
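# A minimal sketch of what Permute does to an HWC image (numpy only;
# the shapes are hypothetical):
def _example_permute():
    im = np.zeros((4, 5, 3), dtype='uint8')  # H=4, W=5, C=3 (RGB)
    im = np.swapaxes(im, 1, 2)  # HWC -> HCW
    im = np.swapaxes(im, 1, 0)  # HCW -> CHW
    im = im[[2, 1, 0], :, :]    # reverse channels: RGB -> BGR
    return im.shape             # (3, 4, 5)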
@register_op
class MixupImage(BaseOperator):
def __init__(self, alpha=1.5, beta=1.5):
""" Mixup image and gt_bbbox/gt_score
Args:
alpha (float): alpha parameter of beta distribute
beta (float): beta parameter of beta distribute
"""
super(MixupImage, self).__init__()
self.alpha = alpha
self.beta = beta
if self.alpha <= 0.0:
raise ValueError("alpha shold be positive in {}".format(self))
if self.beta <= 0.0:
raise ValueError("beta shold be positive in {}".format(self))
def _mixup_img(self, img1, img2, factor):
h = max(img1.shape[0], img2.shape[0])
w = max(img1.shape[1], img2.shape[1])
img = np.zeros((h, w, img1.shape[2]), 'float32')
img[:img1.shape[0], :img1.shape[1], :] = \
img1.astype('float32') * factor
img[:img2.shape[0], :img2.shape[1], :] += \
img2.astype('float32') * (1.0 - factor)
return img.astype('uint8')
def __call__(self, sample, context=None):
if 'mixup' not in sample:
return sample
factor = np.random.beta(self.alpha, self.beta)
factor = max(0.0, min(1.0, factor))
if factor >= 1.0:
sample.pop('mixup')
return sample
if factor <= 0.0:
return sample['mixup']
im = self._mixup_img(sample['image'], sample['mixup']['image'], factor)
gt_bbox1 = sample['gt_bbox']
gt_bbox2 = sample['mixup']['gt_bbox']
gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
gt_class1 = sample['gt_class']
gt_class2 = sample['mixup']['gt_class']
gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
gt_score1 = sample['gt_score']
gt_score2 = sample['mixup']['gt_score']
gt_score = np.concatenate(
(gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
sample['image'] = im
sample['gt_bbox'] = gt_bbox
sample['gt_score'] = gt_score
sample['gt_class'] = gt_class
sample['h'] = im.shape[0]
sample['w'] = im.shape[1]
sample.pop('mixup')
return sample
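# A minimal numeric sketch of the mixup weighting (hypothetical factor):
# with factor = 0.6, pixels are blended as 0.6*img1 + 0.4*img2, gt boxes
# and classes are concatenated, and gt scores are scaled by 0.6 and 0.4
# so both sources keep proportional weight in the loss.
def _example_mixup_scores():
    factor = 0.6  # a hypothetical draw from Beta(1.5, 1.5)
    gt_score1 = np.ones((2, 1), dtype='float32')
    gt_score2 = np.ones((3, 1), dtype='float32')
    return np.concatenate(
        (gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)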
@register_op
class RandomInterpImage(BaseOperator):
def __init__(self, target_size=0, max_size=0):
"""
Randomly resize the image with one of several interpolation methods.
Args:
target_size (int): the target size of the image's short side
max_size (int): the max size of the image
"""
super(RandomInterpImage, self).__init__()
self.target_size = target_size
self.max_size = max_size
if not (isinstance(self.target_size, int) and
isinstance(self.max_size, int)):
raise TypeError('{}: input type is invalid.'.format(self))
interps = [
cv2.INTER_NEAREST,
cv2.INTER_LINEAR,
cv2.INTER_AREA,
cv2.INTER_CUBIC,
cv2.INTER_LANCZOS4,
]
self.resizers = []
for interp in interps:
self.resizers.append(ResizeImage(target_size, max_size, interp))
def __call__(self, sample, context=None):
"""Resise the image numpy by random resizer."""
resizer = random.choice(self.resizers)
return resizer(sample, context)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# function:
# transform samples in 'source' using 'mapper'
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import six
import uuid
import logging
import signal
import threading
from .transformer import ProxiedDataset
logger = logging.getLogger(__name__)
class EndSignal(object):
def __init__(self, errno=0, errmsg=''):
self.errno = errno
self.errmsg = errmsg
class ParallelMappedDataset(ProxiedDataset):
"""
Transform samples to mapped samples, similar to 'basic.MappedDataset',
but using multiple workers (threads or processes)
Notes:
this class is not thread-safe
"""
def __init__(self, source, mapper, worker_args):
super(ParallelMappedDataset, self).__init__(source)
worker_args = {k.lower(): v for k, v in worker_args.items()}
args = {'bufsize': 100, 'worker_num': 8}
args.update(worker_args)
self._worker_args = args
self._started = False
self._source = source
self._mapper = mapper
self._exit = False
self._setup()
def _setup(self):
"""setup input/output queues and workers """
use_process = False
if 'use_process' in self._worker_args:
use_process = self._worker_args['use_process']
bufsize = self._worker_args['bufsize']
if use_process:
from .shared_queue import SharedQueue as Queue
from multiprocessing import Process as Worker
from multiprocessing import Event
else:
if six.PY3:
from queue import Queue
else:
from Queue import Queue
from threading import Thread as Worker
from threading import Event
self._inq = Queue(bufsize)
self._outq = Queue(bufsize)
consumer_num = self._worker_args['worker_num']
id = str(uuid.uuid4())[-3:]
self._producer = threading.Thread(
target=self._produce,
args=('producer-' + id, self._source, self._inq))
self._producer.daemon = True
self._consumers = []
for i in range(consumer_num):
p = Worker(
target=self._consume,
args=('consumer-' + id + '_' + str(i), self._inq, self._outq,
self._mapper))
self._consumers.append(p)
p.daemon = True
self._epoch = -1
self._feeding_ev = Event()
self._produced = 0 # produced sample in self._produce
self._consumed = 0 # consumed sample in self.next
self._stopped_consumers = 0
def _produce(self, id, source, inq):
"""Fetch data from source and feed it to 'inq' queue"""
while True:
self._feeding_ev.wait()
if self._exit:
break
try:
inq.put(source.next())
self._produced += 1
except StopIteration:
self._feeding_ev.clear()
self._feeding_ev.wait()  # wait to be woken up for the next epoch
logger.debug("producer[{}] starts new epoch".format(id))
except Exception as e:
msg = "producer[{}] failed with error: {}".format(id, str(e))
inq.put(EndSignal(-1, msg))
break
logger.debug("producer[{}] exits".format(id))
def _consume(self, id, inq, outq, mapper):
"""Fetch data from 'inq', process it and put result to 'outq'"""
while True:
sample = inq.get()
if isinstance(sample, EndSignal):
sample.errmsg += "[consumer[{}] exits]".format(id)
outq.put(sample)
logger.debug("end signal received, " +
"consumer[{}] exits".format(id))
break
try:
result = mapper(sample)
outq.put(result)
except Exception as e:
msg = 'consumer[{}] failed to map sample, error: {}'.format(id, str(e))
outq.put(EndSignal(-1, msg))
break
def drained(self):
assert self._epoch >= 0, "first epoch has not started yet"
return self._source.drained() and self._produced == self._consumed
def stop(self):
""" notify to exit
"""
self._exit = True
self._feeding_ev.set()
for _ in range(len(self._consumers)):
self._inq.put(EndSignal(0, "notify consumers to exit"))
def next(self):
""" get next transformed sample
"""
if self._epoch < 0:
self.reset()
if self.drained():
raise StopIteration()
while True:
sample = self._outq.get()
if isinstance(sample, EndSignal):
self._stopped_consumers += 1
if sample.errno != 0:
logger.warn("consumer failed with error: {}".format(
sample.errmsg))
if self._stopped_consumers < len(self._consumers):
self._inq.put(sample)
else:
raise ValueError("all consumers exited, no more samples")
else:
self._consumed += 1
return sample
def reset(self):
""" reset for a new epoch of samples
"""
if self._epoch < 0:
self._epoch = 0
for p in self._consumers:
p.start()
self._producer.start()
else:
if not self.drained():
logger.warn("do not reset before epoch[%d] finishes".format(
self._epoch))
self._produced = self._produced - self._consumed
else:
self._produced = 0
self._epoch += 1
assert self._stopped_consumers == 0, "some consumers already exited," \
+ " cannot start another epoch"
self._source.reset()
self._consumed = 0
self._feeding_ev.set()
# FIXME(dengkaipeng): fix me if you have a better implementation
# handle terminate reader process, do not print stack frame
def _reader_exit(signum, frame):
logger.debug("Reader process exit.")
sys.exit()
signal.signal(signal.SIGTERM, _reader_exit)
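# A minimal usage sketch (assumes a concrete 'source' Dataset and a
# picklable per-sample 'mapper'; the worker_args values are hypothetical):
def _example_parallel_map(source, mapper):
    mapped = ParallelMappedDataset(
        source, mapper,
        worker_args={'worker_num': 4, 'bufsize': 32, 'use_process': True})
    mapped.reset()  # starts the producer thread and the consumer workers
    samples = []
    while True:
        try:
            samples.append(mapped.next())
        except StopIteration:
            break
    return samples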
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import cv2
import numpy as np
logger = logging.getLogger(__name__)
def build_post_map(coarsest_stride=1,
is_padding=False,
random_shapes=[],
multi_scales=[],
use_padded_im_info=False):
"""
Build a mapper for post-processing batches
Args:
coarsest_stride (int): stride of the coarsest FPN level
is_padding (bool): whether to pad images in a minibatch
random_shapes (list of int): resize the image to one of these
randomly chosen shapes, [] for no resize.
multi_scales (list of int): resize the image by one of these
randomly chosen scales, [] for no resize.
use_padded_im_info (bool): whether to write the padded shape back
into im_info
Returns:
a mapper function which accepts one argument 'batch' and
returns the processed result
"""
def padding_minibatch(batch_data):
if len(batch_data) == 1 and coarsest_stride == 1:
return batch_data
max_shape = np.array([data[0].shape for data in batch_data]).max(axis=0)
if coarsest_stride > 1:
max_shape[1] = int(
np.ceil(max_shape[1] / coarsest_stride) * coarsest_stride)
max_shape[2] = int(
np.ceil(max_shape[2] / coarsest_stride) * coarsest_stride)
padding_batch = []
for data in batch_data:
im_c, im_h, im_w = data[0].shape[:]
padding_im = np.zeros(
(im_c, max_shape[1], max_shape[2]), dtype=np.float32)
padding_im[:, :im_h, :im_w] = data[0]
if use_padded_im_info:
data[1][:2] = max_shape[1:3]
padding_batch.append((padding_im, ) + data[1:])
return padding_batch
def random_shape(batch_data):
# For YOLO: gt_bbox is normalized, so it is scale-invariant.
shape = np.random.choice(random_shapes)
scaled_batch = []
h, w = batch_data[0][0].shape[1:3]
scale_x = float(shape) / w
scale_y = float(shape) / h
for data in batch_data:
im = cv2.resize(
data[0].transpose((1, 2, 0)),
None,
None,
fx=scale_x,
fy=scale_y,
interpolation=cv2.INTER_NEAREST)
scaled_batch.append((im.transpose(2, 0, 1), ) + data[1:])
return scaled_batch
def multi_scale_resize(batch_data):
# For RCNN: the image shape is recorded in im_info.
scale = np.random.choice(multi_scales)
scaled_batch = []
for data in batch_data:
im = cv2.resize(
data[0].transpose((1, 2, 0)),
None,
None,
fx=scale,
fy=scale,
interpolation=cv2.INTER_NEAREST)
im_info = [im.shape[:2], scale]
scaled_batch.append((im.transpose(2, 0, 1), im_info) + data[2:])
return scaled_batch
def _mapper(batch_data):
try:
if is_padding:
batch_data = padding_minibatch(batch_data)
if len(random_shapes) > 0:
batch_data = random_shape(batch_data)
if len(multi_scales) > 0:
batch_data = multi_scale_resize(batch_data)
except Exception as e:
errmsg = "post-process failed with error: " + str(e)
logger.warn(errmsg)
raise e
return batch_data
return _mapper
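# A minimal usage sketch (hypothetical values): pad each minibatch so
# height and width are multiples of the coarsest FPN stride, writing
# the padded shape back into im_info.
def _example_post_map(batch):
    post_map = build_post_map(
        coarsest_stride=32, is_padding=True, use_padded_im_info=True)
    return post_map(batch)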
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
__all__ = ['SharedBuffer', 'SharedMemoryMgr', 'SharedQueue']
from .sharedmemory import SharedBuffer
from .sharedmemory import SharedMemoryMgr
from .sharedmemory import SharedMemoryError
from .queue import SharedQueue
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import sys
import six
if six.PY3:
import pickle
from io import BytesIO as StringIO
else:
import cPickle as pickle
from cStringIO import StringIO
import logging
import traceback
import multiprocessing as mp
from multiprocessing.queues import Queue
from .sharedmemory import SharedMemoryMgr
logger = logging.getLogger(__name__)
class SharedQueueError(ValueError):
""" SharedQueueError
"""
pass
class SharedQueue(Queue):
""" a Queue based on shared memory to communicate data between Process,
and it's interface is compatible with 'multiprocessing.queues.Queue'
"""
def __init__(self, maxsize=0, mem_mgr=None, memsize=None, pagesize=None):
""" init
"""
if six.PY3:
super(SharedQueue, self).__init__(maxsize, ctx=mp.get_context())
else:
super(SharedQueue, self).__init__(maxsize)
if mem_mgr is not None:
self._shared_mem = mem_mgr
else:
self._shared_mem = SharedMemoryMgr(
capacity=memsize, pagesize=pagesize)
def put(self, obj, **kwargs):
""" put an object to this queue
"""
obj = pickle.dumps(obj, -1)
buff = None
try:
buff = self._shared_mem.malloc(len(obj))
buff.put(obj)
super(SharedQueue, self).put(buff, **kwargs)
except Exception as e:
stack_info = traceback.format_exc()
err_msg = 'failed to put an element to SharedQueue '\
'with stack info[%s]' % (stack_info)
logger.warn(err_msg)
if buff is not None:
buff.free()
raise e
def get(self, **kwargs):
""" get an object from this queue
"""
buff = None
try:
buff = super(SharedQueue, self).get(**kwargs)
data = buff.get()
return pickle.load(StringIO(data))
except Exception as e:
stack_info = traceback.format_exc()
err_msg = 'failed to get an element from SharedQueue '\
'with stack info[%s]' % (stack_info)
logger.warn(err_msg)
raise e
finally:
if buff is not None:
buff.free()
def release(self):
self._shared_mem.release()
self._shared_mem = None
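# A minimal usage sketch (hypothetical sizes): objects are pickled into
# shared-memory buffers, so large samples cross process boundaries
# without being copied through a pipe.
def _example_shared_queue():
    q = SharedQueue(maxsize=8, memsize=64 * 1024 * 1024, pagesize=64 * 1024)
    q.put({'image': b'\x00' * 1024})  # pickled, then stored in shared memory
    data = q.get()                    # the buffer is freed after unpickling
    q.release()
    return data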
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# utils for memory management which is allocated on sharedmemory,
# note that these structures may not be thread-safe
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import math
import struct
import sys
import six
if six.PY3:
import pickle
else:
import cPickle as pickle
import json
import uuid
import random
import numpy as np
import weakref
import logging
from multiprocessing import Lock
from multiprocessing import RawArray
logger = logging.getLogger(__name__)
class SharedMemoryError(ValueError):
""" SharedMemoryError
"""
pass
class SharedBufferError(SharedMemoryError):
""" SharedBufferError
"""
pass
class MemoryFullError(SharedMemoryError):
""" MemoryFullError
"""
def __init__(self, errmsg=''):
super(MemoryFullError, self).__init__()
self.errmsg = errmsg
def memcopy(dst, src, offset=0, length=None):
""" copy data from 'src' to 'dst' in bytes
"""
length = length if length is not None else len(src)
assert type(dst) == np.ndarray, 'invalid type for "dst" in memcopy'
if type(src) is not np.ndarray:
if type(src) is str and six.PY3:
src = src.encode()
src = np.frombuffer(src, dtype='uint8', count=len(src))
dst[:] = src[offset:offset + length]
class SharedBuffer(object):
""" Buffer allocated from SharedMemoryMgr, and it stores data on shared memory
note that:
every instance of this should be freed explicitely by calling 'self.free'
"""
def __init__(self, owner, capacity, pos, size=0, alloc_status=''):
""" Init
Args:
owner (str): manager to own this buffer
capacity (int): capacity in bytes for this buffer
pos (int): page position in shared memory
size (int): bytes already used
alloc_status (str): debug info about allocator when allocate this
"""
self._owner = owner
self._cap = capacity
self._pos = pos
self._size = size
self._alloc_status = alloc_status
assert self._pos >= 0 and self._cap > 0, \
"invalid params[%d:%d] to construct SharedBuffer" \
% (self._pos, self._cap)
def owner(self):
""" get owner
"""
return SharedMemoryMgr.get_mgr(self._owner)
def put(self, data, override=False):
""" put data to this buffer
Args:
data (str): data to be stored in this buffer
Returns:
None
Raises:
SharedMemoryError when not enough space in this buffer
"""
assert type(data) in [str, bytes], \
'invalid type[%s] for SharedBuffer::put' % (str(type(data)))
if self._size > 0 and not override:
raise SharedBufferError('buffer has already been set before')
if self.capacity() < len(data):
raise SharedBufferError('data[%d] is larger than size of buffer[%s]'\
% (len(data), str(self)))
self.owner().put_data(self, data)
self._size = len(data)
def get(self, offset=0, size=None, no_copy=True):
""" get the data stored this buffer
Args:
offset (int): position for the start point to 'get'
size (int): size to get
Returns:
data (np.ndarray('uint8')): user's data in numpy
which is passed in by 'put'
None: if no data stored in
"""
offset = offset if offset >= 0 else self._size + offset
if self._size <= 0:
return None
size = self._size if size is None else size
assert offset + size <= self._cap, 'invalid offset[%d] '\
'or size[%d] for capacity[%d]' % (offset, size, self._cap)
return self.owner().get_data(self, offset, size, no_copy=no_copy)
def size(self):
""" bytes of used memory
"""
return self._size
def resize(self, size):
""" resize the used memory to 'size', should not be greater than capacity
"""
assert size >= 0 and size <= self._cap, \
"invalid size[%d] for resize" % (size)
self._size = size
def capacity(self):
""" size of allocated memory
"""
return self._cap
def __str__(self):
""" human readable format
"""
return "SharedBuffer(owner:%s, pos:%d, size:%d, "\
"capacity:%d, alloc_status:[%s], pid:%d)" \
% (str(self._owner), self._pos, self._size, \
self._cap, self._alloc_status, os.getpid())
def free(self):
""" free this buffer to it's owner
"""
if self._owner is not None:
self.owner().free(self)
self._owner = None
self._cap = 0
self._pos = -1
self._size = 0
return True
else:
return False
class PageAllocator(object):
""" allocator used to malloc and free shared memory which
is split into pages
"""
s_allocator_header = 12
def __init__(self, base, total_pages, page_size):
""" init
"""
self._magic_num = 1234321000 + random.randint(100, 999)
self._base = base
self._total_pages = total_pages
self._page_size = page_size
header_pages = int(
math.ceil((total_pages + self.s_allocator_header) / page_size))
self._header_pages = header_pages
self._free_pages = total_pages - header_pages
self._header_size = self._header_pages * page_size
self._reset()
def _dump_alloc_info(self, fname):
hpages, tpages, pos, used = self.header()
start = self.s_allocator_header
end = start + self._page_size * hpages
alloc_flags = self._base[start:end].tostring()
info = {
'magic_num': self._magic_num,
'header_pages': hpages,
'total_pages': tpages,
'pos': pos,
'used': used
}
info['alloc_flags'] = alloc_flags
fname = fname + '.' + str(uuid.uuid4())[:6]
with open(fname, 'wb') as f:
f.write(pickle.dumps(info, -1))
logger.warn('dump alloc info to file[%s]' % (fname))
def _reset(self):
alloc_page_pos = self._header_pages
used_pages = self._header_pages
header_info = struct.pack(
str('III'), self._magic_num, alloc_page_pos, used_pages)
assert len(header_info) == self.s_allocator_header, \
'invalid size of header_info'
memcopy(self._base[0:self.s_allocator_header], header_info)
self.set_page_status(0, self._header_pages, '1')
self.set_page_status(self._header_pages, self._free_pages, '0')
def header(self):
""" get header info of this allocator
"""
header_str = self._base[0:self.s_allocator_header].tostring()
magic, pos, used = struct.unpack(str('III'), header_str)
assert magic == self._magic_num, \
'invalid header magic[%d] in shared memory' % (magic)
return self._header_pages, self._total_pages, pos, used
def empty(self):
""" are all allocatable pages available
"""
header_pages, pages, pos, used = self.header()
return header_pages == used
def full(self):
""" are all allocatable pages used
"""
header_pages, pages, pos, used = self.header()
return header_pages + used == pages
def __str__(self):
header_pages, pages, pos, used = self.header()
desc = '{page_info[magic:%d,total:%d,used:%d,header:%d,alloc_pos:%d,pagesize:%d]}' \
% (self._magic_num, pages, used, header_pages, pos, self._page_size)
return 'PageAllocator:%s' % (desc)
def set_alloc_info(self, alloc_pos, used_pages):
""" set allocating position to new value
"""
memcopy(self._base[4:12], struct.pack(str('II'), alloc_pos, used_pages))
def set_page_status(self, start, page_num, status):
""" set pages from 'start' to 'end' with new same status 'status'
"""
assert status in ['0', '1'], 'invalid status[%s] for page status '\
'in allocator[%s]' % (status, str(self))
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
memcopy(self._base[start:end], str(status * page_num))
def get_page_status(self, start, page_num, ret_flag=False):
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
status = self._base[start:end].tostring().decode()
if ret_flag:
return status
zero_num = status.count('0')
if zero_num == 0:
return (page_num, 1)
else:
return (zero_num, 0)
def malloc_page(self, page_num):
header_pages, pages, pos, used = self.header()
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
start_pos = pos
flags = ''
while True:
# maybe flags already has some '0' pages,
# so just check 'page_num - len(flags)' pages
flags += self.get_page_status(
pos, page_num - len(flags), ret_flag=True)
if flags.count('0') == page_num:
break
# not found enough pages, so shift to next few pages
free_pos = flags.rfind('1') + 1
flags = flags[free_pos:]
pos += free_pos
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
flags = ''
# not found available pages after scan all pages
if pos <= start_pos and end >= start_pos:
logger.debug('not found available pages after scan all pages')
break
page_status = (flags.count('0'), 0)
if page_status != (page_num, 0):
free_pages = self._total_pages - used
if free_pages == 0:
err_msg = 'all pages have been used:%s' % (str(self))
else:
err_msg = 'not found available pages with page_status[%s] '\
'and %d free pages' % (str(page_status), free_pages)
err_msg = 'failed to malloc %d pages at pos[%d] for reason[%s] and allocator status[%s]' \
% (page_num, pos, err_msg, str(self))
raise MemoryFullError(err_msg)
self.set_page_status(pos, page_num, '1')
used += page_num
self.set_alloc_info(end, used)
assert self.get_page_status(pos, page_num) == (page_num, 1), \
'failed to validate the page status'
return pos
def free_page(self, start, page_num):
""" free 'page_num' pages start from 'start'
"""
page_status = self.get_page_status(start, page_num)
assert page_status == (page_num, 1), \
'invalid status[%s] when free [%d, %d]' \
% (str(page_status), start, page_num)
self.set_page_status(start, page_num, '0')
_, _, pos, used = self.header()
used -= page_num
self.set_alloc_info(pos, used)
DEFAULT_SHARED_MEMORY_SIZE = 1024 * 1024 * 1024
class SharedMemoryMgr(object):
""" manage a continouse block of memory, provide
'malloc' to allocate new buffer, and 'free' to free buffer
"""
s_memory_mgrs = weakref.WeakValueDictionary()
s_mgr_num = 0
s_log_statis = False
@classmethod
def get_mgr(cls, id):
""" get a SharedMemoryMgr with size of 'capacity'
"""
assert id in cls.s_memory_mgrs, 'invalid id[%s] for memory managers' % (
id)
return cls.s_memory_mgrs[id]
def __init__(self, capacity=None, pagesize=None):
""" init
"""
logger.debug('create SharedMemoryMgr')
pagesize = 64 * 1024 if pagesize is None else pagesize
assert type(pagesize) is int, "invalid type of pagesize[%s]" \
% (str(pagesize))
capacity = DEFAULT_SHARED_MEMORY_SIZE if capacity is None else capacity
assert type(capacity) is int, "invalid type of capacity[%s]" \
% (str(capacity))
assert capacity > 0, 'size of shared memory should be greater than 0'
self._released = False
self._cap = capacity
self._page_size = pagesize
assert self._cap % self._page_size == 0, \
"capacity[%d] and pagesize[%d] are not consistent" \
% (self._cap, self._page_size)
self._total_pages = self._cap // self._page_size
self._pid = os.getpid()
SharedMemoryMgr.s_mgr_num += 1
self._id = self._pid * 100 + SharedMemoryMgr.s_mgr_num
SharedMemoryMgr.s_memory_mgrs[self._id] = self
self._locker = Lock()
self._setup()
def _setup(self):
self._shared_mem = RawArray('c', self._cap)
self._base = np.frombuffer(
self._shared_mem, dtype='uint8', count=self._cap)
self._locker.acquire()
try:
self._allocator = PageAllocator(self._base, self._total_pages,
self._page_size)
finally:
self._locker.release()
def malloc(self, size, wait=True):
""" malloc a new SharedBuffer
Args:
size (int): buffer size to be allocated
wait (bool): whether to wait when there is not enough memory
Returns:
SharedBuffer
Raises:
SharedMemoryError when no available memory is found
"""
page_num = int(math.ceil(size / self._page_size))
size = page_num * self._page_size
start = None
ct = 0
errmsg = ''
while True:
self._locker.acquire()
try:
start = self._allocator.malloc_page(page_num)
alloc_status = str(self._allocator)
except MemoryFullError as e:
start = None
errmsg = e.errmsg
if not wait:
raise e
finally:
self._locker.release()
if start is None:
time.sleep(0.1)
if ct % 100 == 0:
logger.warn('not enough space for reason[%s]' % (errmsg))
ct += 1
else:
break
return SharedBuffer(self._id, size, start, alloc_status=alloc_status)
def free(self, shared_buf):
""" free a SharedBuffer
Args:
shared_buf (SharedBuffer): buffer to be freed
Returns:
None
Raises:
SharedMemoryError when failed to release this buffer
"""
assert shared_buf._owner == self._id, "invalid shared_buf[%s] "\
"for it's not allocated from me[%s]" % (str(shared_buf), str(self))
cap = shared_buf.capacity()
start_page = shared_buf._pos
page_num = cap // self._page_size
# maybe we don't need this lock here
self._locker.acquire()
try:
self._allocator.free_page(start_page, page_num)
finally:
self._locker.release()
def put_data(self, shared_buf, data):
""" fill 'data' into 'shared_buf'
"""
assert len(data) <= shared_buf.capacity(), 'too large data[%d] '\
'for this buffer[%s]' % (len(data), str(shared_buf))
start = shared_buf._pos * self._page_size
end = start + len(data)
assert start >= 0 and end <= self._cap, "invalid start "\
"position[%d] when put data to buff:%s" % (start, str(shared_buf))
self._base[start:end] = np.frombuffer(data, 'uint8', len(data))
def get_data(self, shared_buf, offset, size, no_copy=True):
""" extract 'data' from 'shared_buf' in range [offset, offset + size)
"""
start = shared_buf._pos * self._page_size
start += offset
if no_copy:
return self._base[start:start + size]
else:
return self._base[start:start + size].tostring()
def __str__(self):
return 'SharedMemoryMgr:{id:%d, %s}' % (self._id, str(self._allocator))
def __del__(self):
if SharedMemoryMgr.s_log_statis:
logger.info('destroy [%s]' % (self))
if not self._released and not self._allocator.empty():
logger.warn('not empty when deleting this SharedMemoryMgr[%s]' %
(self))
else:
self._released = True
if self._id in SharedMemoryMgr.s_memory_mgrs:
del SharedMemoryMgr.s_memory_mgrs[self._id]
SharedMemoryMgr.s_mgr_num -= 1
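# A minimal usage sketch of the manager defined above (hypothetical
# sizes; as the SharedBuffer docstring notes, every buffer must be
# freed explicitly):
def _example_shared_memory():
    mgr = SharedMemoryMgr(capacity=1024 * 1024, pagesize=64 * 1024)
    buff = mgr.malloc(100)          # rounded up to one 64KB page
    buff.put(b'hello')
    data = buff.get(no_copy=False)  # returns b'hello'
    buff.free()
    return data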
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import functools
import collections
from ..dataset import Dataset
class ProxiedDataset(Dataset):
"""proxy method called to 'self._ds' when if not defined"""
def __init__(self, ds):
super(ProxiedDataset, self).__init__()
self._ds = ds
methods = filter(lambda k: not k.startswith('_'),
Dataset.__dict__.keys())
for m in methods:
func = functools.partial(self._proxy_method, getattr(self, m))
setattr(self, m, func)
def _proxy_method(self, func, *args, **kwargs):
"""
proxy the call to 'func'; if it is not implemented, call the
method of self._ds that has the same name as func.__name__
"""
method = func.__name__
try:
return func(*args, **kwargs)
except NotImplementedError:
ds_func = getattr(self._ds, method)
return ds_func(*args, **kwargs)
class MappedDataset(ProxiedDataset):
def __init__(self, ds, mapper):
super(MappedDataset, self).__init__(ds)
self._ds = ds
self._mapper = mapper
def next(self):
sample = self._ds.next()
return self._mapper(sample)
class BatchedDataset(ProxiedDataset):
"""
Batching samples
Args:
ds (instance of Dataset): dataset to be batched
batchsize (int): sample number for each batch
drop_last (bool): drop the remaining samples if they are not enough for one batch
"""
def __init__(self, ds, batchsize, drop_last=False):
super(BatchedDataset, self).__init__(ds)
self._batchsz = batchsize
self._drop_last = drop_last
def next(self):
"""proxy to self._ds.next"""
def empty(x):
if isinstance(x, np.ndarray) and x.size == 0:
return True
elif isinstance(x, collections.Sequence) and len(x) == 0:
return True
else:
return False
def has_empty(items):
if any(x is None for x in items):
return True
if any(empty(x) for x in items):
return True
return False
batch = []
for _ in range(self._batchsz):
try:
out = self._ds.next()
while has_empty(out):
out = self._ds.next()
batch.append(out)
except StopIteration:
if not self._drop_last and len(batch) > 0:
return batch
else:
raise StopIteration
return batch
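# A minimal composition sketch (assumes a concrete 'ds' Dataset and a
# per-sample 'mapper'; the batch size is hypothetical): transforms run
# per sample, then samples are grouped into batches of 16.
def _example_pipeline(ds, mapper):
    mapped = MappedDataset(ds, mapper)
    batched = BatchedDataset(mapped, batchsize=16, drop_last=True)
    return batched.next()  # a list of 16 transformed samples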
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
# XXX for triggering decorators
from . import anchor_heads
from . import architectures
from . import backbones
from . import roi_extractors
from . import roi_heads
from . import ops
from . import target_assigners
from .anchor_heads import *
from .architectures import *
from .backbones import *
from .roi_extractors import *
from .roi_heads import *
from .ops import *
from .target_assigners import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import rpn_head
from . import yolo_head
from . import retina_head
from .rpn_head import *
from .yolo_head import *
from .retina_head import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Constant
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import (AnchorGenerator, RetinaTargetAssign,
RetinaOutputDecoder)
from ppdet.core.workspace import register, serializable
__all__ = ['RetinaHead']
@register
class RetinaHead(object):
"""
Retina Head
Args:
anchor_generator (object): `AnchorGenerator` instance
target_assign (object): `RetinaTargetAssign` instance
output_decoder (object): `RetinaOutputDecoder` instance
num_convs_per_octave (int): Number of convolution layers in each octave
num_chan (int): Number of octave output channels
max_level (int): Highest level of FPN output
min_level (int): Lowest level of FPN output
prior_prob (float): Used to set the bias init for the class prediction layer
base_scale (int): Anchors are generated based on this scale
num_scales_per_octave (int): Number of anchor scales per octave
num_classes (int): Number of classes
gamma (float): The parameter in focal loss
alpha (float): The parameter in focal loss
sigma (float): The parameter in smooth l1 loss
"""
__inject__ = ['anchor_generator', 'target_assign', 'output_decoder']
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
target_assign=RetinaTargetAssign().__dict__,
output_decoder=RetinaOutputDecoder().__dict__,
num_convs_per_octave=4,
num_chan=256,
max_level=7,
min_level=3,
prior_prob=0.01,
base_scale=4,
num_scales_per_octave=3,
num_classes=81,
gamma=2.0,
alpha=0.25,
sigma=3.0151134457776365):
self.anchor_generator = anchor_generator
self.target_assign = target_assign
self.output_decoder = output_decoder
self.num_convs_per_octave = num_convs_per_octave
self.num_chan = num_chan
self.max_level = max_level
self.min_level = min_level
self.prior_prob = prior_prob
self.base_scale = base_scale
self.num_scales_per_octave = num_scales_per_octave
self.num_classes = num_classes
self.gamma = gamma
self.alpha = alpha
self.sigma = sigma
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(target_assign, dict):
self.target_assign = RetinaTargetAssign(**target_assign)
if isinstance(output_decoder, dict):
self.output_decoder = RetinaOutputDecoder(**output_decoder)
def _class_subnet(self, body_feats, spatial_scale):
"""
Get class predictions for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Returns:
cls_pred_list(list): Class predictions of all input FPN levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
fpn_name_list = list(body_feats.keys())
cls_pred_list = []
for lvl in range(self.min_level, self.max_level + 1):
fpn_name = fpn_name_list[self.max_level - lvl]
subnet_blob = body_feats[fpn_name]
for i in range(self.num_convs_per_octave):
conv_name = 'retnet_cls_conv_n{}_fpn{}'.format(i, lvl)
conv_share_name = 'retnet_cls_conv_n{}_fpn{}'.format(
i, self.min_level)
subnet_blob_in = subnet_blob
subnet_blob = fluid.layers.conv2d(
input=subnet_blob_in,
num_filters=self.num_chan,
filter_size=3,
stride=1,
padding=1,
act='relu',
name=conv_name,
param_attr=ParamAttr(
name=conv_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
# class prediction
cls_name = 'retnet_cls_pred_fpn{}'.format(lvl)
cls_share_name = 'retnet_cls_pred_fpn{}'.format(self.min_level)
num_anchors = self.num_scales_per_octave * len(
self.anchor_generator.aspect_ratios)
cls_dim = num_anchors * (self.num_classes - 1)
# bias initialization: b = -log((1 - prior_prob) / prior_prob)
bias_init = float(-np.log((1 - self.prior_prob) / self.prior_prob))
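# e.g. with the default prior_prob = 0.01, bias_init = -log(99) ~ -4.595,
# so every anchor starts with a predicted foreground probability of ~0.01.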
out_cls = fluid.layers.conv2d(
input=subnet_blob,
num_filters=cls_dim,
filter_size=3,
stride=1,
padding=1,
act=None,
name=cls_name,
param_attr=ParamAttr(
name=cls_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=cls_share_name + '_b',
initializer=Constant(value=bias_init),
learning_rate=2.,
regularizer=L2Decay(0.)))
cls_pred_list.append(out_cls)
return cls_pred_list
def _bbox_subnet(self, body_feats, spatial_scale):
"""
Get bounding box predictions for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Returns:
bbox_pred_list(list): Bounding box predictions of all input FPN
levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
fpn_name_list = list(body_feats.keys())
bbox_pred_list = []
for lvl in range(self.min_level, self.max_level + 1):
fpn_name = fpn_name_list[self.max_level - lvl]
subnet_blob = body_feats[fpn_name]
for i in range(self.num_convs_per_octave):
conv_name = 'retnet_bbox_conv_n{}_fpn{}'.format(i, lvl)
conv_share_name = 'retnet_bbox_conv_n{}_fpn{}'.format(
i, self.min_level)
subnet_blob_in = subnet_blob
subnet_blob = fluid.layers.conv2d(
input=subnet_blob_in,
num_filters=self.num_chan,
filter_size=3,
stride=1,
padding=1,
act='relu',
name=conv_name,
param_attr=ParamAttr(
name=conv_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
# bbox prediction
bbox_name = 'retnet_bbox_pred_fpn{}'.format(lvl)
bbox_share_name = 'retnet_bbox_pred_fpn{}'.format(self.min_level)
num_anchors = self.num_scales_per_octave * len(
self.anchor_generator.aspect_ratios)
bbox_dim = num_anchors * 4
out_bbox = fluid.layers.conv2d(
input=subnet_blob,
num_filters=bbox_dim,
filter_size=3,
stride=1,
padding=1,
act=None,
name=bbox_name,
param_attr=ParamAttr(
name=bbox_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=bbox_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
bbox_pred_list.append(out_bbox)
return bbox_pred_list
def _anchor_generate(self, body_feats, spatial_scale):
"""
Get anchor boxes for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Return:
anchor_list(list): Anchors of all input FPN levels.
anchor_var_list(list): Anchor variances of all input FPN levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
fpn_name_list = list(body_feats.keys())
anchor_list = []
anchor_var_list = []
for lvl in range(self.min_level, self.max_level + 1):
anchor_sizes = []
stride = int(1 / spatial_scale[self.max_level - lvl])
for octave in range(self.num_scales_per_octave):
anchor_size = stride * (
2**(float(octave) /
float(self.num_scales_per_octave))) * self.base_scale
anchor_sizes.append(anchor_size)
fpn_name = fpn_name_list[self.max_level - lvl]
anchor, anchor_var = self.anchor_generator(
input=body_feats[fpn_name],
anchor_sizes=anchor_sizes,
aspect_ratios=self.anchor_generator.aspect_ratios,
stride=[stride, stride])
anchor_list.append(anchor)
anchor_var_list.append(anchor_var)
return anchor_list, anchor_var_list
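# A worked example of the anchor-size rule above (hypothetical level):
# at FPN level 3, stride = 8; with base_scale = 4 and 3 scales per
# octave, anchor sizes are 8*4*2**(0/3.), 8*4*2**(1/3.), 8*4*2**(2/3.)
# ~ 32.0, 40.3, 50.8 pixels.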
def _get_output(self, body_feats, spatial_scale):
"""
Get class predictions, bounding box predictions and anchor boxes
for all FPN levels.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
Returns:
output(dict): A dictionary holding the class predictions
('cls_pred'), bounding box predictions ('bbox_pred'), anchors
('anchor') and anchor variances ('anchor_var') of all input FPN
levels.
"""
assert len(body_feats) == self.max_level - self.min_level + 1
# class subnet
cls_pred_list = self._class_subnet(body_feats, spatial_scale)
# bbox subnet
bbox_pred_list = self._bbox_subnet(body_feats, spatial_scale)
# generate anchors
anchor_list, anchor_var_list = self._anchor_generate(body_feats,
spatial_scale)
cls_pred_reshape_list = []
bbox_pred_reshape_list = []
anchor_reshape_list = []
anchor_var_reshape_list = []
for i in range(self.max_level - self.min_level + 1):
cls_pred_transpose = fluid.layers.transpose(
cls_pred_list[i], perm=[0, 2, 3, 1])
cls_pred_reshape = fluid.layers.reshape(
cls_pred_transpose, shape=(0, -1, self.num_classes - 1))
bbox_pred_transpose = fluid.layers.transpose(
bbox_pred_list[i], perm=[0, 2, 3, 1])
bbox_pred_reshape = fluid.layers.reshape(
bbox_pred_transpose, shape=(0, -1, 4))
anchor_reshape = fluid.layers.reshape(anchor_list[i], shape=(-1, 4))
anchor_var_reshape = fluid.layers.reshape(
anchor_var_list[i], shape=(-1, 4))
cls_pred_reshape_list.append(cls_pred_reshape)
bbox_pred_reshape_list.append(bbox_pred_reshape)
anchor_reshape_list.append(anchor_reshape)
anchor_var_reshape_list.append(anchor_var_reshape)
output = {}
output['cls_pred'] = cls_pred_reshape_list
output['bbox_pred'] = bbox_pred_reshape_list
output['anchor'] = anchor_reshape_list
output['anchor_var'] = anchor_var_reshape_list
return output
def get_prediction(self, body_feats, spatial_scale, im_info):
"""
Get predicted bounding boxes in the test stage.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
number of input images, each element consists of im_height,
im_width, im_scale.
Returns:
pred_result(Variable): Prediction result with shape [N, 6]. Each
row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
N is the total number of predictions.
"""
output = self._get_output(body_feats, spatial_scale)
cls_pred_reshape_list = output['cls_pred']
bbox_pred_reshape_list = output['bbox_pred']
anchor_reshape_list = output['anchor']
anchor_var_reshape_list = output['anchor_var']
for i in range(self.max_level - self.min_level + 1):
cls_pred_reshape_list[i] = fluid.layers.sigmoid(
cls_pred_reshape_list[i])
pred_result = self.output_decoder(
bboxes=bbox_pred_reshape_list,
scores=cls_pred_reshape_list,
anchors=anchor_reshape_list,
im_info=im_info)
return {'bbox': pred_result}
def get_loss(self, body_feats, spatial_scale, im_info, gt_box, gt_label,
is_crowd):
"""
Calculate the losses of RetinaNet.
Args:
body_feats(dict): A dictionary that maps FPN output names to
their feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
im_info(Variable): A 2-D LoDTensor with shape [B, 3]. B is the
number of input images, each element consists of im_height,
im_width, im_scale.
gt_box(Variable): The ground-truth bounding boxes with shape [M, 4].
M is the number of groundtruth.
gt_label(Variable): The ground-truth labels with shape [M, 1].
M is the number of groundtruth.
is_crowd(Variable): Indicates whether a ground-truth box is crowd,
with shape [M, 1]. M is the number of groundtruth.
Returns:
Type: dict
loss_cls(Variable): focal loss.
loss_bbox(Variable): smooth l1 loss.
"""
output = self._get_output(body_feats, spatial_scale)
cls_pred_reshape_list = output['cls_pred']
bbox_pred_reshape_list = output['bbox_pred']
anchor_reshape_list = output['anchor']
anchor_var_reshape_list = output['anchor_var']
cls_pred_input = fluid.layers.concat(cls_pred_reshape_list, axis=1)
bbox_pred_input = fluid.layers.concat(bbox_pred_reshape_list, axis=1)
anchor_input = fluid.layers.concat(anchor_reshape_list, axis=0)
anchor_var_input = fluid.layers.concat(anchor_var_reshape_list, axis=0)
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight, fg_num = \
self.target_assign(
bbox_pred=bbox_pred_input,
cls_logits=cls_pred_input,
anchor_box=anchor_input,
anchor_var=anchor_var_input,
gt_boxes=gt_box,
gt_labels=gt_label,
is_crowd=is_crowd,
im_info=im_info,
num_classes=self.num_classes - 1)
fg_num = fluid.layers.reduce_sum(fg_num, name='fg_num')
loss_cls = fluid.layers.sigmoid_focal_loss(
x=score_pred,
label=score_tgt,
fg_num=fg_num,
gamma=self.gamma,
alpha=self.alpha)
loss_cls = fluid.layers.reduce_sum(loss_cls, name='loss_cls')
loss_bbox = fluid.layers.smooth_l1(
x=loc_pred,
y=loc_tgt,
sigma=self.sigma,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
loss_bbox = fluid.layers.reduce_sum(loss_bbox, name='loss_bbox')
loss_bbox = loss_bbox / fg_num
return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
from ppdet.modeling.ops import (AnchorGenerator,
RPNTargetAssign, GenerateProposals)
__all__ = ['RPNTargetAssign', 'GenerateProposals', 'RPNHead', 'FPNRPNHead']
@register
class RPNHead(object):
"""
RPN Head
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__):
super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign
self.train_proposal = train_proposal
self.test_proposal = test_proposal
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(rpn_target_assign, dict):
self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign)
if isinstance(train_proposal, dict):
self.train_proposal = GenerateProposals(**train_proposal)
if isinstance(test_proposal, dict):
self.test_proposal = GenerateProposals(**test_proposal)
def _get_output(self, input):
"""
Get anchor and RPN head output.
Args:
input(Variable): feature map from backbone with shape of [N, C, H, W]
Returns:
rpn_cls_score(Variable): Output of rpn head with shape of
[N, num_anchors, H, W].
rpn_bbox_pred(Variable): Output of rpn head with shape of
[N, num_anchors * 4, H, W].
"""
dim_out = input.shape[1]
rpn_conv = fluid.layers.conv2d(
input=input,
num_filters=dim_out,
filter_size=3,
stride=1,
padding=1,
act='relu',
name='conv_rpn',
param_attr=ParamAttr(
name="conv_rpn_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
# Generate anchors
self.anchor, self.anchor_var = self.anchor_generator(input=rpn_conv)
num_anchor = self.anchor.shape[2]
# Proposal classification scores
self.rpn_cls_score = fluid.layers.conv2d(
rpn_conv,
num_filters=num_anchor,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_cls_score',
param_attr=ParamAttr(
name="rpn_cls_logits_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_cls_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
# Proposal bbox regression deltas
self.rpn_bbox_pred = fluid.layers.conv2d(
rpn_conv,
num_filters=4 * num_anchor,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_bbox_pred',
param_attr=ParamAttr(
name="rpn_bbox_pred_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_bbox_pred_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred
def get_proposals(self, body_feats, im_info, mode='train'):
"""
Get proposals according to the output of backbone.
Args:
body_feats (dict): The dictionary of feature maps from backbone.
im_info(Variable): The information of images with shape [N, 3],
in the format (height, width, scale).
mode(str): 'train' or 'test'.
Returns:
rpn_rois(Variable): Output proposals with shape of (rois_num, 4).
"""
# In RPN heads, only the last feature map of the backbone is used,
# i.e. the last entry of body_feats.
body_feat = list(body_feats.values())[-1]
rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat)
rpn_cls_score_prob = fluid.layers.sigmoid(
rpn_cls_score, name='rpn_cls_score_prob')
prop_op = self.train_proposal if mode == 'train' else self.test_proposal
rpn_rois, rpn_roi_probs = prop_op(
scores=rpn_cls_score_prob,
bbox_deltas=rpn_bbox_pred,
im_info=im_info,
anchors=self.anchor,
variances=self.anchor_var)
return rpn_rois
def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor,
anchor_var):
rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1])
rpn_bbox_pred = fluid.layers.transpose(rpn_bbox_pred, perm=[0, 2, 3, 1])
anchor = fluid.layers.reshape(anchor, shape=(-1, 4))
anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4))
rpn_cls_score = fluid.layers.reshape(x=rpn_cls_score, shape=(0, -1, 1))
rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4))
return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var
def _get_loss_input(self):
for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']:
if not getattr(self, attr, None):
raise ValueError("self.{} should not be None,".format(attr),
"call RPNHead.get_proposals first")
return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred,
self.anchor, self.anchor_var)
def get_loss(self, im_info, gt_box, is_crowd):
"""
Sample proposals and calculate RPN losses.
Args:
im_info(Variable): The information of images with shape [N, 3],
in the format (height, width, scale).
gt_box(Variable): The ground-truth bounding boxes with shape [M, 4].
M is the number of groundtruth.
is_crowd(Variable): Indicates whether a ground-truth box is crowd,
with shape [M, 1]. M is the number of groundtruth.
Returns:
Type: dict
rpn_cls_loss(Variable): RPN classification loss.
rpn_bbox_loss(Variable): RPN bounding box regression loss.
"""
rpn_cls, rpn_bbox, anchor, anchor_var = self._get_loss_input()
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
self.rpn_target_assign(
bbox_pred=rpn_bbox,
cls_logits=rpn_cls,
anchor_box=anchor,
anchor_var=anchor_var,
gt_boxes=gt_box,
is_crowd=is_crowd,
im_info=im_info)
score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
score_tgt.stop_gradient = True
rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=score_pred, label=score_tgt)
rpn_cls_loss = fluid.layers.reduce_mean(
rpn_cls_loss, name='loss_rpn_cls')
loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32')
loc_tgt.stop_gradient = True
rpn_reg_loss = fluid.layers.smooth_l1(
x=loc_pred,
y=loc_tgt,
sigma=3.0,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
rpn_reg_loss = fluid.layers.reduce_sum(
rpn_reg_loss, name='loss_rpn_bbox')
score_shape = fluid.layers.shape(score_tgt)
score_shape = fluid.layers.cast(x=score_shape, dtype='float32')
norm = fluid.layers.reduce_prod(score_shape)
norm.stop_gradient = True
rpn_reg_loss = rpn_reg_loss / norm
return {'loss_rpn_cls': rpn_cls_loss, 'loss_rpn_bbox': rpn_reg_loss}
@register
class FPNRPNHead(RPNHead):
"""
RPN Head that supports FPN input
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
anchor_start_size (int): size of anchor at the first scale
num_chan (int): number of FPN output channels
min_level (int): lowest level of FPN output
max_level (int): highest level of FPN output
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__,
anchor_start_size=32,
num_chan=256,
min_level=2,
max_level=6):
super(FPNRPNHead, self).__init__(anchor_generator, rpn_target_assign,
train_proposal, test_proposal)
self.anchor_start_size = anchor_start_size
self.num_chan = num_chan
self.min_level = min_level
self.max_level = max_level
self.fpn_rpn_list = []
self.anchors_list = []
self.anchor_var_list = []
def _get_output(self, input, feat_lvl):
"""
Get anchor and FPN RPN head output at one level.
Args:
input(Variable): Body feature from backbone.
feat_lvl(int): Indicate the level of rpn output corresponding
to the level of feature map.
Return:
rpn_cls_score(Variable): Output of one level of fpn rpn head with
shape of [N, num_anchors, H, W].
rpn_bbox_pred(Variable): Output of one level of fpn rpn head with
shape of [N, num_anchors * 4, H, W].
"""
slvl = str(feat_lvl)
conv_name = 'conv_rpn_fpn' + slvl
cls_name = 'rpn_cls_logits_fpn' + slvl
bbox_name = 'rpn_bbox_pred_fpn' + slvl
conv_share_name = 'conv_rpn_fpn' + str(self.min_level)
cls_share_name = 'rpn_cls_logits_fpn' + str(self.min_level)
bbox_share_name = 'rpn_bbox_pred_fpn' + str(self.min_level)
num_anchors = len(self.anchor_generator.aspect_ratios)
conv_rpn_fpn = fluid.layers.conv2d(
input=input,
num_filters=self.num_chan,
filter_size=3,
padding=1,
act='relu',
name=conv_name,
param_attr=ParamAttr(
name=conv_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.anchors, self.anchor_var = self.anchor_generator(
input=conv_rpn_fpn,
anchor_sizes=(self.anchor_start_size * 2.
**(feat_lvl - self.min_level), ),
stride=(2.**feat_lvl, 2.**feat_lvl))
self.rpn_cls_score = fluid.layers.conv2d(
input=conv_rpn_fpn,
num_filters=num_anchors,
filter_size=1,
act=None,
name=cls_name,
param_attr=ParamAttr(
name=cls_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=cls_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.rpn_bbox_pred = fluid.layers.conv2d(
input=conv_rpn_fpn,
num_filters=num_anchors * 4,
filter_size=1,
act=None,
name=bbox_name,
param_attr=ParamAttr(
name=bbox_share_name + '_w',
initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=bbox_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred
def _get_single_proposals(self, body_feat, im_info, feat_lvl, mode='train'):
"""
Get proposals in one level according to the output of fpn rpn head
Args:
body_feat(Variable): the feature map from the backbone.
im_info(Variable): The image information, with shape [N, 3] and
format (height, width, scale).
feat_lvl(int): The FPN level that the generated proposals
correspond to.
Returns:
rpn_rois_fpn(Variable): Output proposals with shape of (rois_num, 4).
rpn_roi_probs_fpn(Variable): Scores of proposals with
shape of (rois_num, 1).
"""
rpn_cls_logits_fpn, rpn_bbox_pred_fpn = self._get_output(body_feat,
feat_lvl)
prop_op = self.train_proposal if mode == 'train' else self.test_proposal
rpn_cls_prob_fpn = fluid.layers.sigmoid(
rpn_cls_logits_fpn, name='rpn_cls_probs_fpn' + str(feat_lvl))
rpn_rois_fpn, rpn_roi_probs_fpn = prop_op(
scores=rpn_cls_prob_fpn,
bbox_deltas=rpn_bbox_pred_fpn,
im_info=im_info,
anchors=self.anchors,
variances=self.anchor_var)
return rpn_rois_fpn, rpn_roi_probs_fpn
def get_proposals(self, fpn_feats, im_info, mode='train'):
"""
Get proposals in multiple levels according to the output of fpn
rpn head
Args:
fpn_feats(dict): A dictionary mapping names to the output
feature maps of FPN.
im_info(Variable): The image information, with shape [N, 3] and
format (height, width, scale).
Return:
rois_collect(Variable): Output proposals with shape [rois_num, 4].
"""
rois_list = []
roi_probs_list = []
fpn_feat_names = list(fpn_feats.keys())
for lvl in range(self.min_level, self.max_level + 1):
fpn_feat_name = fpn_feat_names[self.max_level - lvl]
fpn_feat = fpn_feats[fpn_feat_name]
rois_fpn, roi_probs_fpn = self._get_single_proposals(
fpn_feat, im_info, lvl, mode)
self.fpn_rpn_list.append((self.rpn_cls_score, self.rpn_bbox_pred))
rois_list.append(rois_fpn)
roi_probs_list.append(roi_probs_fpn)
self.anchors_list.append(self.anchors)
self.anchor_var_list.append(self.anchor_var)
prop_op = self.train_proposal if mode == 'train' else self.test_proposal
post_nms_top_n = prop_op.post_nms_top_n
rois_collect = fluid.layers.collect_fpn_proposals(
rois_list,
roi_probs_list,
self.min_level,
self.max_level,
post_nms_top_n,
name='collect')
return rois_collect
def _get_loss_input(self):
rpn_clses = []
rpn_bboxes = []
anchors = []
anchor_vars = []
for i in range(len(self.fpn_rpn_list)):
single_input = self._transform_input(
self.fpn_rpn_list[i][0], self.fpn_rpn_list[i][1],
self.anchors_list[i], self.anchor_var_list[i])
rpn_clses.append(single_input[0])
rpn_bboxes.append(single_input[1])
anchors.append(single_input[2])
anchor_vars.append(single_input[3])
rpn_cls = fluid.layers.concat(rpn_clses, axis=1)
rpn_bbox = fluid.layers.concat(rpn_bboxes, axis=1)
anchors = fluid.layers.concat(anchors)
anchor_var = fluid.layers.concat(anchor_vars)
return rpn_cls, rpn_bbox, anchors, anchor_var
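# Illustrative sketch (not part of the original module): _get_output()
# assigns one anchor size and stride per FPN level,
#   anchor_size(lvl) = anchor_start_size * 2 ** (lvl - min_level)
#   stride(lvl)      = 2 ** lvl
# A quick check with the defaults (anchor_start_size=32, min_level=2,
# max_level=6):
for lvl in range(2, 7):
    print(lvl, 32 * 2 ** (lvl - 2), 2 ** lvl)  # level 2 -> (32, 4) ... level 6 -> (512, 64)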
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import MultiClassNMS
from ppdet.core.workspace import register
__all__ = ['YOLOv3Head']
@register
class YOLOv3Head(object):
"""
Head block for YOLOv3 network
Args:
norm_decay (float): weight decay for normalization layer weights
num_classes (int): number of output classes
ignore_thresh (float): threshold to ignore confidence loss
label_smooth (bool): whether to use label smoothing
anchors (list): [width, height] pairs of the anchor boxes
anchor_masks (list): indices of the anchors used by each output layer
nms (object): an instance of `MultiClassNMS`
"""
__inject__ = ['nms']
def __init__(self,
norm_decay=0.,
num_classes=80,
ignore_thresh=0.7,
label_smooth=True,
anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
[59, 119], [116, 90], [156, 198], [373, 326]],
anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
nms=MultiClassNMS(
score_threshold=0.01,
nms_top_k=1000,
keep_top_k=100,
nms_threshold=0.45,
background_label=-1).__dict__):
self.norm_decay = norm_decay
self.num_classes = num_classes
self.ignore_thresh = ignore_thresh
self.label_smooth = label_smooth
self.anchor_masks = anchor_masks
self._parse_anchors(anchors)
self.nms = nms
if isinstance(nms, dict):
self.nms = MultiClassNMS(**nms)
def _conv_bn(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
is_test=True,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=None,
param_attr=ParamAttr(name=name + ".conv.weights"),
bias_attr=False)
bn_name = name + ".bn"
bn_param_attr = ParamAttr(
regularizer=L2Decay(self.norm_decay), name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(self.norm_decay), name=bn_name + '.offset')
out = fluid.layers.batch_norm(
input=conv,
act=None,
is_test=is_test,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '.mean',
moving_variance_name=bn_name + '.var')
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
def _detection_block(self, input, channel, is_test=True, name=None):
assert channel % 2 == 0, \
"channel {} cannot be divided by 2 in detection block {}" \
.format(channel, name)
conv = input
for j in range(2):
conv = self._conv_bn(
conv,
channel,
filter_size=1,
stride=1,
padding=0,
is_test=is_test,
name='{}.{}.0'.format(name, j))
conv = self._conv_bn(
conv,
channel * 2,
filter_size=3,
stride=1,
padding=1,
is_test=is_test,
name='{}.{}.1'.format(name, j))
route = self._conv_bn(
conv,
channel,
filter_size=1,
stride=1,
padding=0,
is_test=is_test,
name='{}.2'.format(name))
tip = self._conv_bn(
route,
channel * 2,
filter_size=3,
stride=1,
padding=1,
is_test=is_test,
name='{}.tip'.format(name))
return route, tip
def _upsample(self, input, scale=2, name=None):
# get dynamic upsample output shape
shape_nchw = fluid.layers.shape(input)
shape_hw = fluid.layers.slice(
shape_nchw, axes=[0], starts=[2], ends=[4])
shape_hw.stop_gradient = True
in_shape = fluid.layers.cast(shape_hw, dtype='int32')
out_shape = in_shape * scale
out_shape.stop_gradient = True
# resize by actual_shape
out = fluid.layers.resize_nearest(
input=input, scale=scale, actual_shape=out_shape, name=name)
return out
def _parse_anchors(self, anchors):
"""
Check ANCHORS/ANCHOR_MASKS in config and parse mask_anchors
"""
self.anchors = []
self.mask_anchors = []
assert len(anchors) > 0, "ANCHORS not set."
assert len(self.anchor_masks) > 0, "ANCHOR_MASKS not set."
for anchor in anchors:
assert len(anchor) == 2, "anchor {} len should be 2".format(anchor)
self.anchors.extend(anchor)
anchor_num = len(anchors)
for masks in self.anchor_masks:
self.mask_anchors.append([])
for mask in masks:
assert mask < anchor_num, "anchor mask index overflow"
self.mask_anchors[-1].extend(anchors[mask])
def _get_outputs(self, input, is_train=True):
"""
Get YOLOv3 head output
Args:
input (list): List of Variables, output of backbone stages
is_train (bool): whether in train or test mode
Returns:
outputs (list): Variables of each output layer
"""
outputs = []
# get last out_layer_num blocks in reverse order
out_layer_num = len(self.anchor_masks)
blocks = input[-1:-out_layer_num - 1:-1]
route = None
for i, block in enumerate(blocks):
if i > 0:  # concat route from the previous level in all but the first block
block = fluid.layers.concat(input=[route, block], axis=1)
route, tip = self._detection_block(
block,
channel=512 // (2**i),
is_test=(not is_train),
name="yolo_block.{}".format(i))
# out channel number = mask_num * (5 + class_num)
num_filters = len(self.anchor_masks[i]) * (self.num_classes + 5)
block_out = fluid.layers.conv2d(
input=tip,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(
name="yolo_output.{}.conv.weights".format(i)),
bias_attr=ParamAttr(
regularizer=L2Decay(0.),
name="yolo_output.{}.conv.bias".format(i)))
outputs.append(block_out)
if i < len(blocks) - 1:
# do not perform upsample in the last detection_block
route = self._conv_bn(
input=route,
ch_out=256 // (2**i),
filter_size=1,
stride=1,
padding=0,
is_test=(not is_train),
name="yolo_transition.{}".format(i))
# upsample
route = self._upsample(route)
return outputs
def get_loss(self, input, gt_box, gt_label, gt_score):
"""
Get final loss of network of YOLOv3.
Args:
input (list): List of Variables, output of backbone stages
gt_box (Variable): The ground-truth bounding boxes.
gt_label (Variable): The ground-truth class labels.
gt_score (Variable): The ground-truth bounding box mixup scores.
Returns:
loss (Variable): The loss Variable of YOLOv3 network.
"""
outputs = self._get_outputs(input, is_train=True)
losses = []
downsample = 32
for i, output in enumerate(outputs):
anchor_mask = self.anchor_masks[i]
loss = fluid.layers.yolov3_loss(
x=output,
gt_box=gt_box,
gt_label=gt_label,
gt_score=gt_score,
anchors=self.anchors,
anchor_mask=anchor_mask,
class_num=self.num_classes,
ignore_thresh=self.ignore_thresh,
downsample_ratio=downsample,
use_label_smooth=self.label_smooth,
name="yolo_loss" + str(i))
losses.append(fluid.layers.reduce_mean(loss))
downsample //= 2
return sum(losses)
def get_prediction(self, input, im_shape):
"""
Get prediction result of YOLOv3 network
Args:
input (list): List of Variables, output of backbone stages
im_shape (Variable): The shape (h, w) of each input image.
Returns:
pred (dict): The prediction result after non-maximum suppression.
"""
outputs = self._get_outputs(input, is_train=False)
boxes = []
scores = []
downsample = 32
for i, output in enumerate(outputs):
box, score = fluid.layers.yolo_box(
x=output,
img_size=im_shape,
anchors=self.mask_anchors[i],
class_num=self.num_classes,
conf_thresh=self.nms.score_threshold,
downsample_ratio=downsample,
name="yolo_box" + str(i))
boxes.append(box)
scores.append(fluid.layers.transpose(score, perm=[0, 2, 1]))
downsample //= 2
yolo_boxes = fluid.layers.concat(boxes, axis=1)
yolo_scores = fluid.layers.concat(scores, axis=2)
pred = self.nms(bboxes=yolo_boxes, scores=yolo_scores)
return {'bbox': pred}
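# Illustrative sketch (not part of the original module): each YOLOv3 output
# layer predicts 4 box deltas, 1 objectness score and num_classes class
# scores per anchor, so the 1x1 output conv in _get_outputs() needs
# mask_num * (5 + num_classes) filters. With the defaults above:
num_classes = 80
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
print([len(m) * (5 + num_classes) for m in anchor_masks])  # [255, 255, 255]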
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import faster_rcnn
from . import mask_rcnn
from . import cascade_rcnn
from . import yolov3
from . import ssd
from . import retinanet
from .faster_rcnn import *
from .mask_rcnn import *
from .cascade_rcnn import *
from .yolov3 import *
from .ssd import *
from .retinanet import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from ppdet.core.workspace import register
__all__ = ['CascadeRCNN']
@register
class CascadeRCNN(object):
"""
Cascade R-CNN architecture, see https://arxiv.org/abs/1712.00726
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
bbox_assigner (object): `BBoxAssigner` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
fpn (object): feature pyramid network instance
"""
__category__ = 'architecture'
__inject__ = [
'backbone', 'fpn', 'rpn_head', 'bbox_assigner', 'roi_extractor',
'bbox_head'
]
def __init__(self,
backbone,
rpn_head,
roi_extractor='FPNRoIAlign',
bbox_head='CascadeBBoxHead',
bbox_assigner='CascadeBBoxAssigner',
fpn='FPN'):
super(CascadeRCNN, self).__init__()
assert fpn is not None, "cascade RCNN requires FPN"
self.backbone = backbone
self.fpn = fpn
self.rpn_head = rpn_head
self.bbox_assigner = bbox_assigner
self.roi_extractor = roi_extractor
self.bbox_head = bbox_head
# Cascade local cfg
self.cls_agnostic_bbox_reg = 2
(brw0, brw1, brw2) = self.bbox_assigner.bbox_reg_weights
self.cascade_bbox_reg_weights = [
[1. / brw0, 1. / brw0, 2. / brw0, 2. / brw0],
[1. / brw1, 1. / brw1, 2. / brw1, 2. / brw1],
[1. / brw2, 1. / brw2, 2. / brw2, 2. / brw2]
]
self.cascade_rcnn_loss_weight = [1.0, 0.5, 0.25]
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
im_info = feed_vars['im_info']
if mode == 'train':
gt_box = feed_vars['gt_box']
is_crowd = feed_vars['is_crowd']
# backbone
body_feats = self.backbone(im)
# body_feat_names = list(body_feats.keys())
# FPN
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
# rpn proposals
rpn_rois = self.rpn_head.get_proposals(body_feats, im_info, mode=mode)
if mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, gt_box, is_crowd)
proposal_list = []
roi_feat_list = []
rcnn_pred_list = []
rcnn_target_list = []
proposals = None
bbox_pred = None
for i in range(3):
if i > 0:
refined_bbox = self._decode_box(
proposals,
bbox_pred,
curr_stage=i - 1, )
else:
refined_bbox = rpn_rois
if mode == 'train':
outs = self.bbox_assigner(
input_rois=refined_bbox, feed_vars=feed_vars, curr_stage=i)
proposals = outs[0]
rcnn_target_list.append(outs)
else:
proposals = refined_bbox
proposal_list.append(proposals)
# extract roi features
roi_feat = self.roi_extractor(body_feats, proposals, spatial_scale)
roi_feat_list.append(roi_feat)
# bbox head
cls_score, bbox_pred = self.bbox_head.get_output(
roi_feat,
wb_scalar=1.0 / self.cascade_rcnn_loss_weight[i],
name='_' + str(i + 1) if i > 0 else '')
rcnn_pred_list.append((cls_score, bbox_pred))
if mode == 'train':
loss = self.bbox_head.get_loss(rcnn_pred_list, rcnn_target_list,
self.cascade_rcnn_loss_weight)
loss.update(rpn_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
pred = self.bbox_head.get_prediction(
im_info, roi_feat_list, rcnn_pred_list, proposal_list,
self.cascade_bbox_reg_weights, self.cls_agnostic_bbox_reg)
return pred
def _decode_box(self, proposals, bbox_pred, curr_stage):
rcnn_loc_delta_r = fluid.layers.reshape(
bbox_pred, (-1, self.cls_agnostic_bbox_reg, 4))
# only use fg box delta to decode box
rcnn_loc_delta_s = fluid.layers.slice(
rcnn_loc_delta_r, axes=[1], starts=[1], ends=[2])
refined_bbox = fluid.layers.box_coder(
prior_box=proposals,
prior_box_var=self.cascade_bbox_reg_weights[curr_stage],
target_box=rcnn_loc_delta_s,
code_type='decode_center_size',
box_normalized=False,
axis=1, )
refined_bbox = fluid.layers.reshape(refined_bbox, shape=[-1, 4])
return refined_bbox
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
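# Illustrative sketch (not part of the original module): CascadeRCNN converts
# the per-stage scalars of bbox_assigner.bbox_reg_weights into box_coder
# variances [1/w, 1/w, 2/w, 2/w]. Assuming the common Cascade R-CNN weights
# (10., 20., 30.) -- an assumption, check the CascadeBBoxAssigner config:
for brw in (10., 20., 30.):
    print([1. / brw, 1. / brw, 2. / brw, 2. / brw])
# stage 0 -> [0.1, 0.1, 0.2, 0.2], stage 1 -> [0.05, 0.05, 0.1, 0.1], ...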
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
__all__ = ['FasterRCNN']
@register
class FasterRCNN(object):
"""
Faster R-CNN architecture, see https://arxiv.org/abs/1506.01497
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
bbox_assigner (object): `BBoxAssigner` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
fpn (object): feature pyramid network instance
"""
__category__ = 'architecture'
__inject__ = [
'backbone', 'rpn_head', 'bbox_assigner', 'roi_extractor', 'bbox_head',
'fpn'
]
def __init__(self,
backbone,
rpn_head,
roi_extractor,
bbox_head='BBoxHead',
bbox_assigner='BBoxAssigner',
fpn=None):
super(FasterRCNN, self).__init__()
self.backbone = backbone
self.rpn_head = rpn_head
self.bbox_assigner = bbox_assigner
self.roi_extractor = roi_extractor
self.bbox_head = bbox_head
self.fpn = fpn
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
im_info = feed_vars['im_info']
if mode == 'train':
gt_box = feed_vars['gt_box']
is_crowd = feed_vars['is_crowd']
else:
im_shape = feed_vars['im_info']
body_feats = self.backbone(im)
body_feat_names = list(body_feats.keys())
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
rois = self.rpn_head.get_proposals(body_feats, im_info, mode=mode)
if mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, gt_box, is_crowd)
# sampled rpn proposals
for var in ['gt_label', 'is_crowd', 'gt_box', 'im_info']:
assert var in feed_vars, "{} has no {}".format(feed_vars, var)
outs = self.bbox_assigner(
rpn_rois=rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_boxes=feed_vars['gt_box'],
im_info=feed_vars['im_info'])
rois = outs[0]
labels_int32 = outs[1]
bbox_targets = outs[2]
bbox_inside_weights = outs[3]
bbox_outside_weights = outs[4]
if self.fpn is None:
# in models without FPN, roi extractor only uses the last level of
# feature maps. And body_feat_names[-1] represents the name of
# last feature map.
body_feat = body_feats[body_feat_names[-1]]
roi_feat = self.roi_extractor(body_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
if mode == 'train':
loss = self.bbox_head.get_loss(roi_feat, labels_int32, bbox_targets,
bbox_inside_weights,
bbox_outside_weights)
loss.update(rpn_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
im_shape)
return pred
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
__all__ = ['MaskRCNN']
@register
class MaskRCNN(object):
"""
Mask R-CNN architecture, see https://arxiv.org/abs/1703.06870
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
bbox_assigner (object): `BBoxAssigner` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
mask_assigner (object): `MaskAssigner` instance
mask_head (object): `MaskHead` instance
fpn (object): feature pyramid network instance
"""
__category__ = 'architecture'
__inject__ = [
'backbone', 'rpn_head', 'bbox_assigner', 'roi_extractor', 'bbox_head',
'mask_assigner', 'mask_head', 'fpn'
]
def __init__(self,
backbone,
rpn_head,
bbox_head='BBoxHead',
bbox_assigner='BBoxAssigner',
roi_extractor='RoIAlign',
mask_assigner='MaskAssigner',
mask_head='MaskHead',
fpn=None):
super(MaskRCNN, self).__init__()
self.backbone = backbone
self.rpn_head = rpn_head
self.bbox_assigner = bbox_assigner
self.roi_extractor = roi_extractor
self.bbox_head = bbox_head
self.mask_assigner = mask_assigner
self.mask_head = mask_head
self.fpn = fpn
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
assert mode in ['train', 'test'], \
"only 'train' and 'test' mode is supported"
if mode == 'train':
required_fields = [
'gt_label', 'gt_box', 'gt_mask', 'is_crowd', 'im_info'
]
else:
required_fields = ['im_shape', 'im_info']
for var in required_fields:
assert var in feed_vars, \
"{} has no {} field".format(feed_vars, var)
im_info = feed_vars['im_info']
body_feats = self.backbone(im)
# FPN
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
# RPN proposals
rois = self.rpn_head.get_proposals(body_feats, im_info, mode=mode)
if mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, feed_vars['gt_box'],
feed_vars['is_crowd'])
outs = self.bbox_assigner(
rpn_rois=rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_boxes=feed_vars['gt_box'],
im_info=feed_vars['im_info'])
rois = outs[0]
labels_int32 = outs[1]
if self.fpn is None:
last_feat = body_feats[list(body_feats.keys())[-1]]
roi_feat = self.roi_extractor(last_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
loss = self.bbox_head.get_loss(roi_feat, labels_int32, *outs[2:])
loss.update(rpn_loss)
mask_rois, roi_has_mask_int32, mask_int32 = self.mask_assigner(
rois=rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_segms=feed_vars['gt_mask'],
im_info=feed_vars['im_info'],
labels_int32=labels_int32)
if self.fpn is None:
bbox_head_feat = self.bbox_head.get_head_feat()
feat = fluid.layers.gather(bbox_head_feat, roi_has_mask_int32)
else:
feat = self.roi_extractor(
body_feats, mask_rois, spatial_scale, is_mask=True)
mask_loss = self.mask_head.get_loss(feat, mask_int32)
loss.update(mask_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
if self.fpn is None:
last_feat = body_feats[list(body_feats.keys())[-1]]
roi_feat = self.roi_extractor(last_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
bbox_pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
feed_vars['im_shape'])
bbox_pred = bbox_pred['bbox']
# share weight
bbox_shape = fluid.layers.shape(bbox_pred)
bbox_size = fluid.layers.reduce_prod(bbox_shape)
bbox_size = fluid.layers.reshape(bbox_size, [1, 1])
size = fluid.layers.fill_constant([1, 1], value=6, dtype='int32')
cond = fluid.layers.less_than(x=bbox_size, y=size)
mask_pred = fluid.layers.create_global_var(
shape=[1], value=0.0, dtype='float32', persistable=False)
with fluid.layers.control_flow.Switch() as switch:
with switch.case(cond):
fluid.layers.assign(input=bbox_pred, output=mask_pred)
with switch.default():
bbox = fluid.layers.slice(
bbox_pred, [1], starts=[2], ends=[6])
im_scale = fluid.layers.slice(
im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, bbox)
mask_rois = bbox * im_scale
if self.fpn is None:
mask_feat = self.roi_extractor(last_feat, mask_rois)
mask_feat = self.bbox_head.get_head_feat(mask_feat)
else:
mask_feat = self.roi_extractor(
body_feats, mask_rois, spatial_scale, is_mask=True)
mask_out = self.mask_head.get_prediction(mask_feat, bbox)
fluid.layers.assign(input=mask_out, output=mask_pred)
return {'bbox': bbox_pred, 'mask': mask_pred}
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
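# Illustrative sketch (not part of the original module): the test branch
# above only runs the mask head when there is at least one detection. Each
# row of the multiclass-NMS output is [label, score, x1, y1, x2, y2], so the
# Switch compares the flattened bbox size against 6 (one full row):
import numpy as np
no_detections = np.zeros((1, 1), dtype='float32')      # placeholder output
one_detection = np.zeros((1, 6), dtype='float32')      # one full detection row
print(no_detections.size < 6, one_detection.size < 6)  # True (skip mask head), False (run it)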
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from ppdet.core.workspace import register
__all__ = ['RetinaNet']
@register
class RetinaNet(object):
"""
RetinaNet architecture, see https://arxiv.org/abs/1708.02002
Args:
backbone (object): backbone instance
fpn (object): feature pyramid network instance
retina_head (object): `RetinaHead` instance
"""
__category__ = 'architecture'
__inject__ = ['backbone', 'fpn', 'retina_head']
def __init__(self, backbone, fpn, retina_head):
super(RetinaNet, self).__init__()
self.backbone = backbone
self.fpn = fpn
self.retina_head = retina_head
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
im_info = feed_vars['im_info']
if mode == 'train':
gt_box = feed_vars['gt_box']
gt_label = feed_vars['gt_label']
is_crowd = feed_vars['is_crowd']
# backbone
body_feats = self.backbone(im)
# FPN
body_feats, spatial_scale = self.fpn.get_output(body_feats)
# retinanet head
if mode == 'train':
loss = self.retina_head.get_loss(body_feats, spatial_scale, im_info,
gt_box, gt_label, is_crowd)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
pred = self.retina_head.get_prediction(body_feats, spatial_scale,
im_info)
return pred
def train(self, feed_vars):
return self.build(feed_vars, 'train')
def eval(self, feed_vars):
return self.build(feed_vars, 'test')
def test(self, feed_vars):
return self.build(feed_vars, 'test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
from ppdet.modeling.ops import SSDOutputDecoder, SSDMetric
__all__ = ['SSD']
@register
class SSD(object):
"""
Single Shot MultiBox Detector, see https://arxiv.org/abs/1512.02325
Args:
backbone (object): backbone instance
multi_box_head (object): `MultiBoxHead` instance
output_decoder (object): `SSDOutputDecoder` instance
metric (object): `SSDMetric` instance used during evaluation
num_classes (int): number of output classes
"""
__category__ = 'architecture'
__inject__ = ['backbone', 'multi_box_head', 'output_decoder', 'metric']
def __init__(self,
backbone,
multi_box_head='MultiBoxHead',
output_decoder=SSDOutputDecoder().__dict__,
metric=SSDMetric().__dict__,
num_classes=21):
super(SSD, self).__init__()
self.backbone = backbone
self.multi_box_head = multi_box_head
self.num_classes = num_classes
self.output_decoder = output_decoder
self.metric = metric
if isinstance(output_decoder, dict):
self.output_decoder = SSDOutputDecoder(**output_decoder)
if isinstance(metric, dict):
self.metric = SSDMetric(**metric)
def _forward(self, feed_vars, mode='train'):
im = feed_vars['image']
if mode == 'train' or mode == 'eval':
gt_box = feed_vars['gt_box']
gt_label = feed_vars['gt_label']
difficult = feed_vars['is_difficult']
body_feats = self.backbone(im)
locs, confs, box, box_var = self.multi_box_head(
inputs=body_feats, image=im, num_classes=self.num_classes)
if mode == 'train':
loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
box_var)
loss = fluid.layers.reduce_sum(loss)
return {'loss': loss}
else:
pred = self.output_decoder(locs, confs, box, box_var)
if mode == 'eval':
map_eval = self.metric(
pred,
gt_label,
gt_box,
difficult,
class_num=self.num_classes)
_, accum_map = map_eval.get_map_var()
return {'map': map_eval, 'accum_map': accum_map}
else:
return {'bbox': pred}
def train(self, feed_vars):
return self._forward(feed_vars, 'train')
def eval(self, feed_vars):
return self._forward(feed_vars, 'eval')
def test(self, feed_vars):
return self._forward(feed_vars, 'test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from ppdet.core.workspace import register
__all__ = ['YOLOv3']
@register
class YOLOv3(object):
"""
YOLOv3 network, see https://arxiv.org/abs/1804.02767
Args:
backbone (object): a backbone instance
yolo_head (object): a `YOLOv3Head` instance
"""
__category__ = 'architecture'
__inject__ = ['backbone', 'yolo_head']
def __init__(self, backbone, yolo_head='YOLOv3Head'):
super(YOLOv3, self).__init__()
self.backbone = backbone
self.yolo_head = yolo_head
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
body_feats = self.backbone(im)
if isinstance(body_feats, OrderedDict):
body_feat_names = list(body_feats.keys())
body_feats = [body_feats[name] for name in body_feat_names]
if mode == 'train':
gt_box = feed_vars['gt_box']
gt_label = feed_vars['gt_label']
gt_score = feed_vars['gt_score']
return {
'loss': self.yolo_head.get_loss(body_feats, gt_box, gt_label,
gt_score)
}
else:
im_shape = feed_vars['im_shape']
return self.yolo_head.get_prediction(body_feats, im_shape)
def train(self, feed_vars):
return self.build(feed_vars, mode='train')
def eval(self, feed_vars):
return self.build(feed_vars, mode='test')
def test(self, feed_vars):
return self.build(feed_vars, mode='test')
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import resnet
from . import resnext
from . import darknet
from . import mobilenet
from . import senet
from . import fpn
from .resnet import *
from .resnext import *
from .darknet import *
from .mobilenet import *
from .senet import *
from .fpn import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['DarkNet']
@register
class DarkNet(object):
"""
DarkNet, see https://pjreddie.com/darknet/yolo/
Args:
depth (int): network depth; currently only DarkNet-53 is supported
norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
norm_decay (float): weight decay for normalization layer weights
"""
def __init__(self, depth=53, norm_type='bn', norm_decay=0.):
assert depth in [53], "unsupported depth value"
self.depth = depth
self.norm_type = norm_type
self.norm_decay = norm_decay
self.depth_cfg = {53: ([1, 2, 8, 8, 4], self.basicblock)}
def _conv_norm(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=None,
param_attr=ParamAttr(name=name + ".conv.weights"),
bias_attr=False)
bn_name = name + ".bn"
bn_param_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.offset')
out = fluid.layers.batch_norm(
input=conv,
act=None,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '.mean',
moving_variance_name=bn_name + '.var')
# leaky relu here uses alpha=0.1, which cannot be set through the
# `act` param of fluid.layers.batch_norm above.
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
def _downsample(self,
input,
ch_out,
filter_size=3,
stride=2,
padding=1,
name=None):
return self._conv_norm(
input,
ch_out=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
name=name)
def basicblock(self, input, ch_out, name=None):
conv1 = self._conv_norm(
input,
ch_out=ch_out,
filter_size=1,
stride=1,
padding=0,
name=name + ".0")
conv2 = self._conv_norm(
conv1,
ch_out=ch_out * 2,
filter_size=3,
stride=1,
padding=1,
name=name + ".1")
out = fluid.layers.elementwise_add(x=input, y=conv2, act=None)
return out
def layer_warp(self, block_func, input, ch_out, count, name=None):
out = block_func(input, ch_out=ch_out, name='{}.0'.format(name))
for j in six.moves.xrange(1, count):
out = block_func(out, ch_out=ch_out, name='{}.{}'.format(name, j))
return out
def __call__(self, input):
"""
Get the DarkNet backbone output, i.e. the outputs of its 5 stages.
Args:
input (Variable): input variable.
Returns:
The last variable of each stage.
"""
stages, block_func = self.depth_cfg[self.depth]
stages = stages[0:5]
conv = self._conv_norm(
input=input,
ch_out=32,
filter_size=3,
stride=1,
padding=1,
name="yolo_input")
downsample_ = self._downsample(
input=conv, ch_out=conv.shape[1] * 2, name="yolo_input.downsample")
blocks = []
for i, stage in enumerate(stages):
block = self.layer_warp(
block_func=block_func,
input=downsample_,
ch_out=32 * 2**i,
count=stage,
name="stage.{}".format(i))
blocks.append(block)
if i < len(stages) - 1:  # do not downsample in the last stage
downsample_ = self._downsample(
input=block,
ch_out=block.shape[1] * 2,
name="stage.{}.downsample".format(i))
return blocks
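# Illustrative sketch (not part of the original module): the DarkNet-53
# stage layout implied by depth_cfg and layer_warp -- stage i holds
# stages[i] residual blocks, outputs 64 * 2**i channels, and (except the
# last stage) is followed by a stride-2 downsample:
stages = [1, 2, 8, 8, 4]
for i, count in enumerate(stages):
    print("stage", i, "blocks:", count, "channels:", 64 * 2 ** i,
          "output stride:", 2 ** (i + 1))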
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['FPN']
@register
class FPN(object):
"""
Feature Pyramid Network, see https://arxiv.org/abs/1612.03144
Args:
num_chan (int): number of feature channels
min_level (int): lowest level of the backbone feature map to use
max_level (int): highest level of the backbone feature map to use
spatial_scale (list): feature map scaling factor
has_extra_convs (bool): whether to add extra convolutions for higher levels
"""
def __init__(self,
num_chan=256,
min_level=2,
max_level=6,
spatial_scale=[1. / 32., 1. / 16., 1. / 8., 1. / 4.],
has_extra_convs=False):
self.num_chan = num_chan
self.min_level = min_level
self.max_level = max_level
self.spatial_scale = spatial_scale
self.has_extra_convs = has_extra_convs
def _add_topdown_lateral(self, body_name, body_input, upper_output):
lateral_name = 'fpn_inner_' + body_name + '_lateral'
topdown_name = 'fpn_topdown_' + body_name
fan = body_input.shape[1]
lateral = fluid.layers.conv2d(
body_input,
self.num_chan,
1,
param_attr=ParamAttr(
name=lateral_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=lateral_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=lateral_name)
shape = fluid.layers.shape(upper_output)
shape_hw = fluid.layers.slice(shape, axes=[0], starts=[2], ends=[4])
out_shape_ = shape_hw * 2
out_shape = fluid.layers.cast(out_shape_, dtype='int32')
out_shape.stop_gradient = True
topdown = fluid.layers.resize_nearest(
upper_output, scale=2., actual_shape=out_shape, name=topdown_name)
return lateral + topdown
def get_output(self, body_dict):
"""
Add FPN onto backbone.
Args:
body_dict(OrderedDict): A dictionary of variables, each being the
output of a backbone stage.
Return:
fpn_dict(OrderedDict): A dictionary mapping names to the FPN
output feature maps.
spatial_scale(list): A list of multiplicative spatial scale factors.
"""
body_name_list = list(body_dict.keys())[::-1]
num_backbone_stages = len(body_name_list)
self.fpn_inner_output = [[] for _ in range(num_backbone_stages)]
fpn_inner_name = 'fpn_inner_' + body_name_list[0]
body_input = body_dict[body_name_list[0]]
fan = body_input.shape[1]
self.fpn_inner_output[0] = fluid.layers.conv2d(
body_input,
self.num_chan,
1,
param_attr=ParamAttr(
name=fpn_inner_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_inner_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_inner_name)
for i in range(1, num_backbone_stages):
body_name = body_name_list[i]
body_input = body_dict[body_name]
top_output = self.fpn_inner_output[i - 1]
fpn_inner_single = self._add_topdown_lateral(body_name, body_input,
top_output)
self.fpn_inner_output[i] = fpn_inner_single
fpn_dict = {}
fpn_name_list = []
for i in range(num_backbone_stages):
fpn_name = 'fpn_' + body_name_list[i]
fan = self.fpn_inner_output[i].shape[1] * 3 * 3
fpn_output = fluid.layers.conv2d(
self.fpn_inner_output[i],
self.num_chan,
filter_size=3,
padding=1,
param_attr=ParamAttr(
name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_name)
fpn_dict[fpn_name] = fpn_output
fpn_name_list.append(fpn_name)
if not self.has_extra_convs and self.max_level - self.min_level == len(
self.spatial_scale):
body_top_name = fpn_name_list[0]
body_top_extension = fluid.layers.pool2d(
fpn_dict[body_top_name],
1,
'max',
pool_stride=2,
name=body_top_name + '_subsampled_2x')
fpn_dict[body_top_name + '_subsampled_2x'] = body_top_extension
fpn_name_list.insert(0, body_top_name + '_subsampled_2x')
self.spatial_scale.insert(0, self.spatial_scale[0] * 0.5)
# Coarser FPN levels introduced for RetinaNet
highest_backbone_level = self.min_level + len(self.spatial_scale) - 1
if self.has_extra_convs and self.max_level > highest_backbone_level:
fpn_blob = body_dict[body_name_list[0]]
for i in range(highest_backbone_level + 1, self.max_level + 1):
fpn_blob_in = fpn_blob
fpn_name = 'fpn_' + str(i)
if i > highest_backbone_level + 1:
fpn_blob_in = fluid.layers.relu(fpn_blob)
fan = fpn_blob_in.shape[1] * 3 * 3
fpn_blob = fluid.layers.conv2d(
input=fpn_blob_in,
num_filters=self.num_chan,
filter_size=3,
stride=2,
padding=1,
param_attr=ParamAttr(
name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_name)
fpn_dict[fpn_name] = fpn_blob
fpn_name_list.insert(0, fpn_name)
self.spatial_scale.insert(0, self.spatial_scale[0] * 0.5)
res_dict = OrderedDict([(k, fpn_dict[k]) for k in fpn_name_list])
return res_dict, self.spatial_scale
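# Illustrative sketch (not part of the original module): with the defaults
# (min_level=2, max_level=6, has_extra_convs=False) the backbone supplies
# four scales and get_output() prepends one max-pooled level, halving the
# coarsest scale:
spatial_scale = [1. / 32., 1. / 16., 1. / 8., 1. / 4.]
spatial_scale.insert(0, spatial_scale[0] * 0.5)
print(spatial_scale)  # [0.015625, 0.03125, 0.0625, 0.125, 0.25], i.e. P6..P2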
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['MobileNet']
@register
class MobileNet(object):
"""
MobileNet v1, see https://arxiv.org/abs/1704.04861
Args:
norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
norm_decay (float): weight decay for normalization layer weights
conv_group_scale (int): channel scaling factor (the width multiplier)
with_extra_blocks (bool): whether extra blocks should be added
extra_block_filters (list): number of filters for each extra block
"""
def __init__(self,
norm_type='bn',
norm_decay=0.,
conv_group_scale=1,
with_extra_blocks=False,
extra_block_filters=[[256, 512], [128, 256], [128, 256],
[64, 128]]):
self.norm_type = norm_type
self.norm_decay = norm_decay
self.conv_group_scale = conv_group_scale
self.with_extra_blocks = with_extra_blocks
self.extra_block_filters = extra_block_filters
def _conv_norm(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True,
name=None):
parameter_attr = ParamAttr(
learning_rate=0.1,
initializer=fluid.initializer.MSRA(),
name=name + "_weights")
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
bn_name = name + "_bn"
norm_decay = self.norm_decay
bn_param_attr = ParamAttr(
regularizer=L2Decay(norm_decay), name=bn_name + '_scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(norm_decay), name=bn_name + '_offset')
return fluid.layers.batch_norm(
input=conv,
act=act,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def depthwise_separable(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
scale,
name=None):
depthwise_conv = self._conv_norm(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False,
name=name + "_dw")
pointwise_conv = self._conv_norm(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0,
name=name + "_sep")
return pointwise_conv
def _extra_block(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
scale,
name=None):
pointwise_conv = self._conv_norm(
input=input,
filter_size=1,
num_filters=int(num_filters1 * scale),
stride=1,
num_groups=int(num_groups * scale),
padding=0,
name=name + "_extra1")
normal_conv = self._conv_norm(
input=pointwise_conv,
filter_size=3,
num_filters=int(num_filters2 * scale),
stride=2,
num_groups=int(num_groups * scale),
padding=1,
name=name + "_extra2")
return normal_conv
def __call__(self, input):
scale = self.conv_group_scale
blocks = []
# input 1/1
out = self._conv_norm(input, 3, int(32 * scale), 2, 1, 3, name="conv1")
# 1/2
out = self.depthwise_separable(
out, 32, 64, 32, 1, scale, name="conv2_1")
out = self.depthwise_separable(
out, 64, 128, 64, 2, scale, name="conv2_2")
# 1/4
out = self.depthwise_separable(
out, 128, 128, 128, 1, scale, name="conv3_1")
out = self.depthwise_separable(
out, 128, 256, 128, 2, scale, name="conv3_2")
# 1/8
blocks.append(out)
out = self.depthwise_separable(
out, 256, 256, 256, 1, scale, name="conv4_1")
out = self.depthwise_separable(
out, 256, 512, 256, 2, scale, name="conv4_2")
# 1/16
blocks.append(out)
for i in range(5):
out = self.depthwise_separable(
out, 512, 512, 512, 1, scale, name="conv5_" + str(i + 1))
module11 = out
out = self.depthwise_separable(
out, 512, 1024, 512, 2, scale, name="conv5_6")
# 1/32
out = self.depthwise_separable(
out, 1024, 1024, 1024, 1, scale, name="conv6")
module13 = out
blocks.append(out)
if not self.with_extra_blocks:
return blocks
num_filters = self.extra_block_filters
module14 = self._extra_block(module13, num_filters[0][0],
num_filters[0][1], 1, 2, scale, "conv7_1")
module15 = self._extra_block(module14, num_filters[1][0],
num_filters[1][1], 1, 2, scale, "conv7_2")
module16 = self._extra_block(module15, num_filters[2][0],
num_filters[2][1], 1, 2, scale, "conv7_3")
module17 = self._extra_block(module16, num_filters[3][0],
num_filters[3][1], 1, 2, scale, "conv7_4")
return module11, module13, module14, module15, module16, module17
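# Illustrative sketch (not part of the original module): depthwise_separable()
# scales every channel count by conv_group_scale, the MobileNet width
# multiplier. For the conv2_2 block above (64 -> 128, stride 2):
for scale in (1.0, 0.5):
    dw = int(64 * scale)    # 3x3 depthwise conv, groups == channels
    pw = int(128 * scale)   # 1x1 pointwise conv
    print(scale, dw, pw)    # 1.0 -> 64/128 channels, 0.5 -> 32/64 channels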
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class NameAdapter(object):
"""Fix the backbones variable names for pretrained weight"""
def __init__(self, model):
super(NameAdapter, self).__init__()
self.model = model
@property
def model_type(self):
return getattr(self.model, '_model_type', '')
@property
def variant(self):
return getattr(self.model, 'variant', '')
def fix_conv_norm_name(self, name):
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
# the naming rule is the same as for the pretrained weights
if self.model_type == 'SEResNeXt':
bn_name = name + "_bn"
return bn_name
def fix_shortcut_name(self, name):
if self.model_type == 'SEResNeXt':
name = 'conv' + name + '_prj'
return name
def fix_bottleneck_name(self, name):
if self.model_type == 'SEResNeXt':
conv_name1 = 'conv' + name + '_x1'
conv_name2 = 'conv' + name + '_x2'
conv_name3 = 'conv' + name + '_x3'
shortcut_name = name
else:
conv_name1 = name + "_branch2a"
conv_name2 = name + "_branch2b"
conv_name3 = name + "_branch2c"
shortcut_name = name + "_branch1"
return conv_name1, conv_name2, conv_name3, shortcut_name
def fix_layer_warp_name(self, stage_num, count, i):
name = 'res' + str(stage_num)
if count > 10 and stage_num == 4:
if i == 0:
conv_name = name + "a"
else:
conv_name = name + "b" + str(i)
else:
conv_name = name + chr(ord("a") + i)
if self.model_type == 'SEResNeXt':
conv_name = str(stage_num + 2) + '_' + str(i + 1)
return conv_name
def fix_c1_stage_name(self):
return "res_conv1" if self.model_type == 'ResNeXt' else "conv1"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register, serializable
from numbers import Integral
from .name_adapter import NameAdapter
__all__ = ['ResNet', 'ResNetC5']
@register
@serializable
class ResNet(object):
"""
Residual Network, see https://arxiv.org/abs/1512.03385
Args:
depth (int): ResNet depth, should be 18, 34, 50, 101, 152.
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'/'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of stages whose feature maps are returned
"""
def __init__(self,
depth=50,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[2, 3, 4, 5]):
super(ResNet, self).__init__()
if isinstance(feature_maps, Integral):
feature_maps = [feature_maps]
assert depth in [18, 34, 50, 101, 152], \
"depth {} not in [18, 34, 50, 101, 152]".format(depth)
assert variant in ['a', 'b', 'c', 'd'], "invalid ResNet variant"
assert 0 <= freeze_at <= 4, "freeze_at should be 0, 1, 2, 3 or 4"
assert len(feature_maps) > 0, "need one or more feature maps"
assert norm_type in ['bn', 'sync_bn', 'affine_channel']
self.depth = depth
self.freeze_at = freeze_at
self.norm_type = norm_type
self.norm_decay = norm_decay
self.freeze_norm = freeze_norm
self.variant = variant
self._model_type = 'ResNet'
self.feature_maps = feature_maps
self.depth_cfg = {
18: ([2, 2, 2, 2], self.basicblock),
34: ([3, 4, 6, 3], self.basicblock),
50: ([3, 4, 6, 3], self.bottleneck),
101: ([3, 4, 23, 3], self.bottleneck),
152: ([3, 8, 36, 3], self.bottleneck)
}
self.stage_filters = [64, 128, 256, 512]
self._c1_out_chan_num = 64
self.na = NameAdapter(self)
def _conv_norm(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False,
name=name + '.conv2d.output.1')
bn_name = self.na.fix_conv_norm_name(name)
norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay
pattr = ParamAttr(
name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
if self.norm_type in ['bn', 'sync_bn']:
out = fluid.layers.batch_norm(
input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance', )
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
elif self.norm_type == 'affine_channel':
scale = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=pattr,
default_initializer=fluid.initializer.Constant(1.))
bias = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=battr,
default_initializer=fluid.initializer.Constant(0.))
out = fluid.layers.affine_channel(
x=conv, scale=scale, bias=bias, act=act)
if self.freeze_norm:
scale.stop_gradient = True
bias.stop_gradient = True
return out
def _shortcut(self, input, ch_out, stride, is_first, name):
max_pooling_in_short_cut = self.variant == 'd'
ch_in = input.shape[1]
# the naming rule is the same as for the pretrained weights
name = self.na.fix_shortcut_name(name)
if ch_in != ch_out or stride != 1 or (self.depth < 50 and is_first):
if max_pooling_in_short_cut and not is_first:
input = fluid.layers.pool2d(
input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
ceil_mode=True,
pool_type='avg')
return self._conv_norm(input, ch_out, 1, 1, name=name)
return self._conv_norm(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck(self, input, num_filters, stride, is_first, name):
if self.variant == 'a':
stride1, stride2 = stride, 1
else:
stride1, stride2 = 1, stride
# ResNeXt
groups = getattr(self, 'groups', 1)
group_width = getattr(self, 'group_width', -1)
if groups == 1:
expand = 4
elif (groups * group_width) == 256:
expand = 1
else: # FIXME hard code for now, handles 32x4d, 64x4d and 32x8d
num_filters = num_filters // 2
expand = 2
conv_name1, conv_name2, conv_name3, \
shortcut_name = self.na.fix_bottleneck_name(name)
conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1],
[num_filters, 3, stride2, 'relu', groups, conv_name2],
[num_filters * expand, 1, 1, None, 1, conv_name3]]
residual = input
for (c, k, s, act, g, _name) in conv_def:
residual = self._conv_norm(
input=residual,
num_filters=c,
filter_size=k,
stride=s,
act=act,
groups=g,
name=_name)
short = self._shortcut(
input,
num_filters * expand,
stride,
is_first=is_first,
name=shortcut_name)
# Squeeze-and-Excitation
if callable(getattr(self, '_squeeze_excitation', None)):
residual = self._squeeze_excitation(
input=residual, num_channels=num_filters, name='fc' + name)
return fluid.layers.elementwise_add(
x=short, y=residual, act='relu', name=name + ".add.output.5")
def basicblock(self, input, num_filters, stride, is_first, name):
conv0 = self._conv_norm(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self._conv_norm(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self._shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def layer_warp(self, input, stage_num):
"""
Args:
input (Variable): input variable.
stage_num (int): the stage number, should be 2, 3, 4, 5
Returns:
The last variable of the requested stage.
"""
assert stage_num in [2, 3, 4, 5]
stages, block_func = self.depth_cfg[self.depth]
count = stages[stage_num - 2]
ch_out = self.stage_filters[stage_num - 2]
is_first = (stage_num == 2)
# Make the layer name and parameter name consistent
# with ImageNet pre-trained model
conv = input
for i in range(count):
conv_name = self.na.fix_layer_warp_name(stage_num, count, i)
if self.depth < 50:
is_first = True if i == 0 and stage_num == 2 else False
conv = block_func(
input=conv,
num_filters=ch_out,
stride=2 if i == 0 and stage_num != 2 else 1,
is_first=is_first,
name=conv_name)
return conv
def c1_stage(self, input):
out_chan = self._c1_out_chan_num
conv1_name = self.na.fix_c1_stage_name()
if self.variant in ['c', 'd']:
conv_def = [
[out_chan // 2, 3, 2, "conv1_1"],
[out_chan // 2, 3, 1, "conv1_2"],
[out_chan, 3, 1, "conv1_3"],
]
else:
conv_def = [[out_chan, 7, 2, conv1_name]]
for (c, k, s, _name) in conv_def:
input = self._conv_norm(
input=input,
num_filters=c,
filter_size=k,
stride=s,
act='relu',
name=_name)
output = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
return output
def __call__(self, input):
assert isinstance(input, Variable)
assert not (set(self.feature_maps) - set([2, 3, 4, 5])), \
"feature maps {} not in [2, 3, 4, 5]".format(self.feature_maps)
res_endpoints = []
res = input
feature_maps = self.feature_maps
severed_head = getattr(self, 'severed_head', False)
if not severed_head:
res = self.c1_stage(res)
feature_maps = range(2, max(self.feature_maps) + 1)
for i in feature_maps:
res = self.layer_warp(res, i)
if i in self.feature_maps:
res_endpoints.append(res)
if self.freeze_at >= i:
res.stop_gradient = True
return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
for idx, feat in enumerate(res_endpoints)])
@register
@serializable
class ResNetC5(ResNet):
__doc__ = ResNet.__doc__
def __init__(self,
depth=50,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[5]):
super(ResNetC5, self).__init__(
depth, freeze_at, norm_type, freeze_norm, norm_decay,
variant, feature_maps)
self.severed_head = True
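# A minimal usage sketch (an assumption for illustration, not part of the
# module): ResNetC5 acts as an RCNN head, so `severed_head` skips the stem
# and `layer_warp` runs directly on RoI features. The 1024x14x14 shape below
# is a hypothetical stand-in for res4 RoI features of a depth-50 backbone.
if __name__ == '__main__':
    import paddle.fluid as fluid
    main_prog, startup_prog = fluid.Program(), fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        roi_feat = fluid.layers.data(
            name='roi_feat', shape=[1024, 14, 14], dtype='float32')
        head = ResNetC5(depth=50)
        feats = head(roi_feat)
        print(list(feats.keys()))  # expect a single 'res5_sum' entry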
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from ppdet.core.workspace import register, serializable
from .resnet import ResNet
__all__ = ['ResNeXt']
@register
@serializable
class ResNeXt(ResNet):
"""
ResNeXt, see https://arxiv.org/abs/1611.05431
Args:
depth (int): network depth, should be 50, 101, 152.
groups (int): group convolution cardinality
group_width (int): width of each group convolution
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of the stages whose feature maps are returned
"""
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
                 norm_decay=0.,
variant='a',
feature_maps=[2, 3, 4, 5]):
        assert depth in [50, 101, 152], \
            "depth {} should be 50, 101 or 152".format(depth)
super(ResNeXt, self).__init__(depth, freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.depth_cfg = {
50: ([3, 4, 6, 3], self.bottleneck),
101: ([3, 4, 23, 3], self.bottleneck),
152: ([3, 8, 36, 3], self.bottleneck)
}
self.stage_filters = [256, 512, 1024, 2048]
self.groups = groups
self.group_width = group_width
self._model_type = 'ResNeXt'
@register
@serializable
class ResNeXtC5(ResNeXt):
__doc__ = ResNeXt.__doc__
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
                 norm_decay=0.,
variant='a',
feature_maps=[5]):
super(ResNeXtC5, self).__init__(depth, groups, group_width, freeze_at,
norm_type, freeze_norm, norm_decay,
variant, feature_maps)
self.severed_head = True
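# A pure-Python sketch of the channel bookkeeping in `bottleneck` (the
# groups/group_width branch in resnet.py above); for illustration only.
if __name__ == '__main__':
    def bottleneck_channels(num_filters, groups, group_width):
        if groups == 1:
            expand = 4
        elif groups * group_width == 256:  # 64x4d and 32x8d land here
            expand = 1
        else:  # e.g. 32x4d
            num_filters //= 2
            expand = 2
        return num_filters, num_filters * expand
    for groups, width in [(1, -1), (64, 4), (32, 4), (32, 8)]:
        mid, out = bottleneck_channels(256, groups, width)
        print('{}x{}d: mid={} out={}'.format(groups, width, mid, out))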
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from ppdet.core.workspace import register, serializable
from .resnext import ResNeXt
__all__ = ['SENet', 'SENetC5']
@register
@serializable
class SENet(ResNeXt):
"""
Squeeze-and-Excitation Networks, see https://arxiv.org/abs/1709.01507
Args:
depth (int): SENet depth, should be 50, 101, 152
groups (int): group convolution cardinality
group_width (int): width of each group convolution
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of the stages whose feature maps are returned
"""
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='d',
feature_maps=[2, 3, 4, 5]):
super(SENet, self).__init__(depth, groups, group_width, freeze_at,
norm_type, freeze_norm, norm_decay, variant,
feature_maps)
if depth < 152:
self.stage_filters = [128, 256, 512, 1024]
else:
self.stage_filters = [256, 512, 1024, 2048]
self.reduction_ratio = 16
self._c1_out_chan_num = 128
self._model_type = 'SEResNeXt'
def _squeeze_excitation(self, input, num_channels, name=None):
pool = fluid.layers.pool2d(
input=input,
pool_size=0,
pool_type='avg',
global_pooling=True,
use_cudnn=False)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(
input=pool,
size=int(num_channels / self.reduction_ratio),
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name=name + '_sqz_weights'),
bias_attr=ParamAttr(name=name + '_sqz_offset'))
stdv = 1.0 / math.sqrt(squeeze.shape[1] * 1.0)
excitation = fluid.layers.fc(
input=squeeze,
size=num_channels,
act='sigmoid',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name=name + '_exc_weights'),
bias_attr=ParamAttr(name=name + '_exc_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=excitation, axis=0)
return scale
@register
@serializable
class SENetC5(SENet):
__doc__ = SENet.__doc__
def __init__(self,
depth=50,
groups=64,
group_width=4,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='d',
feature_maps=[5]):
super(SENetC5, self).__init__(depth, groups, group_width, freeze_at,
norm_type, freeze_norm, norm_decay,
variant, feature_maps)
self.severed_head = True
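# A NumPy sketch of the squeeze-and-excitation computation implemented by
# `_squeeze_excitation` above; random arrays stand in for the learned fc
# weights (illustration only).
if __name__ == '__main__':
    import numpy as np
    np.random.seed(0)
    x = np.random.rand(1, 8, 4, 4).astype('float32')  # NCHW feature map
    r = 2  # reduction ratio
    w1 = np.random.rand(8, 8 // r).astype('float32')
    w2 = np.random.rand(8 // r, 8).astype('float32')
    squeeze = x.mean(axis=(2, 3))  # global average pool
    hidden = np.maximum(squeeze.dot(w1), 0.)  # fc + relu
    excite = 1. / (1. + np.exp(-hidden.dot(w2)))  # fc + sigmoid
    y = x * excite[:, :, None, None]  # channel-wise rescale
    print(x.shape, '->', y.shape)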
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
from collections import OrderedDict
from paddle import fluid
__all__ = ['create_feed']
# yapf: disable
feed_var_def = [
{'name': 'im_info', 'shape': [3], 'dtype': 'float32', 'lod_level': 0},
{'name': 'im_id', 'shape': [1], 'dtype': 'int32', 'lod_level': 0},
{'name': 'gt_box', 'shape': [4], 'dtype': 'float32', 'lod_level': 1},
{'name': 'gt_label', 'shape': [1], 'dtype': 'int32', 'lod_level': 1},
{'name': 'is_crowd', 'shape': [1], 'dtype': 'int32', 'lod_level': 1},
{'name': 'gt_mask', 'shape': [2], 'dtype': 'float32', 'lod_level': 3},
{'name': 'is_difficult', 'shape': [1], 'dtype': 'int32', 'lod_level': 1},
{'name': 'gt_score', 'shape': [1], 'dtype': 'float32', 'lod_level': 0},
{'name': 'im_shape', 'shape': [3], 'dtype': 'float32', 'lod_level': 0},
]
# yapf: enable
def create_feed(feed, use_pyreader=True):
image_shape = feed.image_shape
feed_var_map = {var['name']: var for var in feed_var_def}
feed_var_map['image'] = {
'name': 'image',
'shape': image_shape,
'dtype': 'float32',
'lod_level': 0
}
# YOLO var dim is fixed
if getattr(feed, 'num_max_boxes', None) is not None:
feed_var_map['gt_label']['shape'] = [feed.num_max_boxes]
feed_var_map['gt_score']['shape'] = [feed.num_max_boxes]
feed_var_map['gt_box']['shape'] = [feed.num_max_boxes, 4]
feed_var_map['gt_label']['lod_level'] = 0
feed_var_map['gt_score']['lod_level'] = 0
feed_var_map['gt_box']['lod_level'] = 0
feed_var_map['im_shape']['shape'] = [2]
feed_var_map['im_shape']['dtype'] = 'int32'
feed_vars = OrderedDict([(key, fluid.layers.data(
name=feed_var_map[key]['name'],
shape=feed_var_map[key]['shape'],
dtype=feed_var_map[key]['dtype'],
lod_level=feed_var_map[key]['lod_level'])) for key in feed.fields])
pyreader = None
if use_pyreader:
pyreader = fluid.io.PyReader(
feed_list=list(feed_vars.values()),
capacity=64,
use_double_buffer=True,
iterable=False)
return pyreader, feed_vars
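# A minimal sketch of how `create_feed` is driven. `DummyFeed` is a
# hypothetical stand-in for the real data feed config; only `image_shape`
# and `fields` are assumed to be required here.
if __name__ == '__main__':
    class DummyFeed(object):
        image_shape = [3, 800, 1333]
        fields = ['image', 'im_info', 'im_id']
    _, feed_vars = create_feed(DummyFeed(), use_pyreader=False)
    print(list(feed_vars.keys()))  # ['image', 'im_info', 'im_id']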
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from numbers import Integral
from paddle import fluid
from ppdet.core.workspace import register, serializable
__all__ = [
'AnchorGenerator', 'RPNTargetAssign', 'GenerateProposals', 'MultiClassNMS',
'BBoxAssigner', 'MaskAssigner', 'RoIAlign', 'RoIPool', 'MultiBoxHead',
'SSDOutputDecoder', 'SSDMetric', 'RetinaTargetAssign', 'RetinaOutputDecoder'
]
@register
@serializable
class AnchorGenerator(object):
__op__ = fluid.layers.anchor_generator
__append_doc__ = True
def __init__(self,
stride=[16.0, 16.0],
anchor_sizes=[32, 64, 128, 256, 512],
aspect_ratios=[0.5, 1., 2.],
variance=[1., 1., 1., 1.]):
super(AnchorGenerator, self).__init__()
self.anchor_sizes = anchor_sizes
self.aspect_ratios = aspect_ratios
self.variance = variance
self.stride = stride
@register
@serializable
class RPNTargetAssign(object):
__op__ = fluid.layers.rpn_target_assign
__append_doc__ = True
def __init__(self,
rpn_batch_size_per_im=256,
rpn_straddle_thresh=0.,
rpn_fg_fraction=0.5,
rpn_positive_overlap=0.7,
rpn_negative_overlap=0.3,
use_random=True):
super(RPNTargetAssign, self).__init__()
self.rpn_batch_size_per_im = rpn_batch_size_per_im
self.rpn_straddle_thresh = rpn_straddle_thresh
self.rpn_fg_fraction = rpn_fg_fraction
self.rpn_positive_overlap = rpn_positive_overlap
self.rpn_negative_overlap = rpn_negative_overlap
self.use_random = use_random
@register
@serializable
class GenerateProposals(object):
__op__ = fluid.layers.generate_proposals
__append_doc__ = True
def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n
self.nms_thresh = nms_thresh
self.min_size = min_size
self.eta = eta
@register
class MaskAssigner(object):
__op__ = fluid.layers.generate_mask_labels
__append_doc__ = True
def __init__(self, num_classes=81, resolution=14):
super(MaskAssigner, self).__init__()
self.num_classes = num_classes
self.resolution = resolution
@register
@serializable
class MultiClassNMS(object):
__op__ = fluid.layers.multiclass_nms
__append_doc__ = True
def __init__(self,
score_threshold=.05,
nms_top_k=-1,
keep_top_k=100,
nms_threshold=.5,
normalized=False,
nms_eta=1.0,
background_label=0):
super(MultiClassNMS, self).__init__()
self.score_threshold = score_threshold
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
self.nms_threshold = nms_threshold
self.normalized = normalized
self.nms_eta = nms_eta
self.background_label = background_label
@register
class BBoxAssigner(object):
__op__ = fluid.layers.generate_proposal_labels
__append_doc__ = True
def __init__(self,
batch_size_per_im=512,
fg_fraction=.25,
fg_thresh=.5,
bg_thresh_hi=.5,
bg_thresh_lo=0.,
bbox_reg_weights=[0.1, 0.1, 0.2, 0.2],
num_classes=81,
shuffle_before_sample=True):
super(BBoxAssigner, self).__init__()
self.batch_size_per_im = batch_size_per_im
self.fg_fraction = fg_fraction
self.fg_thresh = fg_thresh
self.bg_thresh_hi = bg_thresh_hi
self.bg_thresh_lo = bg_thresh_lo
self.bbox_reg_weights = bbox_reg_weights
self.class_nums = num_classes
self.use_random = shuffle_before_sample
@register
class RoIAlign(object):
__op__ = fluid.layers.roi_align
__append_doc__ = True
def __init__(self, resolution=7, spatial_scale=1. / 16, sampling_ratio=0):
super(RoIAlign, self).__init__()
if isinstance(resolution, Integral):
resolution = [resolution, resolution]
self.pooled_height = resolution[0]
self.pooled_width = resolution[1]
self.spatial_scale = spatial_scale
self.sampling_ratio = sampling_ratio
@register
class RoIPool(object):
__op__ = fluid.layers.roi_pool
__append_doc__ = True
def __init__(self, resolution=7, spatial_scale=1. / 16):
super(RoIPool, self).__init__()
if isinstance(resolution, Integral):
resolution = [resolution, resolution]
self.pooled_height = resolution[0]
self.pooled_width = resolution[1]
self.spatial_scale = spatial_scale
@register
class MultiBoxHead(object):
__op__ = fluid.layers.multi_box_head
__append_doc__ = True
def __init__(self,
min_ratio=20,
max_ratio=90,
min_sizes=[60.0, 105.0, 150.0, 195.0, 240.0, 285.0],
max_sizes=[[], 150.0, 195.0, 240.0, 285.0, 300.0],
aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.],
[2., 3.]],
base_size=300,
offset=0.5,
flip=True):
super(MultiBoxHead, self).__init__()
self.min_ratio = min_ratio
self.max_ratio = max_ratio
self.min_sizes = min_sizes
self.max_sizes = max_sizes
self.aspect_ratios = aspect_ratios
self.base_size = base_size
self.offset = offset
self.flip = flip
@register
@serializable
class SSDOutputDecoder(object):
__op__ = fluid.layers.detection_output
__append_doc__ = True
def __init__(self,
nms_threshold=0.45,
nms_top_k=400,
keep_top_k=200,
score_threshold=0.01,
nms_eta=1.0,
background_label=0):
super(SSDOutputDecoder, self).__init__()
self.nms_threshold = nms_threshold
self.background_label = background_label
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
self.score_threshold = score_threshold
self.nms_eta = nms_eta
@register
@serializable
class SSDMetric(object):
__op__ = fluid.metrics.DetectionMAP
__append_doc__ = True
def __init__(self,
overlap_threshold=0.5,
evaluate_difficult=False,
ap_version='integral'):
super(SSDMetric, self).__init__()
self.overlap_threshold = overlap_threshold
self.evaluate_difficult = evaluate_difficult
self.ap_version = ap_version
@register
@serializable
class RetinaTargetAssign(object):
__op__ = fluid.layers.retinanet_target_assign
__append_doc__ = True
def __init__(self, positive_overlap=0.5, negative_overlap=0.4):
super(RetinaTargetAssign, self).__init__()
self.positive_overlap = positive_overlap
self.negative_overlap = negative_overlap
@register
@serializable
class RetinaOutputDecoder(object):
__op__ = fluid.layers.retinanet_detection_output
__append_doc__ = True
def __init__(self,
score_thresh=0.05,
nms_thresh=0.3,
pre_nms_top_n=1000,
detections_per_im=100,
nms_eta=1.0):
super(RetinaOutputDecoder, self).__init__()
self.score_threshold = score_thresh
self.nms_threshold = nms_thresh
self.nms_top_k = pre_nms_top_n
self.keep_top_k = detections_per_im
self.nms_eta = nms_eta
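# A sketch of the `__op__` convention used throughout this module: each class
# is a serializable bag of keyword arguments for one fluid operator, which
# the workspace machinery forwards when the op is invoked (illustration only,
# assuming only the module above is imported).
if __name__ == '__main__':
    nms = MultiClassNMS(score_threshold=0.05, keep_top_k=100)
    # the instance __dict__ holds exactly the kwargs later passed on to
    # fluid.layers.multiclass_nms
    print(MultiClassNMS.__op__.__name__)
    print(nms.__dict__)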
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import roi_extractor
from .roi_extractor import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from ppdet.core.workspace import register
from ppdet.modeling.ops import RoIAlign, RoIPool
__all__ = ['RoIPool', 'RoIAlign', 'FPNRoIAlign']
@register
class FPNRoIAlign(object):
"""
RoI align pooling for FPN feature maps
Args:
pooled_height (int): output height
        pooled_width (int): output width
sampling_ratio (int): number of sampling points
min_level (int): lowest level of FPN layer
max_level (int): highest level of FPN layer
        canconical_level (int): the canonical FPN feature map level
        canonical_size (int): the canonical FPN feature map size
"""
def __init__(self,
sampling_ratio=0,
min_level=2,
max_level=5,
canconical_level=4,
canonical_size=224,
box_resolution=7,
mask_resolution=14):
super(FPNRoIAlign, self).__init__()
self.sampling_ratio = sampling_ratio
self.min_level = min_level
self.max_level = max_level
self.canconical_level = canconical_level
self.canonical_size = canonical_size
self.box_resolution = box_resolution
self.mask_resolution = mask_resolution
def __call__(self, head_inputs, rois, spatial_scale, is_mask=False):
"""
        Apply RoI align on multiple levels of feature maps.
        RoIs are distributed to different levels by their area, and RoI
        features are extracted from each RoI's assigned feature map.
Returns:
roi_feat(Variable): RoI features with shape of [M, C, R, R],
where M is the number of RoIs and R is RoI resolution
"""
k_min = self.min_level
k_max = self.max_level
num_roi_lvls = k_max - k_min + 1
name_list = list(head_inputs.keys())
input_name_list = name_list[-num_roi_lvls:]
spatial_scale = spatial_scale[-num_roi_lvls:]
rois_dist, restore_index = fluid.layers.distribute_fpn_proposals(
rois, k_min, k_max, self.canconical_level, self.canonical_size)
        # rois_dist is in ascending order
roi_out_list = []
        resolution = self.mask_resolution if is_mask else self.box_resolution
for lvl in range(num_roi_lvls):
name_index = num_roi_lvls - lvl - 1
rois_input = rois_dist[lvl]
head_input = head_inputs[input_name_list[name_index]]
sc = spatial_scale[name_index]
roi_out = fluid.layers.roi_align(
input=head_input,
rois=rois_input,
pooled_height=resolution,
pooled_width=resolution,
spatial_scale=sc,
sampling_ratio=self.sampling_ratio)
roi_out_list.append(roi_out)
roi_feat_shuffle = fluid.layers.concat(roi_out_list)
roi_feat_ = fluid.layers.gather(roi_feat_shuffle, restore_index)
roi_feat = fluid.layers.lod_reset(roi_feat_, rois)
return roi_feat
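# A pure-Python sketch of the level-assignment rule applied by
# fluid.layers.distribute_fpn_proposals (Eq. 1 of the FPN paper): a RoI is
# routed to floor(k0 + log2(sqrt(area) / canonical_size)), clipped to
# [min_level, max_level].
if __name__ == '__main__':
    import math
    def target_level(w, h, k_min=2, k_max=5, k0=4, s0=224):
        k = int(math.floor(k0 + math.log(math.sqrt(w * h) / s0, 2)))
        return min(max(k, k_min), k_max)
    for box in [(32, 32), (112, 112), (224, 224), (448, 448)]:
        print(box, '-> level', target_level(*box))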
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import bbox_head
from . import mask_head
from . import cascade_head
from .bbox_head import *
from .mask_head import *
from .cascade_head import *
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import MultiClassNMS
from ppdet.core.workspace import register, serializable
__all__ = ['BBoxHead', 'TwoFCHead']
@register
@serializable
class BoxCoder(object):
__op__ = fluid.layers.box_coder
__append_doc__ = True
def __init__(self,
prior_box_var=[0.1, 0.1, 0.2, 0.2],
code_type='decode_center_size',
box_normalized=False,
axis=1):
super(BoxCoder, self).__init__()
self.prior_box_var = prior_box_var
self.code_type = code_type
self.box_normalized = box_normalized
self.axis = axis
@register
class TwoFCHead(object):
"""
RCNN head with two Fully Connected layers
Args:
num_chan (int): num of filters for the fc layers
"""
def __init__(self, num_chan=1024):
super(TwoFCHead, self).__init__()
self.num_chan = num_chan
def __call__(self, roi_feat):
fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
fc6 = fluid.layers.fc(input=roi_feat,
size=self.num_chan,
act='relu',
name='fc6',
param_attr=ParamAttr(
name='fc6_w',
initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name='fc6_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
head_feat = fluid.layers.fc(input=fc6,
size=self.num_chan,
act='relu',
name='fc7',
param_attr=ParamAttr(
name='fc7_w', initializer=Xavier()),
bias_attr=ParamAttr(
name='fc7_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return head_feat
@register
class BBoxHead(object):
"""
RCNN bbox head
Args:
head (object): the head module instance, e.g., `ResNetC5` or `TwoFCHead`
box_coder (object): `BoxCoder` instance
nms (object): `MultiClassNMS` instance
        num_classes (int): number of output classes
"""
__inject__ = ['head', 'box_coder', 'nms']
def __init__(self,
head,
box_coder=BoxCoder().__dict__,
nms=MultiClassNMS().__dict__,
num_classes=81):
super(BBoxHead, self).__init__()
self.head = head
self.num_classes = num_classes
self.box_coder = box_coder
self.nms = nms
if isinstance(box_coder, dict):
self.box_coder = BoxCoder(**box_coder)
if isinstance(nms, dict):
self.nms = MultiClassNMS(**nms)
self.head_feat = None
def get_head_feat(self, input=None):
"""
Get the bbox head feature map.
"""
if input is not None:
feat = self.head(input)
if isinstance(feat, OrderedDict):
feat = list(feat.values())[0]
self.head_feat = feat
return self.head_feat
def _get_output(self, roi_feat):
"""
Get bbox head output.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
Returns:
            cls_score(Variable): Output of the bbox head with shape of
                [P, num_classes], where P is the number of RoIs.
            bbox_pred(Variable): Output of the bbox head with shape of
                [P, 4 * num_classes].
"""
head_feat = self.get_head_feat(roi_feat)
# when ResNetC5 output a single feature map
if not isinstance(self.head, TwoFCHead):
head_feat = fluid.layers.pool2d(
head_feat, pool_type='avg', global_pooling=True)
cls_score = fluid.layers.fc(input=head_feat,
size=self.num_classes,
act=None,
name='cls_score',
param_attr=ParamAttr(
name='cls_score_w',
initializer=Normal(
loc=0.0, scale=0.01)),
bias_attr=ParamAttr(
name='cls_score_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
bbox_pred = fluid.layers.fc(input=head_feat,
size=4 * self.num_classes,
act=None,
name='bbox_pred',
param_attr=ParamAttr(
name='bbox_pred_w',
initializer=Normal(
loc=0.0, scale=0.001)),
bias_attr=ParamAttr(
name='bbox_pred_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return cls_score, bbox_pred
def get_loss(self, roi_feat, labels_int32, bbox_targets,
bbox_inside_weights, bbox_outside_weights):
"""
Get bbox_head loss.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
labels_int32(Variable): Class label of a RoI with shape [P, 1].
P is the number of RoI.
bbox_targets(Variable): Box label of a RoI with shape
[P, 4 * class_nums].
bbox_inside_weights(Variable): Indicates whether a box should
contribute to loss. Same shape as bbox_targets.
bbox_outside_weights(Variable): Indicates whether a box should
contribute to loss. Same shape as bbox_targets.
Return:
Type: Dict
loss_cls(Variable): bbox_head loss.
loss_bbox(Variable): bbox_head loss.
"""
cls_score, bbox_pred = self._get_output(roi_feat)
labels_int64 = fluid.layers.cast(x=labels_int32, dtype='int64')
labels_int64.stop_gradient = True
loss_cls = fluid.layers.softmax_with_cross_entropy(
logits=cls_score, label=labels_int64, numeric_stable_mode=True)
loss_cls = fluid.layers.reduce_mean(loss_cls)
loss_bbox = fluid.layers.smooth_l1(
x=bbox_pred,
y=bbox_targets,
inside_weight=bbox_inside_weights,
outside_weight=bbox_outside_weights,
sigma=1.0)
loss_bbox = fluid.layers.reduce_mean(loss_bbox)
return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
def get_prediction(self, roi_feat, rois, im_info, im_shape):
"""
Get prediction bounding box in test stage.
Args:
            roi_feat (Variable): RoI feature from RoIExtractor.
            rois (Variable): Output of generate_proposals in rpn head.
            im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
                number of input images, each element consists of im_height,
                im_width, im_scale.
            im_shape (Variable): shape of the input image, used to clip the
                decoded boxes.
Returns:
pred_result(Variable): Prediction result with shape [N, 6]. Each
row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
N is the total number of prediction.
"""
cls_score, bbox_pred = self._get_output(roi_feat)
im_scale = fluid.layers.slice(im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, rois)
boxes = rois / im_scale
cls_prob = fluid.layers.softmax(cls_score, use_cudnn=False)
bbox_pred = fluid.layers.reshape(bbox_pred, (-1, self.num_classes, 4))
decoded_box = self.box_coder(prior_box=boxes, target_box=bbox_pred)
cliped_box = fluid.layers.box_clip(input=decoded_box, im_info=im_shape)
pred_result = self.nms(bboxes=cliped_box, scores=cls_prob)
return {'bbox': pred_result}
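# A NumPy sketch of the weighted smooth-L1 loss used in `get_loss` above
# (sigma=1.0): inside weights gate the element-wise difference, outside
# weights gate the per-element loss (illustration only).
if __name__ == '__main__':
    import numpy as np
    def smooth_l1(x, y, inside_w, outside_w, sigma=1.0):
        d = (x - y) * inside_w
        s2 = sigma ** 2
        loss = np.where(np.abs(d) < 1. / s2,
                        0.5 * s2 * d * d,
                        np.abs(d) - 0.5 / s2)
        return loss * outside_w
    x, y, w = np.array([0.5, 2.0]), np.zeros(2), np.ones(2)
    print(smooth_l1(x, y, w, w))  # [0.125, 1.5]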
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.modeling.ops import MultiClassNMS
from ppdet.core.workspace import register
__all__ = ['CascadeBBoxHead']
@register
class CascadeBBoxHead(object):
"""
Cascade RCNN bbox head
Args:
head (object): the head module instance
nms (object): `MultiClassNMS` instance
        num_classes (int): number of output classes
"""
__inject__ = ['head', 'nms']
def __init__(self, head, nms=MultiClassNMS().__dict__, num_classes=81):
super(CascadeBBoxHead, self).__init__()
self.head = head
self.nms = nms
self.num_classes = num_classes
if isinstance(nms, dict):
self.nms = MultiClassNMS(**nms)
def get_output(self,
roi_feat,
cls_agnostic_bbox_reg=2,
wb_scalar=2.0,
name=''):
"""
Get bbox head output.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
            cls_agnostic_bbox_reg (int): number of classes for the
                class-agnostic box regressor, 2 by default.
            wb_scalar (float): learning rate scalar for the weights and bias.
            name (str): layer name suffix.
Returns:
cls_score(Variable): cls score.
bbox_pred(Variable): bbox regression.
"""
head_feat = self.head(roi_feat, wb_scalar, name)
cls_score = fluid.layers.fc(input=head_feat,
size=self.num_classes,
act=None,
name='cls_score' + name,
param_attr=ParamAttr(
name='cls_score%s_w' % name,
initializer=Normal(
loc=0.0, scale=0.01),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='cls_score%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
bbox_pred = fluid.layers.fc(input=head_feat,
size=4 * cls_agnostic_bbox_reg,
act=None,
name='bbox_pred' + name,
param_attr=ParamAttr(
name='bbox_pred%s_w' % name,
initializer=Normal(
loc=0.0, scale=0.001),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='bbox_pred%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
return cls_score, bbox_pred
def get_loss(self, rcnn_pred_list, rcnn_target_list, rcnn_loss_weight_list):
"""
Get bbox_head loss.
Args:
rcnn_pred_list(List): Cascade RCNN's head's output including
bbox_pred and cls_score
rcnn_target_list(List): Cascade rcnn's bbox and label target
rcnn_loss_weight_list(List): The weight of location and class loss
Return:
loss_cls(Variable): bbox_head loss.
loss_bbox(Variable): bbox_head loss.
"""
loss_dict = {}
for i, (rcnn_pred, rcnn_target
) in enumerate(zip(rcnn_pred_list, rcnn_target_list)):
labels_int64 = fluid.layers.cast(x=rcnn_target[1], dtype='int64')
labels_int64.stop_gradient = True
loss_cls = fluid.layers.softmax_with_cross_entropy(
logits=rcnn_pred[0],
label=labels_int64,
numeric_stable_mode=True, )
loss_cls = fluid.layers.reduce_mean(
loss_cls, name='loss_cls_' + str(i)) * rcnn_loss_weight_list[i]
loss_bbox = fluid.layers.smooth_l1(
x=rcnn_pred[1],
y=rcnn_target[2],
inside_weight=rcnn_target[3],
outside_weight=rcnn_target[4],
sigma=1.0, # detectron use delta = 1./sigma**2
)
loss_bbox = fluid.layers.reduce_mean(
loss_bbox,
name='loss_bbox_' + str(i)) * rcnn_loss_weight_list[i]
loss_dict['loss_cls_%d' % i] = loss_cls
loss_dict['loss_loc_%d' % i] = loss_bbox
return loss_dict
def get_prediction(self,
im_info,
roi_feat_list,
rcnn_pred_list,
proposal_list,
cascade_bbox_reg_weights,
cls_agnostic_bbox_reg=2):
"""
Get prediction bounding box in test stage.
Args:
im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
number of input images, each element consists
of im_height, im_width, im_scale.
            roi_feat_list (List): RoI features from RoIExtractor, one per
                stage.
            rcnn_pred_list (List): Cascade rcnn's head's output including
                bbox_pred and cls_score.
            proposal_list (List): RPN proposal boxes.
            cascade_bbox_reg_weights (List): per-stage box decoding variances.
            cls_agnostic_bbox_reg (int): number of classes for the
                class-agnostic box regressor.
Returns:
pred_result(Variable): Prediction result with shape [N, 6]. Each
row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
N is the total number of prediction.
"""
self.im_scale = fluid.layers.slice(im_info, [1], starts=[2], ends=[3])
boxes_cls_prob_l = []
rcnn_pred = rcnn_pred_list[-1] # stage 3
        repeat_num = 3
        bbox_reg_w = cascade_bbox_reg_weights[-1]
        for i in range(repeat_num):
# cls score
if i < 2:
cls_score = self._head_share(
roi_feat_list[-1], # roi_feat_3
name='_' + str(i + 1) if i > 0 else '')
else:
cls_score = rcnn_pred[0]
cls_prob = fluid.layers.softmax(cls_score, use_cudnn=False)
boxes_cls_prob_l.append(cls_prob)
boxes_cls_prob_mean = (
boxes_cls_prob_l[0] + boxes_cls_prob_l[1] + boxes_cls_prob_l[2]
) / 3.0
# bbox pred
proposals_boxes = proposal_list[-1]
im_scale_lod = fluid.layers.sequence_expand(self.im_scale,
proposals_boxes)
proposals_boxes = proposals_boxes / im_scale_lod
bbox_pred = rcnn_pred[1]
bbox_pred_new = fluid.layers.reshape(bbox_pred,
(-1, cls_agnostic_bbox_reg, 4))
if cls_agnostic_bbox_reg == 2:
# only use fg box delta to decode box
bbox_pred_new = fluid.layers.slice(
bbox_pred_new, axes=[1], starts=[1], ends=[2])
            bbox_pred_new = fluid.layers.expand(bbox_pred_new,
                                                [1, self.num_classes, 1])
decoded_box = fluid.layers.box_coder(
prior_box=proposals_boxes,
prior_box_var=bbox_reg_w,
target_box=bbox_pred_new,
code_type='decode_center_size',
box_normalized=False,
axis=1)
# TODO: notice detectron use img.shape
box_out = fluid.layers.box_clip(input=decoded_box, im_info=im_info)
pred_result = self.nms(bboxes=box_out, scores=boxes_cls_prob_mean)
return {"bbox": pred_result}
def _head_share(self, roi_feat, wb_scalar=2.0, name=''):
# FC6 FC7
fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
fc6 = fluid.layers.fc(input=roi_feat,
size=self.head.num_chan,
act='relu',
name='fc6' + name,
param_attr=ParamAttr(
name='fc6%s_w' % name,
initializer=Xavier(fan_out=fan),
learning_rate=wb_scalar, ),
bias_attr=ParamAttr(
name='fc6%s_b' % name,
learning_rate=2.0,
regularizer=L2Decay(0.)))
fc7 = fluid.layers.fc(input=fc6,
size=self.head.num_chan,
act='relu',
name='fc7' + name,
param_attr=ParamAttr(
name='fc7%s_w' % name,
initializer=Xavier(),
learning_rate=wb_scalar, ),
bias_attr=ParamAttr(
name='fc7%s_b' % name,
learning_rate=2.0,
regularizer=L2Decay(0.)))
cls_score = fluid.layers.fc(input=fc7,
size=self.num_classes,
act=None,
name='cls_score' + name,
param_attr=ParamAttr(
name='cls_score%s_w' % name,
initializer=Normal(
loc=0.0, scale=0.01),
learning_rate=wb_scalar, ),
bias_attr=ParamAttr(
name='cls_score%s_b' % name,
learning_rate=2.0,
regularizer=L2Decay(0.)))
return cls_score
@register
class FC6FC7Head(object):
"""
Cascade RCNN head with two Fully Connected layers
Args:
num_chan (int): num of filters for the fc layers
"""
def __init__(self, num_chan):
super(FC6FC7Head, self).__init__()
self.num_chan = num_chan
def __call__(self, roi_feat, wb_scalar=1.0, name=''):
fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
fc6 = fluid.layers.fc(input=roi_feat,
size=self.num_chan,
act='relu',
name='fc6' + name,
param_attr=ParamAttr(
name='fc6%s_w' % name,
initializer=Xavier(fan_out=fan),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='fc6%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
head_feat = fluid.layers.fc(input=fc6,
size=self.num_chan,
act='relu',
name='fc7' + name,
param_attr=ParamAttr(
name='fc7%s_w' % name,
initializer=Xavier(),
learning_rate=wb_scalar),
bias_attr=ParamAttr(
name='fc7%s_b' % name,
learning_rate=wb_scalar,
regularizer=L2Decay(0.)))
return head_feat
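# A NumPy sketch of the test-time ensembling in `get_prediction` above: the
# stage-3 RoI features are re-scored by all three stage heads and the class
# probabilities are averaged before NMS (illustration only).
if __name__ == '__main__':
    import numpy as np
    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    np.random.seed(0)
    stage_logits = [np.random.randn(4, 81) for _ in range(3)]
    cls_prob_mean = sum(softmax(l) for l in stage_logits) / 3.0
    print(cls_prob_mean.shape, cls_prob_mean.sum(axis=1))  # rows sum to 1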
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import MSRA
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
__all__ = ['MaskHead']
@register
class MaskHead(object):
"""
RCNN mask head
Args:
num_convs (int): num of convolutions, 4 for FPN, 1 otherwise
num_chan_reduced (int): num of channels after first convolution
resolution (int): size of the output mask
dilation (int): dilation rate
num_classes (int): number of output classes
"""
def __init__(self,
num_convs=0,
num_chan_reduced=256,
resolution=14,
dilation=1,
num_classes=81):
super(MaskHead, self).__init__()
self.num_convs = num_convs
self.num_chan_reduced = num_chan_reduced
self.resolution = resolution
self.dilation = dilation
self.num_classes = num_classes
def _mask_conv_head(self, roi_feat, num_convs):
for i in range(num_convs):
layer_name = "mask_inter_feat_" + str(i + 1)
fan = self.num_chan_reduced * 3 * 3
roi_feat = fluid.layers.conv2d(
input=roi_feat,
num_filters=self.num_chan_reduced,
filter_size=3,
padding=1 * self.dilation,
act='relu',
stride=1,
dilation=self.dilation,
name=layer_name,
param_attr=ParamAttr(
name=layer_name + '_w',
initializer=MSRA(
uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name=layer_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
fan = roi_feat.shape[1] * 2 * 2
feat = fluid.layers.conv2d_transpose(
input=roi_feat,
num_filters=self.num_chan_reduced,
filter_size=2,
stride=2,
act='relu',
param_attr=ParamAttr(
name='conv5_mask_w',
initializer=MSRA(
uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name='conv5_mask_b', learning_rate=2., regularizer=L2Decay(0.)))
return feat
def _get_output(self, roi_feat):
class_num = self.num_classes
# configure the conv number for FPN if necessary
head_feat = self._mask_conv_head(roi_feat, self.num_convs)
fan = class_num
mask_logits = fluid.layers.conv2d(
input=head_feat,
num_filters=class_num,
filter_size=1,
act=None,
param_attr=ParamAttr(
name='mask_fcn_logits_w',
initializer=MSRA(
uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name="mask_fcn_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return mask_logits
def get_loss(self, roi_feat, mask_int32):
mask_logits = self._get_output(roi_feat)
num_classes = self.num_classes
resolution = self.resolution
dim = num_classes * resolution * resolution
mask_logits = fluid.layers.reshape(mask_logits, (-1, dim))
mask_label = fluid.layers.cast(x=mask_int32, dtype='float32')
mask_label.stop_gradient = True
loss_mask = fluid.layers.sigmoid_cross_entropy_with_logits(
x=mask_logits, label=mask_label, ignore_index=-1, normalize=True)
loss_mask = fluid.layers.reduce_sum(loss_mask, name='loss_mask')
return {'loss_mask': loss_mask}
def get_prediction(self, roi_feat, bbox_pred):
"""
Get prediction mask in test stage.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
bbox_pred (Variable): predicted bbox.
Returns:
mask_pred (Variable): Prediction mask with shape
[N, num_classes, resolution, resolution].
"""
mask_logits = self._get_output(roi_feat)
mask_prob = fluid.layers.sigmoid(mask_logits)
mask_prob = fluid.layers.lod_reset(mask_prob, bbox_pred)
return mask_prob
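# A NumPy sketch of the mask loss above: the [P, C, R, R] logits are
# flattened to [P, C * R * R] and matched against targets where -1 marks
# ignored pixels (illustration only).
if __name__ == '__main__':
    import numpy as np
    np.random.seed(0)
    P, C, R = 2, 3, 4
    logits = np.random.randn(P, C * R * R)
    labels = np.random.choice([-1., 0., 1.], size=(P, C * R * R))
    valid = labels != -1
    prob = 1. / (1. + np.exp(-logits))
    ce = -(labels * np.log(prob) + (1. - labels) * np.log(1. - prob))
    loss = ce[valid].sum() / max(valid.sum(), 1)  # normalize on valid pixels
    print(loss)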
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from ppdet.core.workspace import register
from ppdet.modeling.ops import BBoxAssigner, MaskAssigner
__all__ = ['BBoxAssigner', 'MaskAssigner', 'CascadeBBoxAssigner']
@register
class CascadeBBoxAssigner(object):
def __init__(self,
batch_size_per_im=512,
fg_fraction=.25,
fg_thresh=[0.5, 0.6, 0.7],
bg_thresh_hi=[0.5, 0.6, 0.7],
bg_thresh_lo=[0., 0., 0.],
bbox_reg_weights=[10, 20, 30],
num_classes=81,
shuffle_before_sample=True):
super(CascadeBBoxAssigner, self).__init__()
self.batch_size_per_im = batch_size_per_im
self.fg_fraction = fg_fraction
self.fg_thresh = fg_thresh
self.bg_thresh_hi = bg_thresh_hi
self.bg_thresh_lo = bg_thresh_lo
self.bbox_reg_weights = bbox_reg_weights
self.class_nums = num_classes
self.use_random = shuffle_before_sample
def __call__(self, input_rois, feed_vars, curr_stage):
curr_bbox_reg_w = [
1. / self.bbox_reg_weights[curr_stage],
2. / self.bbox_reg_weights[curr_stage],
2. / self.bbox_reg_weights[curr_stage],
2. / self.bbox_reg_weights[curr_stage],
]
outs = fluid.layers.generate_proposal_labels(
rpn_rois=input_rois,
gt_classes=feed_vars['gt_label'],
is_crowd=feed_vars['is_crowd'],
gt_boxes=feed_vars['gt_box'],
im_info=feed_vars['im_info'],
batch_size_per_im=self.batch_size_per_im,
fg_thresh=self.fg_thresh[curr_stage],
bg_thresh_hi=self.bg_thresh_hi[curr_stage],
bg_thresh_lo=self.bg_thresh_lo[curr_stage],
bbox_reg_weights=curr_bbox_reg_w,
use_random=self.use_random,
class_nums=2,
is_cls_agnostic=True,
is_cascade_rcnn=True if curr_stage > 0 else False)
return outs
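# A pure-Python sketch of the per-stage regression variances built in
# `__call__` above: stage i uses [1/w, 2/w, 2/w, 2/w] with w taken from
# bbox_reg_weights, so later stages see tighter box deltas.
if __name__ == '__main__':
    bbox_reg_weights = [10, 20, 30]
    for stage, w in enumerate(bbox_reg_weights):
        print('stage {}: {}'.format(stage, [1. / w, 2. / w, 2. / w, 2. / w]))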
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
__all__ = ['prog_scope']
def prog_scope():
def __impl__(fn):
def __fn__(*args, **kwargs):
prog = fluid.Program()
startup_prog = fluid.Program()
scope = fluid.core.Scope()
with fluid.scope_guard(scope):
with fluid.program_guard(prog, startup_prog):
with fluid.unique_name.guard():
fn(*args, **kwargs)
return __fn__
return __impl__
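# A minimal usage sketch: `prog_scope` runs the wrapped function inside fresh
# Program/Scope/unique-name guards, so repeated calls build isolated graphs.
if __name__ == '__main__':
    @prog_scope()
    def build_graph():
        x = fluid.layers.data(name='x', shape=[4], dtype='float32')
        y = fluid.layers.fc(input=x, size=2)
        print(y.name)
    build_graph()
    build_graph()  # prints the same name: each call starts from a clean slate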
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
import numpy as np
import paddle.fluid as fluid
from ppdet.modeling.tests.decorator_helper import prog_scope
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.modeling.model_input import create_feed
class TestFasterRCNN(unittest.TestCase):
def setUp(self):
self.set_config()
self.cfg = load_config(self.cfg_file)
self.detector_type = self.cfg['architecture']
def set_config(self):
self.cfg_file = 'configs/faster_rcnn_r50_1x.yml'
@prog_scope()
def test_train(self):
train_feed = create(self.cfg['train_feed'])
model = create(self.detector_type)
_, feed_vars = create_feed(train_feed)
train_fetches = model.train(feed_vars)
@prog_scope()
def test_test(self):
test_feed = create(self.cfg['eval_feed'])
model = create(self.detector_type)
_, feed_vars = create_feed(test_feed)
test_fetches = model.eval(feed_vars)
class TestMaskRCNN(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/mask_rcnn_r50_1x.yml'
class TestCascadeRCNN(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/cascade_rcnn_r50_fpn_1x.yml'
class TestYolov3(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/yolov3_darknet.yml'
class TestRetinaNet(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/retinanet_r50_fpn_1x.yml'
class TestSSD(TestFasterRCNN):
def set_config(self):
self.cfg_file = 'configs/ssd_mobilenet_v1_voc.yml'
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
from paddle import fluid
import paddle.fluid.optimizer as optimizer
import paddle.fluid.regularizer as regularizer
from ppdet.core.workspace import register, serializable
__all__ = ['LearningRate', 'OptimizerBuilder']
logger = logging.getLogger(__name__)
@serializable
class PiecewiseDecay(object):
"""
Multi step learning rate decay
Args:
gamma (float): decay factor
milestones (list): steps at which to decay learning rate
"""
def __init__(self, gamma=0.1, milestones=[6000, 8000], values=None):
super(PiecewiseDecay, self).__init__()
self.gamma = gamma
self.milestones = milestones
self.values = values
def __call__(self, base_lr=None, learning_rate=None):
if self.values is not None:
return fluid.layers.piecewise_decay(self.milestones, self.values)
assert base_lr is not None, "either base LR or values should be provided"
values = [base_lr]
lr = base_lr
for _ in self.milestones:
lr *= self.gamma
values.append(lr)
return fluid.layers.piecewise_decay(self.milestones, values)
@serializable
class LinearWarmup(object):
"""
Warm up learning rate linearly
Args:
steps (int): warm up steps
start_factor (float): initial learning rate factor
"""
def __init__(self, steps=500, start_factor=1. / 3):
super(LinearWarmup, self).__init__()
self.steps = steps
self.start_factor = start_factor
def __call__(self, base_lr, learning_rate):
start_lr = base_lr * self.start_factor
return fluid.layers.linear_lr_warmup(
learning_rate=learning_rate,
warmup_steps=self.steps,
start_lr=start_lr,
end_lr=base_lr)
@register
class LearningRate(object):
"""
Learning Rate configuration
Args:
base_lr (float): base learning rate
schedulers (list): learning rate schedulers
"""
__category__ = 'optim'
def __init__(self,
base_lr=0.01,
schedulers=[PiecewiseDecay(), LinearWarmup()]):
super(LearningRate, self).__init__()
self.base_lr = base_lr
self.schedulers = schedulers
def __call__(self):
lr = None
for sched in self.schedulers:
lr = sched(self.base_lr, lr)
return lr
@register
class OptimizerBuilder():
"""
Build optimizer handles
Args:
regularizer (object): an `Regularizer` instance
optimizer (object): an `Optimizer` instance
"""
__category__ = 'optim'
def __init__(self,
regularizer={'type': 'L2',
'factor': .0001},
optimizer={'type': 'Momentum',
'momentum': .9}):
self.regularizer = regularizer
self.optimizer = optimizer
def __call__(self, learning_rate):
reg_type = self.regularizer['type'] + 'Decay'
reg_factor = self.regularizer['factor']
regularization = getattr(regularizer, reg_type)(reg_factor)
optim_args = self.optimizer.copy()
optim_type = optim_args['type']
del optim_args['type']
op = getattr(optimizer, optim_type)
return op(learning_rate=learning_rate,
regularization=regularization,
**optim_args)
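# A pure-Python sketch of the schedule the classes above compose:
# PiecewiseDecay derives step values from base_lr and gamma, and LinearWarmup
# ramps the first `steps` iterations from base_lr * start_factor to base_lr.
if __name__ == '__main__':
    base_lr, gamma, milestones = 0.01, 0.1, [6000, 8000]
    values, lr = [base_lr], base_lr
    for _ in milestones:
        lr *= gamma
        values.append(lr)
    print(values)  # [0.01, 0.001, 0.0001]
    steps, start_factor = 500, 1. / 3
    warmup = [base_lr * (start_factor + (1. - start_factor) * t / steps)
              for t in (0, 250, 500)]
    print(warmup)  # ramps up to base_lr at step 500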
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import shutil
import numpy as np
import paddle.fluid as fluid
from .download import get_weights_path
import logging
logger = logging.getLogger(__name__)
__all__ = ['load_checkpoint', 'load_and_fusebn', 'save']
def is_url(path):
"""
Whether path is URL.
Args:
        path (string): the path to check.
"""
return path.startswith('http://') or path.startswith('https://')
def load_pretrain(exe, prog, path):
"""
Load model from the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
"""
if is_url(path):
path = get_weights_path(path)
if not os.path.exists(path):
        logger.info('Model path {} does not exist.'.format(path))
logger.info('Loading pretrained model from {}...'.format(path))
def _if_exist(var):
b = os.path.exists(os.path.join(path, var.name))
if b:
logger.debug('load weight {}'.format(var.name))
return b
fluid.io.load_vars(exe, path, prog, predicate=_if_exist)
def load_checkpoint(exe, prog, path):
"""
Load model from the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
"""
if is_url(path):
path = get_weights_path(path)
if not os.path.exists(path):
        logger.info('Model path {} does not exist.'.format(path))
logger.info('Loading checkpoint from {}...'.format(path))
fluid.io.load_persistables(exe, path, prog)
def save(exe, prog, path):
"""
    Save model to the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): save weight from which Program object.
path (string): the path to save model.
"""
if os.path.isdir(path):
shutil.rmtree(path)
logger.info('Save model to {}.'.format(path))
fluid.io.save_persistables(exe, path, prog)
def load_and_fusebn(exe, prog, path):
"""
Fuse params of batch norm to scale and bias.
Args:
exe (fluid.Executor): The fluid.Executor object.
        prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
"""
logger.info('Load model and fuse batch norm from {}...'.format(path))
if is_url(path):
path = get_weights_path(path)
def _if_exist(var):
b = os.path.exists(os.path.join(path, var.name))
if b:
logger.debug('load weight {}'.format(var.name))
return b
all_vars = list(filter(_if_exist, prog.list_vars()))
# Since the program uses affine-channel, there is no running mean and var
# in the program, here append running mean and var.
# NOTE, the params of batch norm should be like:
# x_scale
# x_offset
# x_mean
# x_variance
# x is any prefix
mean_variances = set()
bn_vars = []
bn_in_path = True
inner_prog = fluid.Program()
inner_start_prog = fluid.Program()
with fluid.program_guard(inner_prog, inner_start_prog):
for block in prog.blocks:
ops = list(block.ops)
if not bn_in_path:
break
for op in ops:
if op.type == 'affine_channel':
# remove 'scale' as prefix
scale_name = op.input('Scale')[0] # _scale
bias_name = op.input('Bias')[0] # _offset
prefix = scale_name[:-5]
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
if not os.path.exists(os.path.join(path, mean_name)):
bn_in_path = False
break
if not os.path.exists(os.path.join(path, variance_name)):
bn_in_path = False
break
bias = block.var(bias_name)
mean_vb = fluid.layers.create_parameter(
bias.shape, bias.dtype, mean_name)
variance_vb = fluid.layers.create_parameter(
bias.shape, bias.dtype, variance_name)
mean_variances.add(mean_vb)
mean_variances.add(variance_vb)
bn_vars.append(
[scale_name, bias_name, mean_name, variance_name])
if not bn_in_path:
        raise ValueError(
            "The model in path {} has no params of batch norm.".format(path))
# load running mean and running variance on cpu place into global scope.
place = fluid.CPUPlace()
exe_cpu = fluid.Executor(place)
fluid.io.load_vars(exe_cpu, path, vars=[v for v in mean_variances])
# load params on real place into global scope.
fluid.io.load_vars(exe, path, prog, vars=all_vars)
eps = 1e-5
for names in bn_vars:
scale_name, bias_name, mean_name, var_name = names
scale = fluid.global_scope().find_var(scale_name).get_tensor()
bias = fluid.global_scope().find_var(bias_name).get_tensor()
mean = fluid.global_scope().find_var(mean_name).get_tensor()
var = fluid.global_scope().find_var(var_name).get_tensor()
scale_arr = np.array(scale)
bias_arr = np.array(bias)
mean_arr = np.array(mean)
var_arr = np.array(var)
bn_std = np.sqrt(np.add(var_arr, eps))
new_scale = np.float32(np.divide(scale_arr, bn_std))
new_bias = bias_arr - mean_arr * new_scale
# fuse to scale and bias in affine_channel
scale.set(new_scale, exe.place)
bias.set(new_bias, exe.place)
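# A NumPy sketch of the arithmetic in `load_and_fusebn` above: folding the
# saved running statistics into the affine_channel scale and bias.
if __name__ == '__main__':
    eps = 1e-5
    scale, bias = np.array([1.2]), np.array([0.3])
    mean, var = np.array([0.5]), np.array([4.0])
    new_scale = scale / np.sqrt(var + eps)
    new_bias = bias - mean * new_scale
    x = np.array([2.0])
    # the fused affine equals the unfused batch norm output
    print(new_scale * x + new_bias)
    print(scale * (x - mean) / np.sqrt(var + eps) + bias)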
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from argparse import ArgumentParser, RawDescriptionHelpFormatter
import yaml
__all__ = ['ColorTTY', 'ArgsParser']
class ColorTTY(object):
def __init__(self):
super(ColorTTY, self).__init__()
self.colors = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan']
def __getattr__(self, attr):
if attr in self.colors:
color = self.colors.index(attr) + 31
def color_message(message):
return "[{}m{}".format(color, message)
setattr(self, attr, color_message)
return color_message
def bold(self, message):
return self.with_code('01', message)
def with_code(self, code, message):
return "[{}m{}".format(code, message)
class ArgsParser(ArgumentParser):
def __init__(self):
super(ArgsParser, self).__init__(
formatter_class=RawDescriptionHelpFormatter)
self.add_argument("-c", "--config", help="configuration file to use")
self.add_argument("-o", "--opt", nargs='*',
help="set configuration options")
def parse_args(self, argv=None):
args = super(ArgsParser, self).parse_args(argv)
assert args.config is not None, \
"Please specify --config=configure_file_path."
args.opt = self._parse_opt(args.opt)
return args
def _parse_opt(self, opts):
config = {}
if not opts:
return config
for s in opts:
s = s.strip()
            k, v = s.split('=', 1)
if '.' not in k:
config[k] = v
else:
keys = k.split('.')
config[keys[0]] = {}
cur = config[keys[0]]
for idx, key in enumerate(keys[1:]):
if idx == len(keys) - 2:
cur[key] = yaml.load(v, Loader=yaml.Loader)
else:
cur[key] = {}
cur = cur[key]
return config
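
# --- Editor's sketch (not part of the original file): illustrates how
# "-o key.subkey=value" options become a nested dict. Note the asymmetry in
# _parse_opt: dotted keys pass their leaf value through yaml.load (so
# "0.001" becomes a float), while top-level keys stay raw strings. The
# config path below is illustrative and never opened.
def _demo_args_parser():
    parser = ArgsParser()
    args = parser.parse_args(['-c', 'some_config.yml',
                              '-o', 'use_gpu=false',
                              'LearningRate.base_lr=0.001'])
    assert args.opt == {'use_gpu': 'false',
                        'LearningRate': {'base_lr': 0.001}}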
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import json
import cv2
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import pycocotools.mask as mask_util
import logging
logger = logging.getLogger(__name__)
__all__ = [
'bbox_eval', 'mask_eval', 'bbox2out', 'mask2out', 'get_category_info'
]
def clip_bbox(bbox):
xmin = max(min(bbox[0], 1.), 0.)
ymin = max(min(bbox[1], 1.), 0.)
xmax = max(min(bbox[2], 1.), 0.)
ymax = max(min(bbox[3], 1.), 0.)
return xmin, ymin, xmax, ymax
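
# --- Editor's sketch (not part of the original file): clip_bbox assumes
# normalized [xmin, ymin, xmax, ymax] coordinates and clamps them to [0, 1].
def _demo_clip_bbox():
    assert clip_bbox([-0.1, 0.2, 1.3, 0.8]) == (0.0, 0.2, 1.0, 0.8)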
def bbox_eval(results, anno_file, outfile, with_background=True):
assert 'bbox' in results[0]
assert outfile.endswith('.json')
coco_gt = COCO(anno_file)
cat_ids = coco_gt.getCatIds()
    # when with_background = True, categories are mapped to class ids like:
    # background: 0, first_class: 1, second_class: 2, ...
clsid2catid = dict(
{i + int(with_background): catid
for i, catid in enumerate(cat_ids)})
xywh_results = bbox2out(results, clsid2catid)
with open(outfile, 'w') as f:
json.dump(xywh_results, f)
logger.info("Start evaluate...")
coco_dt = coco_gt.loadRes(outfile)
coco_ev = COCOeval(coco_gt, coco_dt, 'bbox')
coco_ev.evaluate()
coco_ev.accumulate()
coco_ev.summarize()
# flush coco evaluation result
sys.stdout.flush()
def mask_eval(results, anno_file, outfile, resolution, thresh_binarize=0.5):
assert 'mask' in results[0]
assert outfile.endswith('.json')
coco_gt = COCO(anno_file)
clsid2catid = {i + 1: v for i, v in enumerate(coco_gt.getCatIds())}
segm_results = mask2out(results, clsid2catid, resolution, thresh_binarize)
with open(outfile, 'w') as f:
json.dump(segm_results, f)
logger.info("Start evaluate...")
coco_dt = coco_gt.loadRes(outfile)
coco_ev = COCOeval(coco_gt, coco_dt, 'segm')
coco_ev.evaluate()
coco_ev.accumulate()
coco_ev.summarize()
def bbox2out(results, clsid2catid, is_bbox_normalized=False):
xywh_res = []
for t in results:
bboxes = t['bbox'][0]
lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0])
        if bboxes is None or bboxes.shape == (1, 1):
continue
k = 0
for i in range(len(lengths)):
num = lengths[i]
im_id = int(im_ids[i][0])
for j in range(num):
dt = bboxes[k]
clsid, score, xmin, ymin, xmax, ymax = dt.tolist()
catid = clsid2catid[clsid]
if is_bbox_normalized:
xmin, ymin, xmax, ymax = \
clip_bbox([xmin, ymin, xmax, ymax])
w = xmax - xmin
h = ymax - ymin
else:
w = xmax - xmin + 1
h = ymax - ymin + 1
bbox = [xmin, ymin, w, h]
coco_res = {
'image_id': im_id,
'category_id': catid,
'bbox': bbox,
'score': score
}
xywh_res.append(coco_res)
k += 1
return xywh_res
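
# --- Editor's sketch (not part of the original file): a minimal fabricated
# "results" batch run through bbox2out. One image (im_id 7) with a single
# detection of class 0, mapped to category 1; xyxy becomes COCO xywh with
# the +1 pixel convention for un-normalized boxes.
def _demo_bbox2out():
    t = {'bbox': (np.array([[0., 0.9, 10., 20., 19., 39.]]), [[1]]),
         'im_id': (np.array([[7]]),)}
    res = bbox2out([t], {0: 1})
    assert res == [{'image_id': 7, 'category_id': 1,
                    'bbox': [10., 20., 10., 20.], 'score': 0.9}]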
def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
scale = (resolution + 2.0) / resolution
segm_res = []
# for each batch
for t in results:
bboxes = t['bbox'][0]
lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0])
        if bboxes is None or bboxes.shape == (1, 1):
continue
if len(bboxes.tolist()) == 0:
continue
masks = t['mask'][0]
im_shape = t['im_shape'][0][0]
s = 0
# for each sample
for i in range(len(lengths)):
num = lengths[i]
im_id = int(im_ids[i][0])
bbox = bboxes[s:s + num][:, 2:]
clsid_scores = bboxes[s:s + num][:, 0:2]
mask = masks[s:s + num]
s += num
im_h = int(im_shape[0])
im_w = int(im_shape[1])
expand_bbox = expand_boxes(bbox, scale)
expand_bbox = expand_bbox.astype(np.int32)
padded_mask = np.zeros(
(resolution + 2, resolution + 2), dtype=np.float32)
for j in range(num):
xmin, ymin, xmax, ymax = expand_bbox[j].tolist()
clsid, score = clsid_scores[j].tolist()
clsid = int(clsid)
padded_mask[1:-1, 1:-1] = mask[j, clsid, :, :]
catid = clsid2catid[clsid]
w = xmax - xmin + 1
h = ymax - ymin + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
resized_mask = cv2.resize(padded_mask, (w, h))
resized_mask = np.array(
resized_mask > thresh_binarize, dtype=np.uint8)
im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
x0 = min(max(xmin, 0), im_w)
x1 = min(max(xmax + 1, 0), im_w)
y0 = min(max(ymin, 0), im_h)
y1 = min(max(ymax + 1, 0), im_h)
im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
x0 - xmin):(x1 - xmin)]
segm = mask_util.encode(
np.array(
im_mask[:, :, np.newaxis], order='F'))[0]
catid = clsid2catid[clsid]
segm['counts'] = segm['counts'].decode('utf8')
coco_res = {
'image_id': im_id,
'category_id': catid,
'segmentation': segm,
'score': score
}
segm_res.append(coco_res)
return segm_res
def expand_boxes(boxes, scale):
"""
Expand an array of boxes by a given scale.
"""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
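
# --- Editor's sketch (not part of the original file): expand_boxes scales
# each box about its own center; mask2out uses scale = (resolution + 2) /
# resolution to account for the one-pixel border of padded_mask.
def _demo_expand_boxes():
    box = np.array([[0., 0., 2., 2.]])  # center (1, 1), half sides 1
    assert np.allclose(expand_boxes(box, 2.0), [[-1., -1., 3., 3.]])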
def get_category_info(anno_file=None,
with_background=True,
use_default_label=False):
if use_default_label or anno_file is None \
or not os.path.exists(anno_file):
logger.info("Not found annotation file {}, load "
"coco17 categories.".format(anno_file))
return coco17_category_info(with_background)
else:
logger.info("Load categories from {}".format(anno_file))
return get_category_info_from_anno(anno_file, with_background)
def get_category_info_from_anno(anno_file, with_background=True):
"""
Get class id to category id map and category id
to category name map from annotation file.
Args:
anno_file (str): annotation file path
with_background (bool, default True):
whether load background as class 0.
"""
coco = COCO(anno_file)
cats = coco.loadCats(coco.getCatIds())
clsid2catid = {
i + int(with_background): cat['id']
for i, cat in enumerate(cats)
}
catid2name = {cat['id']: cat['name'] for cat in cats}
return clsid2catid, catid2name
def coco17_category_info(with_background=True):
"""
Get class id to category id map and category id
to category name map of COCO2017 dataset
Args:
with_background (bool, default True):
whether load background as class 0.
"""
clsid2catid = {
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 13,
13: 14,
14: 15,
15: 16,
16: 17,
17: 18,
18: 19,
19: 20,
20: 21,
21: 22,
22: 23,
23: 24,
24: 25,
25: 27,
26: 28,
27: 31,
28: 32,
29: 33,
30: 34,
31: 35,
32: 36,
33: 37,
34: 38,
35: 39,
36: 40,
37: 41,
38: 42,
39: 43,
40: 44,
41: 46,
42: 47,
43: 48,
44: 49,
45: 50,
46: 51,
47: 52,
48: 53,
49: 54,
50: 55,
51: 56,
52: 57,
53: 58,
54: 59,
55: 60,
56: 61,
57: 62,
58: 63,
59: 64,
60: 65,
61: 67,
62: 70,
63: 72,
64: 73,
65: 74,
66: 75,
67: 76,
68: 77,
69: 78,
70: 79,
71: 80,
72: 81,
73: 82,
74: 84,
75: 85,
76: 86,
77: 87,
78: 88,
79: 89,
80: 90
}
catid2name = {
0: 'background',
1: 'person',
2: 'bicycle',
3: 'car',
4: 'motorcycle',
5: 'airplane',
6: 'bus',
7: 'train',
8: 'truck',
9: 'boat',
10: 'traffic light',
11: 'fire hydrant',
13: 'stop sign',
14: 'parking meter',
15: 'bench',
16: 'bird',
17: 'cat',
18: 'dog',
19: 'horse',
20: 'sheep',
21: 'cow',
22: 'elephant',
23: 'bear',
24: 'zebra',
25: 'giraffe',
27: 'backpack',
28: 'umbrella',
31: 'handbag',
32: 'tie',
33: 'suitcase',
34: 'frisbee',
35: 'skis',
36: 'snowboard',
37: 'sports ball',
38: 'kite',
39: 'baseball bat',
40: 'baseball glove',
41: 'skateboard',
42: 'surfboard',
43: 'tennis racket',
44: 'bottle',
46: 'wine glass',
47: 'cup',
48: 'fork',
49: 'knife',
50: 'spoon',
51: 'bowl',
52: 'banana',
53: 'apple',
54: 'sandwich',
55: 'orange',
56: 'broccoli',
57: 'carrot',
58: 'hot dog',
59: 'pizza',
60: 'donut',
61: 'cake',
62: 'chair',
63: 'couch',
64: 'potted plant',
65: 'bed',
67: 'dining table',
70: 'toilet',
72: 'tv',
73: 'laptop',
74: 'mouse',
75: 'remote',
76: 'keyboard',
77: 'cell phone',
78: 'microwave',
79: 'oven',
80: 'toaster',
81: 'sink',
82: 'refrigerator',
84: 'book',
85: 'clock',
86: 'vase',
87: 'scissors',
88: 'teddy bear',
89: 'hair drier',
90: 'toothbrush'
}
if not with_background:
clsid2catid = {k - 1: v for k, v in clsid2catid.items()}
return clsid2catid, catid2name
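
# --- Editor's sketch (not part of the original file): with background,
# network class 1 maps to COCO category 1 ('person'); without it, every
# class id shifts down by one while category ids stay fixed.
def _demo_category_info():
    clsid2catid, catid2name = coco17_category_info(with_background=True)
    assert clsid2catid[1] == 1 and catid2name[1] == 'person'
    clsid2catid_nb, _ = coco17_category_info(with_background=False)
    assert clsid2catid_nb[0] == 1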
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
def colormap(rgb=False):
"""
Get colormap
"""
color_list = np.array([
0.000, 0.447, 0.741, 0.850, 0.325, 0.098, 0.929, 0.694, 0.125, 0.494,
0.184, 0.556, 0.466, 0.674, 0.188, 0.301, 0.745, 0.933, 0.635, 0.078,
0.184, 0.300, 0.300, 0.300, 0.600, 0.600, 0.600, 1.000, 0.000, 0.000,
1.000, 0.500, 0.000, 0.749, 0.749, 0.000, 0.000, 1.000, 0.000, 0.000,
0.000, 1.000, 0.667, 0.000, 1.000, 0.333, 0.333, 0.000, 0.333, 0.667,
0.000, 0.333, 1.000, 0.000, 0.667, 0.333, 0.000, 0.667, 0.667, 0.000,
0.667, 1.000, 0.000, 1.000, 0.333, 0.000, 1.000, 0.667, 0.000, 1.000,
1.000, 0.000, 0.000, 0.333, 0.500, 0.000, 0.667, 0.500, 0.000, 1.000,
0.500, 0.333, 0.000, 0.500, 0.333, 0.333, 0.500, 0.333, 0.667, 0.500,
0.333, 1.000, 0.500, 0.667, 0.000, 0.500, 0.667, 0.333, 0.500, 0.667,
0.667, 0.500, 0.667, 1.000, 0.500, 1.000, 0.000, 0.500, 1.000, 0.333,
0.500, 1.000, 0.667, 0.500, 1.000, 1.000, 0.500, 0.000, 0.333, 1.000,
0.000, 0.667, 1.000, 0.000, 1.000, 1.000, 0.333, 0.000, 1.000, 0.333,
0.333, 1.000, 0.333, 0.667, 1.000, 0.333, 1.000, 1.000, 0.667, 0.000,
1.000, 0.667, 0.333, 1.000, 0.667, 0.667, 1.000, 0.667, 1.000, 1.000,
1.000, 0.000, 1.000, 1.000, 0.333, 1.000, 1.000, 0.667, 1.000, 0.167,
0.000, 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000,
0.000, 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000,
0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000,
0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, 0.000,
0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, 0.833,
0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.143, 0.143, 0.143, 0.286,
0.286, 0.286, 0.429, 0.429, 0.429, 0.571, 0.571, 0.571, 0.714, 0.714,
0.714, 0.857, 0.857, 0.857, 1.000, 1.000, 1.000
]).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import shutil
import requests
import tqdm
import hashlib
import tarfile
import zipfile
from .voc_utils import merge_and_create_list
import logging
logger = logging.getLogger(__name__)
__all__ = ['get_weights_path', 'get_dataset_path']
WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights")
DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset")
# dict of {dataset_name: (download_info, sub_dirs)}
# download info: (url, md5sum)
DATASETS = {
'coco': ([
('http://images.cocodataset.org/zips/train2017.zip',
'cced6f7f71b7629ddf16f17bbcfab6b2', ),
('http://images.cocodataset.org/zips/val2017.zip',
'442b8da7639aecaf257c1dceb8ba8c80', ),
('http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
'f4bbac642086de4f52a3fdda2de5fa2c', ),
], ["annotations", "train2017", "val2017"]),
'voc': ([
('http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar',
'6cd6e144f989b92b3379bac3b3de84fd', ),
('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',
'c52e279531787c972589f7e41ab4ae64', ),
('http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',
'b6e924de25625d8de591ea690078ad9f', ),
], ["VOCdevkit/VOC_all"]),
}
DOWNLOAD_RETRY_LIMIT = 3
def get_weights_path(url):
"""Get weights path from WEIGHT_HOME, if not exists,
download it from url.
"""
return get_path(url, WEIGHTS_HOME)
def get_dataset_path(path):
"""
If path exists, return path.
Otherwise, get dataset path from DATASET_HOME, if not exists,
download it.
"""
if _dataset_exists(path):
logger.debug("Dataset path: {}".format(osp.realpath(path)))
return path
logger.info("Dataset {} not exitst, try searching {} or "
"downloading dataset...".format(
osp.realpath(path), DATASET_HOME))
for name, dataset in DATASETS.items():
if path.lower().find(name) >= 0:
logger.info("Parse dataset_dir {} as dataset "
"{}".format(path, name))
data_dir = osp.join(DATASET_HOME, name)
# For voc, only check merged dir VOC_all
if name == 'voc':
check_dir = osp.join(data_dir, dataset[1][0])
if osp.exists(check_dir):
logger.info("Found {}".format(check_dir))
return data_dir
for url, md5sum in dataset[0]:
get_path(url, data_dir, md5sum)
            # for voc, merge directories and create file lists after download
            if name == 'voc':
                logger.info("VOC dataset downloaded successfully, merging "
                            "VOC2007 and VOC2012 into VOC_all...")
output_dir = osp.join(data_dir, dataset[1][0])
devkit_dir = "/".join(output_dir.split('/')[:-1])
years = ['2007', '2012']
                # merge into output_tmp_dir first, then move to
                # output_dir once the merge succeeds.
output_tmp_dir = osp.join(data_dir, 'tmp')
if osp.isdir(output_tmp_dir):
shutil.rmtree(output_tmp_dir)
                # NOTE(dengkaipeng): since the VOC dataset is auto-downloaded,
                # the default VOC label list should be used; do not generate
                # label_list.txt here. For the default labels, see
                # ../data/source/voc_loader.py
merge_and_create_list(devkit_dir, years,
output_tmp_dir)
shutil.move(output_tmp_dir, output_dir)
# remove source directory VOC2007 and VOC2012
shutil.rmtree(osp.join(devkit_dir, "VOC2007"))
shutil.rmtree(osp.join(devkit_dir, "VOC2012"))
return data_dir
# not match any dataset in DATASETS
raise ValueError("{} not exists and unknow dataset type".format(path))
def get_path(url, root_dir, md5sum=None):
""" Download from given url to root_dir.
if file or directory specified by url is exists under
root_dir, return the path directly, otherwise download
from url and decompress it, return the path.
url (str): download url
root_dir (str): root dir for downloading, it should be
WEIGHTS_HOME or DATASET_HOME
md5sum (str): md5 sum of download package
"""
# parse path after download to decompress under root_dir
fname = url.split('/')[-1]
zip_formats = ['.zip', '.tar', '.gz']
fpath = fname
for zip_format in zip_formats:
fpath = fpath.replace(zip_format, '')
fullpath = osp.join(root_dir, fpath)
    # For some archives, the decompressed directory name differs
    # from the archive file name; rename using the following map
decompress_name_map = {
"VOC": "VOCdevkit/VOC_all",
"annotations_trainval": "annotations"
}
for k, v in decompress_name_map.items():
if fullpath.find(k) >= 0:
fullpath = '/'.join(fullpath.split('/')[:-1] + [v])
if osp.exists(fullpath):
logger.info("Found {}".format(fullpath))
else:
fullname = _download(url, root_dir, md5sum)
_decompress(fullname)
return fullpath
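
# --- Editor's sketch (not part of the original file): traces how get_path
# turns a download url into its on-disk directory, using the same suffix
# stripping and decompress_name_map renaming as above. The root_dir is
# illustrative.
def _demo_get_path_name():
    url = ('http://images.cocodataset.org/annotations/'
           'annotations_trainval2017.zip')
    fpath = url.split('/')[-1]
    for zip_format in ['.zip', '.tar', '.gz']:
        fpath = fpath.replace(zip_format, '')
    fullpath = osp.join('/tmp/dataset/coco', fpath)
    # "annotations_trainval" matches decompress_name_map, so the final
    # directory becomes .../annotations
    fullpath = '/'.join(fullpath.split('/')[:-1] + ['annotations'])
    assert fullpath == '/tmp/dataset/coco/annotations'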
def _dataset_exists(path):
"""
    Check whether the user-defined dataset exists
"""
if not osp.exists(path):
return False
for name, dataset in DATASETS.items():
if path.lower().find(name) >= 0:
for sub_dir in dataset[1]:
if not osp.exists(osp.join(path, sub_dir)):
return False
return True
return True
def _download(url, path, md5sum=None):
"""
Download from url, save to path.
url (str): download url
path (str): download to given path
"""
if not osp.exists(path):
os.makedirs(path)
fname = url.split('/')[-1]
fullname = osp.join(path, fname)
retry_cnt = 0
while not (osp.exists(fullname) and _md5check(fullname, md5sum)):
if retry_cnt < DOWNLOAD_RETRY_LIMIT:
retry_cnt += 1
else:
raise RuntimeError("Download from {} failed. "
"Retry limit reached".format(url))
logger.info("Downloading {} from {}".format(fname, url))
req = requests.get(url, stream=True)
if req.status_code != 200:
raise RuntimeError("Downloading from {} failed with code "
"{}!".format(url, req.status_code))
total_size = req.headers.get('content-length')
with open(fullname, 'wb') as f:
if total_size:
for chunk in tqdm.tqdm(
req.iter_content(chunk_size=1024),
total=(int(total_size) + 1023) // 1024,
unit='KB'):
f.write(chunk)
else:
for chunk in req.iter_content(chunk_size=1024):
if chunk:
f.write(chunk)
return fullname
def _md5check(fullname, md5sum=None):
if md5sum is None:
return True
logger.info("File {} md5 checking...".format(fullname))
md5 = hashlib.md5()
with open(fullname, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b""):
md5.update(chunk)
calc_md5sum = md5.hexdigest()
if calc_md5sum != md5sum:
logger.info("File {} md5 check failed, {}(calc) != "
"{}(base)".format(fullname, calc_md5sum, md5sum))
return False
return True
def _decompress(fname):
"""
    Decompress zip and tar files
"""
logger.info("Decompressing {}...".format(fname))
    # To guard against interrupted decompression, extract into the
    # fpath_tmp directory first; once decompression succeeds, move
    # the extracted files to fpath, delete fpath_tmp, and remove the
    # downloaded archive.
fpath = '/'.join(fname.split('/')[:-1])
fpath_tmp = osp.join(fpath, 'tmp')
if osp.isdir(fpath_tmp):
shutil.rmtree(fpath_tmp)
os.makedirs(fpath_tmp)
if fname.find('tar') >= 0:
with tarfile.open(fname) as tf:
tf.extractall(path=fpath_tmp)
elif fname.find('zip') >= 0:
with zipfile.ZipFile(fname) as zf:
zf.extractall(path=fpath_tmp)
else:
raise TypeError("Unsupport compress file type {}".format(fname))
for f in os.listdir(fpath_tmp):
src_dir = osp.join(fpath_tmp, f)
dst_dir = osp.join(fpath, f)
_move_and_merge_tree(src_dir, dst_dir)
shutil.rmtree(fpath_tmp)
os.remove(fname)
def _move_and_merge_tree(src, dst):
"""
    Move the src directory to dst; if dst already exists,
    merge src into dst
"""
if not osp.exists(dst):
shutil.move(src, dst)
else:
for fp in os.listdir(src):
src_fp = osp.join(src, fp)
dst_fp = osp.join(dst, fp)
if osp.isdir(src_fp):
if osp.isdir(dst_fp):
_move_and_merge_tree(src_fp, dst_fp)
else:
shutil.move(src_fp, dst_fp)
elif osp.isfile(src_fp) and \
not osp.isfile(dst_fp):
shutil.move(src_fp, dst_fp)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import numpy as np
import paddle.fluid as fluid
__all__ = ['parse_fetches', 'eval_run', 'eval_results']
logger = logging.getLogger(__name__)
def parse_fetches(fetches, prog=None, extra_keys=None):
"""
    Parse fetch variable info from model fetches:
    values for fetch_list and keys for stats
"""
keys, values = [], []
cls = []
for k, v in fetches.items():
if hasattr(v, 'name'):
keys.append(k)
v.persistable = True
values.append(v.name)
else:
cls.append(v)
if prog is not None and extra_keys is not None:
for k in extra_keys:
try:
v = fluid.framework._get_var(k, prog)
v.persistable = True
keys.append(k)
values.append(v.name)
except Exception:
pass
return keys, values, cls
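
# --- Editor's sketch (not part of the original file): parse_fetches splits
# a fetch dict into named variables (anything with a .name attribute, made
# persistable for exe.run) and plain metric objects. A dummy stand-in for a
# fluid Variable suffices to show the split.
def _demo_parse_fetches():
    class FakeVar(object):
        def __init__(self, name):
            self.name = name
            self.persistable = False

    keys, values, cls = parse_fetches({'loss': FakeVar('loss_0'),
                                       'map_metric': object()})
    assert keys == ['loss'] and values == ['loss_0'] and len(cls) == 1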
def eval_run(exe, compile_program, pyreader, keys, values, cls):
"""
Run evaluation program, return program outputs.
"""
iter_id = 0
results = []
if len(cls) != 0:
values = []
for i in range(len(cls)):
_, accum_map = cls[i].get_map_var()
cls[i].reset(exe)
values.append(accum_map)
try:
pyreader.start()
while True:
outs = exe.run(compile_program,
fetch_list=values,
return_numpy=False)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(keys, outs)
}
results.append(res)
if iter_id % 100 == 0:
logger.info('Test iter {}'.format(iter_id))
iter_id += 1
except (StopIteration, fluid.core.EOFException):
pyreader.reset()
        logger.info('Test finished, iter {}'.format(iter_id))
return results
def eval_results(results, feed, metric, resolution=None, output_file=None):
"""Evaluation for evaluation program results"""
if metric == 'COCO':
from ppdet.utils.coco_eval import bbox_eval, mask_eval
anno_file = getattr(feed.dataset, 'annotation', None)
with_background = getattr(feed, 'with_background', True)
output = 'bbox.json'
if output_file:
output = '{}_bbox.json'.format(output_file)
bbox_eval(results, anno_file, output, with_background)
if 'mask' in results[0]:
output = 'mask.json'
if output_file:
output = '{}_mask.json'.format(output_file)
mask_eval(results, anno_file, output, resolution)
else:
res = np.mean(results[-1]['accum_map'][0])
logger.info('Test mAP: {}'.format(res))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import numpy as np
import datetime
__all__ = ['TrainingStats', 'Time']
class SmoothedValue(object):
"""Track a series of values and provide access to smoothed values over a
window or the global series average.
"""
def __init__(self, window_size):
self.deque = collections.deque(maxlen=window_size)
def add_value(self, value):
self.deque.append(value)
def get_median_value(self):
return np.median(self.deque)
def Time():
return datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')
class TrainingStats(object):
def __init__(self, window_size, stats_keys):
self.smoothed_losses_and_metrics = {
key: SmoothedValue(window_size)
for key in stats_keys
}
def update(self, stats):
for k, v in self.smoothed_losses_and_metrics.items():
v.add_value(stats[k])
def get(self, extras=None):
stats = collections.OrderedDict()
if extras:
for k, v in extras.items():
stats[k] = v
for k, v in self.smoothed_losses_and_metrics.items():
stats[k] = round(v.get_median_value(), 6)
return stats
def log(self, extras=None):
d = self.get(extras)
strs = ', '.join(
str(dict({
x.encode('utf-8'): y
})).strip('{}') for x, y in d.items())
return strs
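
# --- Editor's sketch (not part of the original file): TrainingStats keeps a
# sliding window per key and reports the window median, smoothing noisy
# per-iteration losses in the log.
def _demo_training_stats():
    stats = TrainingStats(window_size=3, stats_keys=['loss'])
    for v in [1.0, 9.0, 2.0, 3.0]:
        stats.update({'loss': v})
    # the window now holds [9.0, 2.0, 3.0], whose median is 3.0
    assert stats.get()['loss'] == 3.0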
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import pycocotools.mask as mask_util
from PIL import Image, ImageDraw
from .colormap import colormap
__all__ = ['visualize_results']
def visualize_results(image,
im_id,
catid2name,
threshold=0.5,
bbox_results=None,
mask_results=None,
is_bbox_normalized=False):
"""
Visualize bbox and mask results
"""
if mask_results:
image = draw_mask(image, im_id, mask_results, threshold)
if bbox_results:
image = draw_bbox(image, im_id, catid2name, bbox_results,
threshold, is_bbox_normalized)
return image
def draw_mask(image, im_id, segms, threshold, alpha=0.7):
"""
Draw mask on image
"""
mask_color_id = 0
w_ratio = .4
color_list = colormap(rgb=True)
img_array = np.array(image).astype('float32')
for dt in np.array(segms):
if im_id != dt['image_id']:
continue
segm, score = dt['segmentation'], dt['score']
if score < threshold:
continue
mask = mask_util.decode(segm) * 255
color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1
for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
idx = np.nonzero(mask)
img_array[idx[0], idx[1], :] *= 1.0 - alpha
img_array[idx[0], idx[1], :] += alpha * color_mask
return Image.fromarray(img_array.astype('uint8'))
def draw_bbox(image, im_id, catid2name, bboxes, threshold,
is_bbox_normalized=False):
"""
Draw bbox on image
"""
draw = ImageDraw.Draw(image)
catid2color = {}
color_list = colormap(rgb=True)[:40]
for dt in np.array(bboxes):
if im_id != dt['image_id']:
continue
catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
if score < threshold:
continue
xmin, ymin, w, h = bbox
if is_bbox_normalized:
im_width, im_height = image.size
xmin *= im_width
ymin *= im_height
w *= im_width
h *= im_height
xmax = xmin + w
ymax = ymin + h
if catid not in catid2color:
idx = np.random.randint(len(color_list))
catid2color[catid] = color_list[idx]
color = tuple(catid2color[catid])
# draw bbox
draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=2,
fill=color)
# draw label
text = "{} {:.2f}".format(catid2name[catid], score)
tw, th = draw.textsize(text)
draw.rectangle([(xmin + 1, ymin - th),
(xmin + tw + 1, ymin)],
fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
return image
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import numpy as np
from ..data.source.voc_loader import pascalvoc_label
from .coco_eval import bbox2out
import logging
logger = logging.getLogger(__name__)
__all__ = [
'bbox2out', 'get_category_info'
]
def get_category_info(anno_file=None,
with_background=True,
use_default_label=False):
if use_default_label or anno_file is None \
or not os.path.exists(anno_file):
logger.info("Not found annotation file {}, load "
"voc2012 categories.".format(anno_file))
return vocall_category_info(with_background)
else:
logger.info("Load categories from {}".format(anno_file))
return get_category_info_from_anno(anno_file, with_background)
def get_category_info_from_anno(anno_file, with_background=True):
"""
Get class id to category id map and category id
to category name map from annotation file.
Args:
anno_file (str): annotation file path
with_background (bool, default True):
whether load background as class 0.
"""
cats = []
with open(anno_file) as f:
for line in f.readlines():
cats.append(line.strip())
if cats[0] != 'background' and with_background:
cats.insert(0, 'background')
if cats[0] == 'background' and not with_background:
cats = cats[1:]
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
def vocall_category_info(with_background=True):
"""
Get class id to category id map and category id
    to category name map of the merged VOC dataset
Args:
with_background (bool, default True):
whether load background as class 0.
"""
label_map = pascalvoc_label(with_background)
label_map = sorted(label_map.items(), key=lambda x: x[1])
cats = [l[0] for l in label_map]
if with_background:
cats.insert(0, 'background')
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import re
import random
import shutil
__all__ = ['merge_and_create_list']
def merge_and_create_list(devkit_dir, years, output_dir):
"""
Merge VOC2007 and VOC2012 to output_dir and create following list:
1. train.txt
2. val.txt
3. test.txt
"""
os.makedirs(osp.join(output_dir, 'Annotations/'))
os.makedirs(osp.join(output_dir, 'ImageSets/Main/'))
os.makedirs(osp.join(output_dir, 'JPEGImages/'))
trainval_list = []
test_list = []
for year in years:
trainval, test = _walk_voc_dir(devkit_dir, year, output_dir)
trainval_list.extend(trainval)
test_list.extend(test)
main_dir = osp.join(output_dir, 'ImageSets/Main/')
random.shuffle(trainval_list)
with open(osp.join(main_dir, 'train.txt'), 'w') as ftrainval:
for item in trainval_list:
ftrainval.write(item + '\n')
with open(osp.join(main_dir, 'val.txt'), 'w') as fval:
with open(osp.join(main_dir, 'test.txt'), 'w') as ftest:
ct = 0
for item in test_list:
ct += 1
fval.write(item + '\n')
if ct <= 1000:
ftest.write(item + '\n')
def _get_voc_dir(devkit_dir, year, type):
return osp.join(devkit_dir, 'VOC' + year, type)
def _walk_voc_dir(devkit_dir, year, output_dir):
filelist_dir = _get_voc_dir(devkit_dir, year, 'ImageSets/Main')
annotation_dir = _get_voc_dir(devkit_dir, year, 'Annotations')
img_dir = _get_voc_dir(devkit_dir, year, 'JPEGImages')
trainval_list = []
test_list = []
added = set()
for _, _, files in os.walk(filelist_dir):
for fname in files:
img_ann_list = []
if re.match('[a-z]+_trainval\.txt', fname):
img_ann_list = trainval_list
elif re.match('[a-z]+_test\.txt', fname):
img_ann_list = test_list
else:
continue
fpath = osp.join(filelist_dir, fname)
for line in open(fpath):
name_prefix = line.strip().split()[0]
if name_prefix in added:
continue
added.add(name_prefix)
ann_path = osp.join(annotation_dir, name_prefix + '.xml')
img_path = osp.join(img_dir, name_prefix + '.jpg')
new_ann_path = osp.join(output_dir, 'Annotations/',
name_prefix + '.xml')
new_img_path = osp.join(output_dir, 'JPEGImages/',
name_prefix + '.jpg')
shutil.copy(ann_path, new_ann_path)
shutil.copy(img_path, new_img_path)
img_ann_list.append(name_prefix)
return trainval_list, test_list
tqdm
docstring_parser @ http://github.com/willthefrog/docstring_parser/tarball/master
typeguard ; python_version >= '3.4'
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import re
import sys
from argparse import ArgumentParser, RawDescriptionHelpFormatter
import yaml
from ppdet.core.workspace import get_registered_modules, load_config
from ppdet.utils.cli import ColorTTY
color_tty = ColorTTY()
MISC_CONFIG = {
"architecture": "<value>",
"max_iters": "<value>",
"train_feed": "<value>",
"eval_feed": "<value>",
"test_feed": "<value>",
"pretrain_weights": "<value>",
"save_dir": "<value>",
"weights": "<value>",
"metric": "<value>",
"log_smooth_window": 20,
"snapshot_iter": 10000,
"use_gpu": True,
}
def dump_value(value):
# XXX this is hackish, but collections.abc is not available in python 2
if hasattr(value, '__dict__') or isinstance(value, (dict, tuple, list)):
value = yaml.dump(value, default_flow_style=True)
value = value.replace('\n', '')
value = value.replace('...', '')
return "'{}'".format(value)
else:
# primitive types
return str(value)
def dump_config(module, minimal=False):
args = module.schema.values()
if minimal:
args = [arg for arg in args if not arg.has_default()]
return yaml.dump(
{
module.name: {
arg.name: arg.default if arg.has_default() else "<value>"
for arg in args
}
},
default_flow_style=False,
default_style='')
def list_modules(**kwargs):
target_category = kwargs['category']
module_schema = get_registered_modules()
module_by_category = {}
for schema in module_schema.values():
category = schema.category
if target_category is not None and schema.category != target_category:
continue
if category not in module_by_category:
module_by_category[category] = [schema]
else:
module_by_category[category].append(schema)
for cat, modules in module_by_category.items():
print("Available modules in the category '{}':".format(cat))
print("")
max_len = max([len(mod.name) for mod in modules])
for mod in modules:
print(color_tty.green(mod.name.ljust(max_len)),
mod.doc.split('\n')[0])
print("")
def help_module(**kwargs):
schema = get_registered_modules()[kwargs['module']]
doc = schema.doc is None and "Not documented" or "{}".format(schema.doc)
func_args = {arg.name: arg.doc for arg in schema.schema.values()}
max_len = max([len(k) for k in func_args.keys()])
opts = "\n".join([
"{} {}".format(color_tty.green(k.ljust(max_len)), v)
for k, v in func_args.items()
])
template = dump_config(schema)
print("{}\n\n{}\n\n{}\n\n{}\n\n{}\n\n{}\n{}\n".format(
color_tty.bold(color_tty.blue("MODULE DESCRIPTION:")),
doc,
color_tty.bold(color_tty.blue("MODULE OPTIONS:")),
opts,
color_tty.bold(color_tty.blue("CONFIGURATION TEMPLATE:")),
template,
color_tty.bold(color_tty.blue("COMMAND LINE OPTIONS:")), ))
for arg in schema.schema.values():
print("--opt {}.{}={}".format(schema.name, arg.name,
dump_value(arg.default)
if arg.has_default() else "<value>"))
def generate_config(**kwargs):
minimal = kwargs['minimal']
modules = kwargs['modules']
module_schema = get_registered_modules()
visited = []
schema = []
def walk(m):
if m in visited:
return
s = module_schema[m]
schema.append(s)
visited.append(m)
for mod in modules:
walk(mod)
    # XXX try to be smart about when to add the header: if any
    # "architecture" module is included, the header is added as well
if any([getattr(m, 'category', None) == 'architecture' for m in schema]):
# XXX for ordered printing
header = ""
for k, v in MISC_CONFIG.items():
header += yaml.dump(
{
k: v
}, default_flow_style=False, default_style='')
print(header)
for s in schema:
print(dump_config(s, minimal))
# FIXME this is pretty hackish, maybe implement a custom YAML printer?
def analyze_config(**kwargs):
config = load_config(kwargs['file'])
modules = get_registered_modules()
green = '___{}___'.format(color_tty.colors.index('green') + 31)
styled = {}
for key in config.keys():
if not config[key]: # empty schema
continue
if key not in modules and not hasattr(config[key], '__dict__'):
styled[key] = config[key]
continue
elif key in modules:
module = modules[key]
else:
type_name = type(config[key]).__name__
if type_name in modules:
module = modules[type_name].copy()
module.update({
k: v
for k, v in config[key].__dict__.items()
if k in module.schema
})
key += " ({})".format(type_name)
default = module.find_default_keys()
missing = module.find_missing_keys()
mismatch = module.find_mismatch_keys()
extra = module.find_extra_keys()
dep_missing = []
for dep in module.inject:
if isinstance(module[dep], str) and module[dep] != '<value>':
if module[dep] not in modules: # not a valid module
dep_missing.append(dep)
else:
dep_mod = modules[module[dep]]
# empty dict but mandatory
if not dep_mod and dep_mod.mandatory():
dep_missing.append(dep)
override = list(
set(module.keys()) - set(default) - set(extra) - set(dep_missing))
replacement = {}
for name in set(override + default + extra + mismatch + missing):
new_name = name
if name in missing:
value = "<missing>"
else:
value = module[name]
if name in extra:
value = dump_value(value) + " <extraneous>"
elif name in mismatch:
value = dump_value(value) + " <type mismatch>"
elif name in dep_missing:
value = dump_value(value) + " <module config missing>"
elif name in override and value != '<missing>':
mark = green
new_name = mark + name
replacement[new_name] = value
styled[key] = replacement
buffer = yaml.dump(styled, default_flow_style=False, default_style='')
buffer = (re.sub(r"<missing>", r"<missing>", buffer))
buffer = (re.sub(r"<extraneous>", r"<extraneous>", buffer))
buffer = (re.sub(r"<type mismatch>", r"<type mismatch>", buffer))
buffer = (re.sub(r"<module config missing>",
r"<module config missing>", buffer))
buffer = re.sub(r"___(\d+)___(.*?):", r"[\1m\2:", buffer)
print(buffer)
if __name__ == '__main__':
argv = sys.argv[1:]
parser = ArgumentParser(formatter_class=RawDescriptionHelpFormatter)
subparsers = parser.add_subparsers(help='Supported Commands')
list_parser = subparsers.add_parser("list", help="list available modules")
help_parser = subparsers.add_parser(
"help", help="show detail options for module")
generate_parser = subparsers.add_parser(
"generate", help="generate configuration template")
analyze_parser = subparsers.add_parser(
"analyze", help="analyze configuration file")
list_parser.set_defaults(func=list_modules)
help_parser.set_defaults(func=help_module)
generate_parser.set_defaults(func=generate_config)
analyze_parser.set_defaults(func=analyze_config)
list_group = list_parser.add_mutually_exclusive_group()
list_group.add_argument(
"-c",
"--category",
type=str,
default=None,
help="list modules for <category>")
help_parser.add_argument(
"module",
help="module to show info for",
choices=list(get_registered_modules().keys()))
generate_parser.add_argument(
"modules",
nargs='+',
help="include these module in generated configuration template",
choices=list(get_registered_modules().keys()))
generate_group = generate_parser.add_mutually_exclusive_group()
generate_group.add_argument(
"--minimal", action='store_true', help="only include required options")
generate_group.add_argument(
"--full",
action='store_false',
dest='minimal',
help="include all options")
analyze_parser.add_argument("file", help="configuration file to analyze")
if len(sys.argv) < 2:
parser.print_help()
sys.exit(1)
args = parser.parse_args(argv)
if hasattr(args, 'func'):
args.func(**vars(args))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import multiprocessing
import paddle.fluid as fluid
from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
import ppdet.utils.checkpoint as checkpoint
from ppdet.utils.cli import ArgsParser
from ppdet.modeling.model_input import create_feed
from ppdet.data.data_feed import create_reader
from ppdet.core.workspace import load_config, merge_config, create
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def main():
"""
Main evaluate function
"""
cfg = load_config(FLAGS.config)
if 'architecture' in cfg:
main_arch = cfg.architecture
else:
raise ValueError("'architecture' not specified in config file.")
merge_config(FLAGS.opt)
if cfg.use_gpu:
devices_num = fluid.core.get_cuda_device_count()
else:
devices_num = int(
os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
if 'eval_feed' not in cfg:
eval_feed = create(main_arch + 'EvalFeed')
else:
eval_feed = create(cfg.eval_feed)
# define executor
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
# build program
model = create(main_arch)
startup_prog = fluid.Program()
eval_prog = fluid.Program()
with fluid.program_guard(eval_prog, startup_prog):
with fluid.unique_name.guard():
pyreader, feed_vars = create_feed(eval_feed)
fetches = model.eval(feed_vars)
eval_prog = eval_prog.clone(True)
reader = create_reader(eval_feed)
pyreader.decorate_sample_list_generator(reader, place)
# compile program for multi-devices
if devices_num <= 1:
compile_program = fluid.compiler.CompiledProgram(eval_prog)
else:
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = False
build_strategy.enable_inplace = False
compile_program = fluid.compiler.CompiledProgram(
eval_prog).with_data_parallel(build_strategy=build_strategy)
# load model
exe.run(startup_prog)
if 'weights' in cfg:
checkpoint.load_pretrain(exe, eval_prog, cfg.weights)
extra_keys = []
if 'metric' in cfg and cfg.metric == 'COCO':
extra_keys = ['im_info', 'im_id', 'im_shape']
keys, values, cls = parse_fetches(fetches, eval_prog, extra_keys)
results = eval_run(exe, compile_program, pyreader, keys, values, cls)
# evaluation
resolution = None
if 'mask' in results[0]:
resolution = model.mask_head.resolution
eval_results(results, eval_feed, cfg.metric, resolution, FLAGS.output_file)
if __name__ == '__main__':
parser = ArgsParser()
parser.add_argument(
"-f",
"--output_file",
default=None,
type=str,
help="Evaluation file name, default to bbox.json and mask.json.")
FLAGS = parser.parse_args()
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import glob
import numpy as np
from PIL import Image
from paddle import fluid
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.modeling.model_input import create_feed
from ppdet.data.data_feed import create_reader
from ppdet.utils.eval_utils import parse_fetches
from ppdet.utils.cli import ArgsParser
from ppdet.utils.visualizer import visualize_results
import ppdet.utils.checkpoint as checkpoint
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def get_save_image_name(output_dir, image_path):
"""
Get save image name from source image path.
"""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
image_name = image_path.split('/')[-1]
name, ext = os.path.splitext(image_name)
return os.path.join(output_dir, "{}".format(name)) + ext
def get_test_images(infer_dir, infer_img):
"""
Get image path list in TEST mode
"""
assert infer_img is not None or infer_dir is not None, \
"--infer_img or --infer_dir should be set"
assert infer_img is None or os.path.isfile(infer_img), \
"{} is not a file".format(infer_img)
assert infer_dir is None or os.path.isdir(infer_dir), \
"{} is not a directory".format(infer_dir)
images = []
# infer_img has a higher priority
if infer_img and os.path.isfile(infer_img):
images.append(infer_img)
return images
infer_dir = os.path.abspath(infer_dir)
assert os.path.isdir(infer_dir), \
"infer_dir {} is not a directory".format(infer_dir)
exts = ['jpg', 'jpeg', 'png', 'bmp']
exts += [ext.upper() for ext in exts]
for ext in exts:
images.extend(glob.glob('{}/*.{}'.format(infer_dir, ext)))
assert len(images) > 0, "no image found in {}".format(infer_dir)
logger.info("Found {} inference images in total.".format(len(images)))
return images
def main():
cfg = load_config(FLAGS.config)
if 'architecture' in cfg:
main_arch = cfg.architecture
else:
raise ValueError("'architecture' not specified in config file.")
merge_config(FLAGS.opt)
if 'test_feed' not in cfg:
test_feed = create(main_arch + 'TestFeed')
else:
test_feed = create(cfg.test_feed)
test_images = get_test_images(FLAGS.infer_dir, FLAGS.infer_img)
test_feed.dataset.add_images(test_images)
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
model = create(main_arch)
startup_prog = fluid.Program()
infer_prog = fluid.Program()
with fluid.program_guard(infer_prog, startup_prog):
with fluid.unique_name.guard():
_, feed_vars = create_feed(test_feed, use_pyreader=False)
test_fetches = model.test(feed_vars)
infer_prog = infer_prog.clone(True)
reader = create_reader(test_feed)
feeder = fluid.DataFeeder(place=place, feed_list=feed_vars.values())
exe.run(startup_prog)
if cfg.weights:
checkpoint.load_checkpoint(exe, infer_prog, cfg.weights)
# parse infer fetches
extra_keys = []
if cfg['metric'] == 'COCO':
extra_keys = ['im_info', 'im_id', 'im_shape']
if cfg['metric'] == 'VOC':
extra_keys = ['im_id']
keys, values, _ = parse_fetches(test_fetches, infer_prog, extra_keys)
# parse dataset category
if cfg.metric == 'COCO':
from ppdet.utils.coco_eval import bbox2out, mask2out, get_category_info
if cfg.metric == "VOC":
from ppdet.utils.voc_eval import bbox2out, get_category_info
anno_file = getattr(test_feed.dataset, 'annotation', None)
with_background = getattr(test_feed, 'with_background', True)
use_default_label = getattr(test_feed, 'use_default_label', False)
clsid2catid, catid2name = get_category_info(anno_file, with_background,
use_default_label)
imid2path = reader.imid2path
for iter_id, data in enumerate(reader()):
outs = exe.run(infer_prog,
feed=feeder.feed(data),
fetch_list=values,
return_numpy=False)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(keys, outs)
}
logger.info('Infer iter {}'.format(iter_id))
bbox_results = None
mask_results = None
is_bbox_normalized = True if cfg.metric == 'VOC' else False
if 'bbox' in res:
bbox_results = bbox2out([res], clsid2catid, is_bbox_normalized)
if 'mask' in res:
mask_results = mask2out([res], clsid2catid,
model.mask_head.resolution)
# visualize result
im_ids = res['im_id'][0]
for im_id in im_ids:
image_path = imid2path[int(im_id)]
image = Image.open(image_path).convert('RGB')
image = visualize_results(image,
int(im_id), catid2name, 0.5, bbox_results,
mask_results, is_bbox_normalized)
save_name = get_save_image_name(FLAGS.output_dir, image_path)
logger.info("Detection bbox results save in {}".format(save_name))
image.save(save_name)
if __name__ == '__main__':
parser = ArgsParser()
parser.add_argument(
"--infer_dir",
type=str,
default=None,
help="Directory for images to perform inference on.")
parser.add_argument(
"--infer_img",
type=str,
default=None,
help="Image path, has higher priority over --infer_dir")
parser.add_argument(
"--output_dir",
type=str,
default="output",
help="Directory for storing the output visualization files.")
FLAGS = parser.parse_args()
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import multiprocessing
import numpy as np
def set_paddle_flags(**kwargs):
for key, value in kwargs.items():
if os.environ.get(key, None) is None:
os.environ[key] = str(value)
# NOTE(paddle-dev): All of these flags should be
# set before `import paddle`. Otherwise, it would
# not take any effect.
set_paddle_flags(
FLAGS_eager_delete_tensor_gb=0, # enable GC to save memory
)
from paddle import fluid
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.data.data_feed import create_reader
from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
from ppdet.utils.stats import TrainingStats
from ppdet.utils.cli import ArgsParser
import ppdet.utils.checkpoint as checkpoint
from ppdet.modeling.model_input import create_feed
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def main():
cfg = load_config(FLAGS.config)
if 'architecture' in cfg:
main_arch = cfg.architecture
else:
raise ValueError("'architecture' not specified in config file.")
merge_config(FLAGS.opt)
if cfg.use_gpu:
devices_num = fluid.core.get_cuda_device_count()
else:
devices_num = int(
os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
if 'train_feed' not in cfg:
train_feed = create(main_arch + 'TrainFeed')
else:
train_feed = create(cfg.train_feed)
if FLAGS.eval:
if 'eval_feed' not in cfg:
eval_feed = create(main_arch + 'EvalFeed')
else:
eval_feed = create(cfg.eval_feed)
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
model = create(main_arch)
lr_builder = create('LearningRate')
optim_builder = create('OptimizerBuilder')
# build program
startup_prog = fluid.Program()
train_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
train_pyreader, feed_vars = create_feed(train_feed)
train_fetches = model.train(feed_vars)
loss = train_fetches['loss']
lr = lr_builder()
optimizer = optim_builder(lr)
optimizer.minimize(loss)
train_reader = create_reader(train_feed, cfg.max_iters * devices_num)
train_pyreader.decorate_sample_list_generator(train_reader, place)
# parse train fetches
train_keys, train_values, _ = parse_fetches(train_fetches)
train_values.append(lr)
if FLAGS.eval:
eval_prog = fluid.Program()
with fluid.program_guard(eval_prog, startup_prog):
with fluid.unique_name.guard():
eval_pyreader, feed_vars = create_feed(eval_feed)
fetches = model.eval(feed_vars)
eval_prog = eval_prog.clone(True)
eval_reader = create_reader(eval_feed)
eval_pyreader.decorate_sample_list_generator(eval_reader, place)
# parse train fetches
extra_keys = ['im_info', 'im_id'] if cfg.metric == 'COCO' else []
eval_keys, eval_values, eval_cls = parse_fetches(fetches, eval_prog,
extra_keys)
# compile program for multi-devices
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = False
build_strategy.enable_inplace = True
sync_bn = getattr(model.backbone, 'norm_type', None) == 'sync_bn'
build_strategy.sync_batch_norm = sync_bn
train_compile_program = fluid.compiler.CompiledProgram(
train_prog).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
if FLAGS.eval:
eval_compile_program = fluid.compiler.CompiledProgram(eval_prog)
exe.run(startup_prog)
freeze_bn = getattr(model.backbone, 'freeze_norm', False)
if FLAGS.resume_checkpoint:
checkpoint.load_checkpoint(exe, train_prog, FLAGS.resume_checkpoint)
elif cfg.pretrain_weights and freeze_bn:
checkpoint.load_and_fusebn(exe, train_prog, cfg.pretrain_weights)
elif cfg.pretrain_weights:
checkpoint.load_pretrain(exe, train_prog, cfg.pretrain_weights)
train_stats = TrainingStats(cfg.log_smooth_window, train_keys)
train_pyreader.start()
start_time = time.time()
end_time = time.time()
cfg_name = os.path.basename(FLAGS.config).split('.')[0]
save_dir = os.path.join(cfg.save_dir, cfg_name)
for it in range(cfg.max_iters):
start_time = end_time
end_time = time.time()
outs = exe.run(train_compile_program, fetch_list=train_values)
stats = {k: np.array(v).mean() for k, v in zip(train_keys, outs[:-1])}
train_stats.update(stats)
logs = train_stats.log()
strs = 'iter: {}, lr: {:.6f}, {}, time: {:.3f}'.format(
it, np.mean(outs[-1]), logs, end_time - start_time)
logger.info(strs)
if it > 0 and it % cfg.snapshot_iter == 0:
checkpoint.save(exe, train_prog, os.path.join(save_dir, str(it)))
if FLAGS.eval:
# evaluation
results = eval_run(exe, eval_compile_program, eval_pyreader,
eval_keys, eval_values, eval_cls)
resolution = None
if 'mask' in results[0]:
resolution = model.mask_head.resolution
eval_results(results, eval_feed, cfg.metric, resolution,
FLAGS.output_file)
checkpoint.save(exe, train_prog, os.path.join(save_dir, "model_final"))
train_pyreader.reset()
if __name__ == '__main__':
parser = ArgsParser()
parser.add_argument(
"-r",
"--resume_checkpoint",
default=None,
type=str,
help="Checkpoint path for resuming training.")
parser.add_argument(
"--eval",
action='store_true',
default=False,
help="Whether to perform evaluation in train")
parser.add_argument(
"-f",
"--output_file",
default=None,
type=str,
help="Evaluation file name, default to bbox.json and mask.json.")
FLAGS = parser.parse_args()
main()