Unverified commit 9d31d6e3 authored by: W wangguanzhong, committed by: GitHub

Update PaddleDetection (#3787)

* update release/1.6 from develop

* update api

* fix config for obj365 model
Parent 5e353a87
......@@ -56,3 +56,9 @@ coverage.xml
/docs/_build/
*.json
dataset/coco/annotations
dataset/coco/train2017
dataset/coco/val2017
dataset/voc/VOCdevkit
......@@ -32,97 +32,102 @@ changes.
- Performance Optimized:
With the help of the underlying PaddlePaddle framework, faster training and
reduced GPU memory footprint is achieved. Notably, YOLOv3 training is
much faster compared to other frameworks. Another example is Mask-RCNN
(ResNet50): we managed to fit up to 4 images per GPU (Tesla V100 16GB) during
multi-GPU training.
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | DarkNet | VGG |
| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: | :-----: | :--: |
| Faster R-CNN        | ✓      |                             ✓ |     ✗      |   ✓   |     ✗     |    ✗    |  ✗   |
| Faster R-CNN + FPN  | ✓      |                             ✓ |     ✓      |   ✓   |     ✗     |    ✗    |  ✗   |
| Mask R-CNN          | ✓      |                             ✓ |     ✗      |   ✓   |     ✗     |    ✗    |  ✗   |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
Advanced Features:
- [x] **Synchronized Batch Norm**: currently used by YOLOv3.
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices; it cannot be used on CPU devices or a single GPU device.
## Get Started

- [Installation guide](docs/INSTALL.md)
- [Quick start on small dataset](docs/QUICK_STARTED.md)
- [Guide to training, evaluation and arguments description](docs/GETTING_STARTED.md)
- [Guide to preprocess pipeline and custom dataset](docs/DATA.md)
- [Introduction to the configuration workflow](docs/CONFIG.md)
- [Examples for detailed configuration explanation](docs/config_example/)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Transfer learning document](docs/TRANSFER_LEARNING.md)

For a quick try at inference, run the following command; the visualized result will
be saved in `output`.

```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
python tools/infer.py -c configs/mask_rcnn_r50_1x.yml \
    -o weights=https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar \
    --infer_img=demo/000000570688.jpg
```

## Model Zoo

- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
- [Face detection models](configs/face_detection/README.md)
- [Pretrained models for pedestrian and vehicle detection](contrib/README.md)

## Model compression

- [Quantization-aware training example](slim/quantization)
- [Pruning compression example](slim/prune)

## Deploy

- [Export model for inference deployment](docs/EXPORT_MODEL.md)
- [C++ inference deployment](inference/README.md)

## Benchmark

- [Inference benchmark](docs/BENCHMARK_INFER_cn.md)

## Updates
#### 10/2019
- Face detection models included: BlazeFace, FaceBoxes.
- Enrich COCO models, box mAP up to 51.9%.
- Add CACascade RCNN, one of the best single models from the Objects365 2019 Challenge Full Track champion.
- Add pretrained models for pedestrian and vehicle detection.
- Support mixed-precision training.
- Add C++ inference deployment.
- Add model compression examples.
#### 2/9/2019
- Add pretrained models for GroupNorm.
- Add Cascade-Mask-RCNN+FPN.

#### 5/8/2019
- Add a series of models related to Modulated Deformable Convolution.
#### 7/29/2019
- Update Chinese docs for PaddleDetection
- Fix bug in R-CNN models when training and evaluating at the same time
- Add ResNeXt101-vd + Mask R-CNN + FPN models
- Add YOLOv3 on VOC models
#### 7/3/2019
- Initial release of PaddleDetection and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
## Contributing
......
......@@ -2,7 +2,7 @@
# PaddleDetection
PaddleDetection aims to provide industry and academia with a rich set of easy-to-use object detection models. It not only delivers strong performance and easy deployment, but is also flexible enough to meet the needs of algorithm research.
**All models in this repository currently require PaddlePaddle 1.6 or above (or an appropriate develop build).**
......@@ -17,15 +17,15 @@ PaddleDetection的目的是为工业界和学术界提供大量易使用的目
- Easy to deploy:
The core operators used in PaddleDetection models are implemented in C++ or CUDA, and together with PaddlePaddle's high-performance inference engine the models can be conveniently deployed on a variety of hardware platforms.
- Highly flexible:
PaddleDetection decouples its components through a modular design, so a wide range of detection models can be assembled easily via configuration files.
- High performance:
Thanks to PaddlePaddle's efficient underlying kernels, training speed and GPU memory footprint are highly competitive. For example, YOLOv3 trains faster than other frameworks, and for Mask-RCNN (ResNet50) a per-GPU batch size of 4 (even 5) fits on a Tesla V100 with 16GB memory.
Supported architectures:
......@@ -35,75 +35,89 @@ PaddleDetection的目的是为工业界和学术界提供大量易使用的目
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | x | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN   | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RetinaNet           | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3              | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer a significant accuracy gain at negligible extra cost.
Advanced features:
- [x] **Synchronized Batch Norm**: currently used by YOLOv3.
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
**NOTE:** Synchronized batch normalization can only be used with multiple GPUs; it cannot be used on CPU or with a single GPU.
## Get Started

- [Installation guide](docs/INSTALL_cn.md)
- [Quick start](docs/QUICK_STARTED_cn.md)
- [Guide to training, evaluation and arguments](docs/GETTING_STARTED_cn.md)
- [Data preprocessing and custom datasets](docs/DATA_cn.md)
- [Introduction to the configuration module](docs/CONFIG_cn.md)
- [Examples with detailed configuration and argument description](docs/config_example/)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Transfer learning tutorial](docs/TRANSFER_LEARNING_cn.md)

## Model Zoo

- [Model zoo](docs/MODEL_ZOO_cn.md)
- [Face detection models](configs/face_detection/README.md)
- [Pretrained models for pedestrian and vehicle detection](contrib/README_cn.md)

## Model compression

- [Quantization-aware training example](slim/quantization)
- [Pruning example](slim/prune)

## Inference deployment

- [Model export tutorial](docs/EXPORT_MODEL.md)
- [C++ inference deployment](inference/README.md)

## Benchmark

- [Inference benchmark](docs/BENCHMARK_INFER_cn.md)

## Updates

#### 10/2019
- Added face detection models BlazeFace and FaceBoxes.
- Enriched COCO models, with box mAP up to 51.9%.
- Added CACascade-RCNN, one of the best single models from the winning entry of the Objects365 2019 Challenge Full Track.
- Added pretrained models for pedestrian and vehicle detection.
- Added FP16 (mixed-precision) training support.
- Added a cross-platform C++ inference deployment solution.
- Added model compression examples.

#### 2/9/2019
- Added GroupNorm models.
- Added CascadeRCNN+Mask models.
#### 5/8/2019
- Added a series of models based on Modulated Deformable Convolution.
#### 7/22/2019
- Added Chinese documentation for PaddleDetection
- Fixed a bug when R-CNN models are trained and evaluated at the same time
- Added ResNeXt101-vd + Mask R-CNN + FPN models
- Added YOLOv3 models trained on the VOC dataset
#### 7/3/2019
- First release of the PaddleDetection library and detection model zoo
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
  R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
## Contributing
......
architecture: CascadeRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
snapshot_iter: 10000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
weights: output/cascade_rcnn_r50_fpn_1x/model_final
metric: COCO
num_classes: 81
CascadeRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
variant: b
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_lo: [0.0, 0.0, 0.0]
bg_thresh_hi: [0.5, 0.6, 0.7]
fg_thresh: [0.5, 0.6, 0.7]
fg_fraction: 0.25
CascadeBBoxHead:
head: CascadeTwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeTwoFCHead:
mlp_dim: 1024
MultiScaleTEST:
score_thresh: 0.05
nms_thresh: 0.5
detections_per_im: 100
enable_voting: true
vote_thresh: 0.9
LearningRate:
base_lr: 0.02
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 2
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_train2017.json
image_dir: train2017
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
sample_transforms:
- !DecodeImage
to_rgb: true
- !NormalizeImage
is_channel_first: false
is_scale: true
mean:
- 0.485
- 0.456
- 0.406
std:
- 0.229
- 0.224
- 0.225
- !MultiscaleTestResize
origin_target_size: 800
origin_max_size: 1333
target_size:
- 400
- 500
- 600
- 700
- 900
- 1000
- 1100
- 1200
max_size: 2000
use_flip: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadMSTest
pad_to_stride: 32
num_scale: 18
num_workers: 2
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/coco/annotations/instances_val2017.json
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: CascadeMaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
max_iters: 300000
snapshot_iter: 10
use_gpu: true
log_iter: 20
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_caffe_pretrained.tar
weights: output/cascade_mask_rcnn_dcn_se154_vd_fpn_gn_s1x/model_final/
metric: COCO
num_classes: 81
CascadeMaskRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
mask_assigner: MaskAssigner
mask_head: MaskHead
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: bn
freeze_norm: True
variant: d
dcn_v2_stages: [3, 4, 5]
std_senet: True
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
freeze_norm: False
norm_type: gn
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
norm_type: gn
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_hi: [0.5, 0.6, 0.7]
bg_thresh_lo: [0.0, 0.0, 0.0]
fg_fraction: 0.25
fg_thresh: [0.5, 0.6, 0.7]
MaskAssigner:
resolution: 28
CascadeBBoxHead:
head: CascadeXConvNormHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeXConvNormHead:
norm_type: gn
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 280000]
- !LinearWarmup
start_factor: 0.01
steps: 2000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !RandomFlipImage
is_mask_flip: true
is_normalized: false
prob: 0.5
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
- 672
- 704
- 736
- 768
- 800
- 832
- 864
- 896
- 928
- 960
- 992
- 1024
- 1056
- 1088
- 1120
- 1152
- 1184
- 1216
- 1248
- 1280
- 1312
- 1344
- 1376
- 1408
max_size: 1600
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 8
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017.json
image_dir: val2017
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 800
max_size: 1333
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/coco/annotations/instances_val2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
architecture: CascadeMaskRCNN
train_feed: MaskRCNNTrainFeed
eval_feed: MaskRCNNEvalFeed
test_feed: MaskRCNNTestFeed
max_iters: 300000
snapshot_iter: 10000
use_gpu: true
log_iter: 20
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_caffe_pretrained.tar
weights: output/cascade_mask_rcnn_dcn_se154_vd_fpn_gn_s1x/model_final/
metric: COCO
num_classes: 81
CascadeMaskRCNN:
backbone: SENet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
mask_assigner: MaskAssigner
mask_head: MaskHead
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: bn
freeze_norm: True
variant: d
dcn_v2_stages: [3, 4, 5]
std_senet: True
FPN:
max_level: 6
min_level: 2
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
freeze_norm: False
norm_type: gn
FPNRPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
max_level: 5
min_level: 2
box_resolution: 7
sampling_ratio: 2
mask_resolution: 14
MaskHead:
dilation: 1
conv_dim: 256
num_convs: 4
resolution: 28
norm_type: gn
CascadeBBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights: [10, 20, 30]
bg_thresh_hi: [0.5, 0.6, 0.7]
bg_thresh_lo: [0.0, 0.0, 0.0]
fg_fraction: 0.25
fg_thresh: [0.5, 0.6, 0.7]
MaskAssigner:
resolution: 28
CascadeBBoxHead:
head: CascadeXConvNormHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeXConvNormHead:
norm_type: gn
MultiScaleTEST:
score_thresh: 0.05
nms_thresh: 0.5
detections_per_im: 100
enable_voting: true
vote_thresh: 0.9
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [240000, 280000]
- !LinearWarmup
start_factor: 0.01
steps: 2000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
MaskRCNNTrainFeed:
# batch size per device
batch_size: 1
dataset:
dataset_dir: dataset/coco
image_dir: train2017
annotation: annotations/instances_train2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !RandomFlipImage
is_mask_flip: true
is_normalized: false
prob: 0.5
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
- 672
- 704
- 736
- 768
- 800
- 832
- 864
- 896
- 928
- 960
- 992
- 1024
- 1056
- 1088
- 1120
- 1152
- 1184
- 1216
- 1248
- 1280
- 1312
- 1344
- 1376
- 1408
max_size: 1600
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 8
MaskRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/coco
annotation: annotations/instances_val2017_debug_139.json
image_dir: val2017
sample_transforms:
- !DecodeImage
to_rgb: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !MultiscaleTestResize
origin_target_size: 800
origin_max_size: 1333
target_size:
- 400
- 500
- 600
- 700
- 900
- 1000
- 1100
- 1200
max_size: 2000
use_flip: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadMSTest
pad_to_stride: 32
# num_scale = (len(target_size) + 1) * (1 + use_flip)
num_scale: 32
num_workers: 2
MaskRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/coco/annotations/instances_val2017.json
sample_transforms:
- !DecodeImage
to_rgb: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 2
English | [简体中文](README_cn.md)
# FaceDetection
The goal of FaceDetection is to provide efficient and high-speed face detection solutions,
including cutting-edge and classic models.
<div align="center">
<img src="../../demo/output/12_Group_Group_12_Group_Group_12_935.jpg" />
</div>
## Data Pipline
We use the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/) to train and test the models;
a detailed description of the data is available on the official website.
- WIDER Face data source:
Loads a `wider_face`-type dataset with the following directory structure:
```
dataset/wider_face/
├── wider_face_split
│ ├── wider_face_train_bbx_gt.txt
│ ├── wider_face_val_bbx_gt.txt
├── WIDER_train
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_100.jpg
│ │ │ ├── 0_Parade_marchingband_1_381.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
├── WIDER_val
│ ├── images
│ │ ├── 0--Parade
│ │ │ ├── 0_Parade_marchingband_1_1004.jpg
│ │ │ ├── 0_Parade_marchingband_1_1045.jpg
│ │ │ │ ...
│ │ ├── 10--People_Marching
│ │ │ ...
```
- Download dataset manually:
To download the WIDER FACE dataset, run the following commands:
```
cd dataset/wider_face && ./download.sh
```
- Download dataset automatically:
If a training session is started but the dataset is not set up properly
(e.g. not found in `dataset/wider_face`), PaddleDetection will automatically
download it from the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/);
the decompressed dataset is cached in `~/.cache/paddle/dataset/` and will be
found automatically in subsequent runs.
### Data Augmentation
- **Data-anchor-sampling:** Randomly rescales the image within a range of scales,
greatly augmenting the scale variation of faces. Concretely, for a randomly selected face compute
$v=\sqrt{width \times height}$ and determine which interval of `[16,32,64,128]` the value `v` falls into.
Suppose `v=45`, i.e. `32<v<64`; then one of `[16,32,64]` is selected with uniform probability.
If `64` is chosen, the target face scale is sampled from `[64 / 2, min(v * 2, 64 * 2)]` (see the sketch after this list).
- **Other methods:** Including `RandomDistort`, `ExpandImage`, `RandomInterpImage`, `RandomFlipImage`, etc.
Please refer to [DATA.md](../../docs/DATA.md#APIs) for details.
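The sampling rule above can be made concrete with a small sketch. This is an illustrative simplification, not the operator shipped in PaddleDetection; the anchor list and interval logic follow the description above.

```python
# Hypothetical sketch of data-anchor-sampling (not the PaddleDetection implementation).
import math
import random

def data_anchor_sample_ratio(face_w, face_h, anchor_sizes=(16, 32, 64, 128)):
    """Return an image resize ratio chosen by the data-anchor-sampling rule."""
    v = math.sqrt(face_w * face_h)                       # current face scale
    # index of the smallest anchor that is >= v (clamped to the last anchor)
    idx = next((i for i, a in enumerate(anchor_sizes) if v <= a), len(anchor_sizes) - 1)
    # choose uniformly among anchors up to that index, e.g. v=45 -> one of {16, 32, 64}
    chosen = random.choice(anchor_sizes[: idx + 1])
    # resample the target face scale inside [chosen / 2, min(v * 2, chosen * 2)]
    target = random.uniform(chosen / 2.0, min(v * 2.0, chosen * 2.0))
    return target / v                                     # ratio to resize the whole image

# Example: a 40x50 face may be rescaled so that it lands near 16, 32 or 64 pixels.
print(data_anchor_sample_ratio(40, 50))
```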
## Benchmark and Model Zoo
The supported architectures are shown in the table below; please refer to
[Algorithm Description](#Algorithm-Description) for details of each algorithm.
| | Original | Lite <sup>[1](#lite)</sup> | NAS <sup>[2](#nas)</sup> |
|:------------------------:|:--------:|:--------------------------:|:------------------------:|
| [BlazeFace](#BlazeFace) | ✓ | ✓ | ✓ |
| [FaceBoxes](#FaceBoxes) | ✓ | ✓ | x |
<a name="lite">[1]</a> The `Lite` edition reduces the number of network layers and channels.
<a name="nas">[2]</a> The `NAS` edition uses a `Neural Architecture Search` algorithm to
optimize the network structure.
**Todo List:**
- [ ] HamBox
- [ ] Pyramidbox
### Model Zoo
#### mAP in WIDER FACE
| Architecture | Type | Size | Img/gpu | Lr schd | Easy Set | Medium Set | Hard Set | Download |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
| BlazeFace | Original | 640 | 8 | 32w | **0.915** | **0.892** | **0.797** | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_original.tar) |
| BlazeFace | Lite | 640 | 8 | 32w | 0.909 | 0.885 | 0.781 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace | NAS | 640 | 8 | 32w | 0.837 | 0.807 | 0.658 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| FaceBoxes | Original | 640 | 8 | 32w | 0.875 | 0.848 | 0.568 | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_original.tar) |
| FaceBoxes | Lite | 640 | 8 | 32w | 0.898 | 0.872 | 0.752 | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_lite.tar) |
**NOTES:**
- mAP on the `Easy/Medium/Hard Set` is obtained by multi-scale evaluation with `tools/face_eval.py`.
For details, refer to [Evaluation](#Evaluate-on-the-WIDER-FACE).
- BlazeFace-Lite training and testing use the [blazeface.yml](../../configs/face_detection/blazeface.yml)
config file with `lite_edition: true`.
#### mAP in FDDB
| Architecture | Type | Size | DistROC | ContROC |
|:------------:|:--------:|:----:|:-------:|:-------:|
| BlazeFace | Original | 640 | **0.992** | **0.762** |
| BlazeFace | Lite | 640 | 0.990 | 0.756 |
| BlazeFace | NAS | 640 | 0.981 | 0.741 |
| FaceBoxes | Original | 640 | 0.985 | 0.731 |
| FaceBoxes | Lite | 640 | 0.987 | 0.741 |
**NOTES:**
- mAP is obtained by multi-scale evaluation on the FDDB dataset.
For details, refer to [Evaluation](#Evaluate-on-the-FDDB).
#### Infer Time and Model Size comparison
| Architecture | Type | Size | P4 (ms) | CPU (ms) | ARM (ms) | File size (MB) | Flops |
|:------------:|:--------:|:----:|:---------:|:--------:|:----------:|:--------------:|:---------:|
| BlazeFace | Original | 128 | - | - | - | - | - |
| BlazeFace | Lite | 128 | - | - | - | - | - |
| BlazeFace | NAS | 128 | - | - | - | - | - |
| FaceBoxes | Original | 128 | - | - | - | - | - |
| FaceBoxes | Lite | 128 | - | - | - | - | - |
| BlazeFace | Original | 320 | - | - | - | - | - |
| BlazeFace | Lite | 320 | - | - | - | - | - |
| BlazeFace | NAS | 320 | - | - | - | - | - |
| FaceBoxes | Original | 320 | - | - | - | - | - |
| FaceBoxes | Lite | 320 | - | - | - | - | - |
| BlazeFace | Original | 640 | - | - | - | - | - |
| BlazeFace | Lite | 640 | - | - | - | - | - |
| BlazeFace | NAS | 640 | - | - | - | - | - |
| FaceBoxes | Original | 640 | - | - | - | - | - |
| FaceBoxes | Lite | 640 | - | - | - | - | - |
**NOTES:**
- CPU: i5-7360U @ 2.30GHz. Single core and single thread.
## Get Started
For `Training` and `Inference`, please refer to [GETTING_STARTED.md](../../docs/GETTING_STARTED.md)
- **NOTES:**
    - `BlazeFace` and `FaceBoxes` are trained on 4 GPUs with `batch_size=8` per GPU (total batch size 32)
    for 320,000 iterations. (If your GPU count is not 4, please refer to the parameter adjustment rule in the
    [calculation rules](../../docs/GETTING_STARTED.md#faq) table.)
    - Evaluation during training is currently not supported.
### Evaluation
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/face_eval.py -c configs/face_detection/blazeface.yml
```
- Optional arguments
- `-d` or `--dataset_dir`: Dataset path, same as dataset_dir of configs. Such as: `-d dataset/wider_face`.
- `-f` or `--output_eval`: Evaluation file directory, default is `output/pred`.
- `-e` or `--eval_mode`: Evaluation mode, include `widerface` and `fddb`, default is `widerface`.
  - `--multi_scale`: If this flag is added to the command, multi-scale evaluation is used;
    by default it is `False` and single-scale evaluation is used.
After the evaluation completes, test results in txt format are written to `output/pred`,
and mAP is then computed for the selected dataset. With `--eval_mode=widerface` the results are
evaluated as described in [Evaluate on the WIDER FACE](#Evaluate-on-the-WIDER-FACE); with `--eval_mode=fddb`,
as described in [Evaluate on the FDDB](#Evaluate-on-the-FDDB).
#### Evaluate on the WIDER FACE
- Download the official evaluation script to evaluate the AP metrics:
```
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
```
- Modify the result path and the name of the curve to be drawn in `eval_tools/wider_eval.m`:
```
% Modify the folder name where the result is stored.
pred_dir = './pred';
% Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
```
- `wider_eval.m` is the main execution program of the evaluation module. The run command is as follows:
```
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
```
#### Evaluate on the FDDB
For details of the [FDDB dataset](http://vis-www.cs.umass.edu/fddb/), refer to FDDB's official website.
- Download the official dataset and evaluation script to evaluate the ROC metrics:
```
#external link to the Faces in the Wild data set
wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
#The annotations are split into ten folds. See README for details.
wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
#information on directory structure and file formats
wget http://vis-www.cs.umass.edu/fddb/README.txt
```
- Install OpenCV: requires the [OpenCV library](http://sourceforge.net/projects/opencvlibrary/).
If the utility 'pkg-config' is not available for your operating system,
edit the Makefile to manually specify the OpenCV flags as follows:
```
INCS = -I/usr/local/include/opencv
LIBS = -L/usr/local/lib -lcxcore -lcv -lhighgui -lcvaux -lml
```
- Compile FDDB evaluation code: execute `make` in evaluation folder.
- Generate full image path list and groundtruth in FDDB-folds. The run command is as follows:
```
cat `ls | grep -v "ellipse"` > filePath.txt
cat *ellipse* > fddb_annotFile.txt
```
- Evaluation
Finally, run the evaluation command:
```
./evaluate -a ./FDDB/FDDB-folds/fddb_annotFile.txt \
-d DETECTION_RESULT.txt -f 0 \
-i ./FDDB -l ./FDDB/FDDB-folds/filePath.txt \
-r ./OUTPUT_DIR -z .jpg
```
**NOTES:** The meaning of each argument can be viewed by running `./evaluate --help`.
## Algorithm Description
### BlazeFace
**Introduction:**
[BlazeFace](https://arxiv.org/abs/1907.05047) is a face detection model published by Google Research.
It is lightweight yet accurate, and tailored for mobile GPU inference; it runs at a speed
of 200-1000+ FPS on flagship devices.
**Particularity:**
- The anchor scheme stops at 8×8 (for a 128x128 input), with 6 anchors per pixel at that resolution.
- 5 single and 6 double BlazeBlocks built from 5×5 depthwise convs: the same accuracy with fewer layers.
- Non-maximum suppression is replaced with a blending strategy that estimates the regression parameters
of a bounding box as a score-weighted mean of the overlapping predictions (a minimal sketch follows).
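A minimal sketch of such a blending step is shown below. It is an assumed simplification for illustration (NumPy, class-agnostic, greedy clustering by IoU), not the BlazeFace implementation itself.

```python
# Sketch of "blending" in place of hard NMS: overlapping detections are merged by a
# score-weighted average of their box coordinates.
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def blend_nms(boxes, scores, overlap_thresh=0.3):
    """boxes: (N, 4) xyxy ndarray, scores: (N,) ndarray. Returns blended boxes and scores."""
    order = np.argsort(-scores)
    boxes, scores = boxes[order], scores[order]
    out_boxes, out_scores = [], []
    while len(boxes) > 0:
        keep = iou(boxes[0], boxes) >= overlap_thresh              # cluster around the top box
        w = scores[keep]
        out_boxes.append((boxes[keep] * w[:, None]).sum(0) / w.sum())  # weighted mean of coords
        out_scores.append(scores[0])
        boxes, scores = boxes[~keep], scores[~keep]
    return np.array(out_boxes), np.array(out_scores)
```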
**Edition information:**
- Original: reproduction of the original paper.
- Lite: replaces the 5x5 convs with 3x3 convs and uses fewer network layers and conv channels.
- NAS: uses a `Neural Architecture Search` algorithm to optimize the network structure,
with even fewer network layers and conv channels than `Lite`.
### FaceBoxes
**Introduction:**
[FaceBoxes](https://arxiv.org/abs/1708.05234) which named A CPU Real-time Face Detector
with High Accuracy is face detector proposed by Shifeng Zhang, with high performance on
both speed and accuracy. This paper is published by IJCB(2017).
**Particularity:**
- The anchor scheme stops at the 20x20, 10x10 and 5x5 feature maps (for a 640x640 network input),
with 3, 1 and 1 anchors per pixel at these resolutions. The corresponding anchor densities
are 1, 2, 4 (20x20), 4 (10x10) and 4 (5x5).
- 2 convs with CReLU, 2 poolings, 3 inception blocks and 2 convs with ReLU.
- Density prior boxes are used to improve detection accuracy (see the sketch below).
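The density prior box (anchor densification) idea can be sketched as follows; the spacing rule used here is an assumption for illustration rather than the exact FaceBoxes implementation.

```python
# Hedged sketch of anchor densification: a small anchor at a feature-map cell is replicated
# `density` times along each axis so that small anchors tile the receptive field more densely.
import numpy as np

def densify_anchor(cx, cy, size, density):
    """Return density*density anchors (xyxy) centred around (cx, cy)."""
    step = size / density                                   # assumed spacing for illustration
    offsets = (np.arange(density) - (density - 1) / 2.0) * step
    anchors = []
    for dy in offsets:
        for dx in offsets:
            anchors.append([cx + dx - size / 2, cy + dy - size / 2,
                            cx + dx + size / 2, cy + dy + size / 2])
    return np.array(anchors)

# e.g. a density of 4 turns one anchor into a 4x4 grid of shifted copies at the same cell
print(densify_anchor(cx=16, cy=16, size=32, density=4).shape)   # (16, 4)
```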
**Edition information:**
- Original: reproduction of the original paper.
- Lite: 2 convs with CReLU, 1 pooling, 2 convs with ReLU, 3 inception blocks and 2 convs with ReLU.
The anchor scheme stops at the 80x80 and 40x40 feature maps, with 3 and 1 anchors per pixel at these
resolutions; the corresponding densities are 1, 2, 4 (80x80) and 4 (40x40), and fewer conv channels are used.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!!
......@@ -89,7 +89,7 @@ SSDEvalFeed:
fields: ['image', 'im_id', 'gt_box']
dataset:
dataset_dir: dataset/wider_face
annotation: annotFile.txt #wider_face_split/wider_face_val_bbx_gt.txt
annotation: wider_face_split/wider_face_val_bbx_gt.txt
image_dir: WIDER_val/images
drop_last: false
image_shape: [3, 640, 640]
......
......@@ -21,7 +21,7 @@ FasterRCNN:
bbox_assigner: BBoxAssigner
ResNet:
norm_type: affine_channel
norm_type: bn
norm_decay: 0.
depth: 50
feature_maps: [2, 3, 4, 5]
......
......@@ -82,8 +82,8 @@ LearningRate:
gamma: 0.1
milestones: [120000, 160000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
start_factor: 0.1
steps: 1000
OptimizerBuilder:
optimizer:
......
......@@ -24,7 +24,7 @@ ResNet:
depth: 50
feature_maps: [2, 3, 4, 5]
freeze_at: 2
norm_type: affine_channel
norm_type: bn
FPN:
max_level: 6
......
architecture: CascadeRCNN
train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 500000
snapshot_iter: 10000
use_gpu: true
log_iter: 20
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_coco_pretrained.tar
weights: output/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas/model_final
metric: COCO
num_classes: 81
CascadeRCNN:
backbone: ResNet
fpn: FPN
rpn_head: FPNRPNHead
roi_extractor: FPNRoIAlign
bbox_head: CascadeBBoxHead
bbox_assigner: CascadeBBoxAssigner
SENet:
depth: 152
feature_maps: [2, 3, 4, 5]
freeze_at: 2
group_width: 4
groups: 64
norm_type: bn
freeze_norm: True
variant: d
dcn_v2_stages: [3, 4, 5]
std_senet: True
FPN:
min_level: 2
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
freeze_norm: False
norm_type: gn
FPNRPNHead:
anchor_generator:
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
stride: [16.0, 16.0]
variance: [1.0, 1.0, 1.0, 1.0]
anchor_start_size: 32
min_level: 2
max_level: 6
num_chan: 256
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_straddle_thresh: 0.0
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
FPNRoIAlign:
canconical_level: 4
canonical_size: 224
min_level: 2
max_level: 5
box_resolution: 7
sampling_ratio: 2
CascadeBBoxAssigner:
batch_size_per_im: 1024
bbox_reg_weights: [10, 20, 30]
bg_thresh_lo: [0.0, 0.0, 0.0]
bg_thresh_hi: [0.5, 0.6, 0.7]
fg_thresh: [0.5, 0.6, 0.7]
fg_fraction: 0.25
CascadeBBoxHead:
head: CascadeXConvNormHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
CascadeXConvNormHead:
norm_type: gn
CascadeTwoFCHead:
mlp_dim: 1024
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [400000, 460000]
- !LinearWarmup
start_factor: 0.01
steps: 2000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
FasterRCNNTrainFeed:
batch_size: 1
dataset:
dataset_dir: dataset/objects365
annotation: annotations/train.json
image_dir: train
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !RandomFlipImage
is_mask_flip: true
is_normalized: false
prob: 0.5
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
interp: 1
target_size:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
- 672
- 704
- 736
- 768
- 800
- 832
- 864
- 896
- 928
- 960
- 992
- 1024
- 1056
- 1088
- 1120
- 1152
- 1184
- 1216
- 1248
- 1280
- 1312
- 1344
- 1376
- 1408
max_size: 1600
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
num_workers: 4
class_aware_sampling: true
FasterRCNNEvalFeed:
batch_size: 1
dataset:
dataset_dir: dataset/objects365
annotation: annotations/val.json
image_dir: val
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !ResizeImage
target_size: 800
max_size: 1333
interp: 1
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
FasterRCNNTestFeed:
batch_size: 1
dataset:
annotation: dataset/obj365/annotations/val.json
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: False
- !NormalizeImage
is_channel_first: false
is_scale: False
mean:
- 102.9801
- 115.9465
- 122.7717
std:
- 1.0
- 1.0
- 1.0
- !Permute
channel_first: true
to_bgr: false
batch_transforms:
- !PadBatch
pad_to_stride: 32
drop_last: false
num_workers: 2
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 200000
log_smooth_window: 20
save_dir: output
snapshot_iter: 5000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar
num_classes: 1
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: true
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 1000
normalized: false
score_threshold: 0.01
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 150000
- 180000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/pedestrian
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
image_shape: [3, 608, 608]
dataset:
dataset_dir: dataset/pedestrian
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
image_shape: [3, 608, 608]
dataset:
annotation: contrib/PedestrianDetection/pedestrian.json
# PaddleDetection applied for specific scenarios
We provide some models implemented with PaddlePaddle to detect objects in specific scenarios; users can download the models and use them directly in these scenarios.
| Task | Algorithm | Box AP | Download |
|:---------------------|:---------:|:------:| :-------------------------------------------------------------------------------------: |
| Vehicle Detection | YOLOv3 | 54.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar) |
| Pedestrian Detection | YOLOv3 | 51.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar) |
## Vehicle Detection
One of the major applications of vehicle detection is traffic monitoring. In this scenario, the vehicles to be detected are mostly captured by cameras mounted on top of traffic-light poles.
### 1. Network
The network for detecting vehicles is YOLOv3 with a DarkNet53 backbone.
### 2. Configuration for training
PaddleDetection provides users with the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for vehicle detection training (the resulting learning-rate schedule is sketched after the list):
* max_iters: 120000
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
* label_smooth: false
* nms/nms_top_k: 400
* nms/score_threshold: 0.005
* milestones: [60000, 80000]
* dataset_dir: dataset/vehicle
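For reference, the `milestones` above feed the piecewise-decay-plus-warmup schedule defined in the config's `LearningRate` section (`base_lr: 0.001`, `gamma: 0.1`, 4000 warmup steps with `start_factor: 0.` in the vehicle config shown later on this page); a rough sketch of the resulting schedule:

```python
# Rough sketch of the piecewise-decay + linear-warmup learning-rate schedule used by these configs.
def learning_rate(it, base_lr=0.001, gamma=0.1, milestones=(60000, 80000),
                  warmup_steps=4000, warmup_start_factor=0.0):
    if it < warmup_steps:                                   # linear warm-up phase
        alpha = it / float(warmup_steps)
        return base_lr * (warmup_start_factor * (1 - alpha) + alpha)
    decays = sum(it >= m for m in milestones)               # number of milestones already passed
    return base_lr * (gamma ** decays)

# 0.0 at iter 0, 0.0005 at 2000, 0.001 after warm-up, 0.0001 after 60k, 0.00001 after 80k
print(learning_rate(0), learning_rate(2000), learning_rate(70000), learning_rate(90000))
```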
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is as follows:
AP at IoU=.50:.05:.95 is 0.545.
AP at IoU=.50 is 0.764.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar \
--infer_dir contrib/VehicleDetection/demo \
--draw_threshold 0.2 \
--output_dir contrib/VehicleDetection/demo/output
```
Some inference results are visualized below:
![](VehicleDetection/demo/output/001.jpeg)
![](VehicleDetection/demo/output/005.png)
## Pedestrian Detection
The main applications of pedestrian detection include intelligent surveillance. In this scenario, images of pedestrians are captured by surveillance cameras in public areas, and pedestrian detection is then performed on these images.
### 1. Network
The network for detecting pedestrians is YOLOv3 with a DarkNet53 backbone.
### 2. Configuration for training
PaddleDetection provides users with the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) to train YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for pedestrian detection training:
* max_iters: 200000
* num_classes: 1
* snapshot_iter: 5000
* milestones: [150000, 180000]
* dataset_dir: dataset/pedestrian
### 3. Accuracy
The accuracy of the model trained and evaluated on our private data is as follows:
AP at IoU=.50:.05:.95 is 0.518.
AP at IoU=.50 is 0.792.
### 4. Inference
Users can employ the model to conduct the inference:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar \
--infer_dir contrib/PedestrianDetection/demo \
--draw_threshold 0.3 \
--output_dir contrib/PedestrianDetection/demo/output
```
Some inference results are visualized below:
![](PedestrianDetection/demo/output/001.png)
![](PedestrianDetection/demo/output/004.png)
# PaddleDetection models for specific vertical scenarios

We provide PaddlePaddle-based detection models for several specific scenarios; users can download and use them directly.

| Task                 | Algorithm | Box AP | Download |
|:---------------------|:---------:|:------:| :---------------------------------------------------------------------------------: |
| Vehicle detection    | YOLOv3    | 54.5   | [model](https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar) |
| Pedestrian detection | YOLOv3    | 51.8   | [model](https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar) |
## Vehicle Detection

One of the main applications of vehicle detection is traffic monitoring, where the vehicles to be detected are mostly captured by cameras mounted on traffic-light poles.

### 1. Network

YOLOv3 with a DarkNet53 backbone.

### 2. Training configuration

PaddleDetection provides the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) for training YOLOv3 on COCO. Compared with that file, the following parameters were modified for vehicle detection training:
* max_iters: 120000
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
* label_smooth: false
* nms/nms_top_k: 400
* nms/score_threshold: 0.005
* milestones: [60000, 80000]
* dataset_dir: dataset/vehicle
### 3. Accuracy

Accuracy on our internal dataset:

AP at IoU=.50:.05:.95 is 0.545.

AP at IoU=.50 is 0.764.
### 4. Inference

Users can run vehicle detection with the trained model:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar \
--infer_dir contrib/VehicleDetection/demo \
--draw_threshold 0.2 \
--output_dir contrib/VehicleDetection/demo/output
```
Example results:
![](VehicleDetection/demo/output/001.jpeg)
![](VehicleDetection/demo/output/005.png)
## Pedestrian Detection

The main application of pedestrian detection is intelligent surveillance, where pedestrians are captured by surveillance cameras in public areas and detection is then run on the captured images.

### 1. Network

YOLOv3 with a DarkNet53 backbone.

### 2. Training configuration

PaddleDetection provides the configuration file [yolov3_darknet.yml](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/PaddleDetection/configs/yolov3_darknet.yml) for training YOLOv3 on COCO. Compared with that file, the following parameters were modified for pedestrian detection training:
* max_iters: 200000
* num_classes: 1
* snapshot_iter: 5000
* milestones: [150000, 180000]
* dataset_dir: dataset/pedestrian
### 3. Accuracy

Accuracy on our internal surveillance dataset:

AP at IoU=.50 is 0.792.

AP at IoU=.50:.05:.95 is 0.518.

### 4. Inference

Users can run pedestrian detection with the trained model:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/infer.py -c contrib/PedestrianDetection/pedestrian_yolov3_darknet.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/pedestrian_yolov3_darknet.tar \
--infer_dir contrib/PedestrianDetection/demo \
--draw_threshold 0.3 \
--output_dir contrib/PedestrianDetection/demo/output
```
Example results:
![](PedestrianDetection/demo/output/001.png)
![](PedestrianDetection/demo/output/004.png)
architecture: YOLOv3
train_feed: YoloTrainFeed
eval_feed: YoloEvalFeed
test_feed: YoloTestFeed
use_gpu: true
max_iters: 120000
log_smooth_window: 20
save_dir: output
snapshot_iter: 2000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar
weights: https://paddlemodels.bj.bcebos.com/object_detection/vehicle_yolov3_darknet.tar
num_classes: 6
YOLOv3:
backbone: DarkNet
yolo_head: YOLOv3Head
DarkNet:
norm_type: sync_bn
norm_decay: 0.
depth: 53
YOLOv3Head:
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
anchors: [[8, 9], [10, 23], [19, 15],
[23, 33], [40, 25], [54, 50],
[101, 80], [139, 145], [253, 224]]
norm_decay: 0.
ignore_thresh: 0.7
label_smooth: false
nms:
background_label: -1
keep_top_k: 100
nms_threshold: 0.45
nms_top_k: 400
normalized: false
score_threshold: 0.005
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 60000
- 80000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
YoloTrainFeed:
batch_size: 8
dataset:
dataset_dir: dataset/vehicle
annotation: annotations/instances_train2017.json
image_dir: train2017
num_workers: 8
bufsize: 128
use_process: true
YoloEvalFeed:
batch_size: 8
image_shape: [3, 608, 608]
dataset:
dataset_dir: dataset/vehicle
annotation: annotations/instances_val2017.json
image_dir: val2017
YoloTestFeed:
batch_size: 1
image_shape: [3, 608, 608]
dataset:
annotation: contrib/VehicleDetection/vehicle.json
# All rights `PaddleDetection` reserved
# References:
# @inproceedings{yang2016wider,
# Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou},
# Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
# Title = {WIDER FACE: A Face Detection Benchmark},
# Year = {2016}}
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip
wget https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip
wget https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip
# Extract the data.
echo "Extracting..."
unzip WIDER_train.zip
unzip WIDER_val.zip
unzip wider_face_split.zip
# Inference Benchmark

- Test environment:
  - CUDA 9.0
  - CUDNN 7.5
  - TensorRT-5.1.2.2
  - PaddlePaddle v1.6
  - GPUs: Tesla V100 and Tesla P4
- Test method:
  - To make the comparison across models fair, every model takes the same input size, 3x640x640, using the image `demo/000000014439_640x640.jpg`.
  - Batch Size = 1
  - The first 10 warm-up rounds are excluded and the average over 100 rounds is reported in ms/image, including the time to copy input data to the GPU, the compute time, and the time to copy results back to the CPU.
  - The Fluid C++ inference engine is used, covering both plain Fluid C++ inference and Fluid-TensorRT inference; both Float32 (FP32) and Float16 (FP16) are measured below.
  - FLAGS_cudnn_exhaustive_search=True is enabled, so convolution algorithms are selected by exhaustive search.
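An illustrative timing loop matching this methodology is sketched below; `predict_fn` is a stand-in for the actual predictor call, so this is not the benchmark script itself.

```python
# Skip 10 warm-up runs, then report the average over 100 timed runs in ms/image.
import time

def benchmark(predict_fn, image, warmup=10, repeats=100):
    for _ in range(warmup):
        predict_fn(image)                       # warm-up, excluded from the statistics
    start = time.time()
    for _ in range(repeats):
        predict_fn(image)                       # includes copy-in, compute and copy-out
    return (time.time() - start) * 1000.0 / repeats   # ms/image

# latency_ms = benchmark(paddle_predictor.run, preprocessed_image)  # hypothetical predictor call
```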
### Inference speed
| Model                                 | Tesla V100 Fluid (ms/image) | Tesla V100 Fluid-TensorRT-FP32 (ms/image) | Tesla V100 Fluid-TensorRT-FP16 (ms/image) | Tesla P4 Fluid (ms/image) | Tesla P4 Fluid-TensorRT-FP32 (ms/image) |
| ------------------------------------- | ----------------------------- | ------------------------------------------- | ------------------------------------------- | --------------------------- | ----------------------------------------- |
| faster_rcnn_r50_1x | 147.488 | 146.124 | 142.416 | 471.547 | 471.631 |
| faster_rcnn_r50_2x | 147.636 | 147.73 | 141.664 | 471.548 | 472.86 |
| faster_rcnn_r50_vd_1x | 146.588 | 144.767 | 141.208 | 459.357 | 457.852 |
| faster_rcnn_r50_fpn_1x | 25.11 | 24.758 | 20.744 | 59.411 | 57.585 |
| faster_rcnn_r50_fpn_2x | 25.351 | 24.505 | 20.509 | 59.594 | 57.591 |
| faster_rcnn_r50_vd_fpn_2x | 25.514 | 25.292 | 21.097 | 61.026 | 58.377 |
| faster_rcnn_r50_fpn_gn_2x | 36.959 | 36.173 | 32.356 | 101.339 | 101.212 |
| faster_rcnn_dcn_r50_fpn_1x | 28.707 | 28.162 | 27.503 | 68.154 | 67.443 |
| faster_rcnn_dcn_r50_vd_fpn_2x | 28.576 | 28.271 | 27.512 | 68.959 | 68.448 |
| faster_rcnn_r101_1x | 153.267 | 150.985 | 144.849 | 490.104 | 486.836 |
| faster_rcnn_r101_fpn_1x | 30.949 | 30.331 | 24.021 | 73.591 | 69.736 |
| faster_rcnn_r101_fpn_2x | 30.918 | 29.126 | 23.677 | 73.563 | 70.32 |
| faster_rcnn_r101_vd_fpn_1x | 31.144 | 30.202 | 23.57 | 74.767 | 70.773 |
| faster_rcnn_r101_vd_fpn_2x | 30.678 | 29.969 | 23.327 | 74.882 | 70.842 |
| faster_rcnn_x101_vd_64x4d_fpn_1x | 60.36 | 58.461 | 45.172 | 132.178 | 131.734 |
| faster_rcnn_x101_vd_64x4d_fpn_2x | 59.003 | 59.163 | 46.065 | 131.422 | 132.186 |
| faster_rcnn_dcn_r101_vd_fpn_1x | 36.862 | 37.205 | 36.539 | 93.273 | 92.616 |
| faster_rcnn_dcn_x101_vd_64x4d_fpn_1x | 78.476 | 78.335 | 77.559 | 185.976 | 185.996 |
| faster_rcnn_se154_vd_fpn_s1x | 166.282 | 90.508 | 80.738 | 304.653 | 193.234 |
| mask_rcnn_r50_1x | 160.185 | 160.4 | 160.322 | - | - |
| mask_rcnn_r50_2x | 159.821 | 159.527 | 160.41 | - | - |
| mask_rcnn_r50_fpn_1x | 95.72 | 95.719 | 92.455 | 259.8 | 258.04 |
| mask_rcnn_r50_fpn_2x | 84.545 | 83.567 | 79.269 | 227.284 | 222.975 |
| mask_rcnn_r50_vd_fpn_2x | 82.07 | 82.442 | 77.187 | 223.75 | 221.683 |
| mask_rcnn_r50_fpn_gn_2x | 94.936 | 94.611 | 91.42 | 265.468 | 263.76 |
| mask_rcnn_dcn_r50_fpn_1x | 97.828 | 97.433 | 93.76 | 256.295 | 258.056 |
| mask_rcnn_dcn_r50_vd_fpn_2x | 77.831 | 79.453 | 76.983 | 205.469 | 204.499 |
| mask_rcnn_r101_fpn_1x | 95.543 | 97.929 | 90.314 | 252.997 | 250.782 |
| mask_rcnn_r101_vd_fpn_1x | 98.046 | 97.647 | 90.272 | 261.286 | 262.108 |
| mask_rcnn_x101_vd_64x4d_fpn_1x | 115.461 | 115.756 | 102.04 | 296.066 | 293.62 |
| mask_rcnn_x101_vd_64x4d_fpn_2x | 107.144 | 107.29 | 97.275 | 267.636 | 267.577 |
| mask_rcnn_dcn_r101_vd_fpn_1x | 85.504 | 84.875 | 84.907 | 225.202 | 226.585 |
| mask_rcnn_dcn_x101_vd_64x4d_fpn_1x | 129.937 | 129.934 | 127.804 | 326.786 | 326.161 |
| mask_rcnn_se154_vd_fpn_s1x | 214.188 | 139.807 | 121.516 | 440.391 | 439.727 |
| cascade_rcnn_r50_fpn_1x | 36.866 | 36.949 | 36.637 | 101.851 | 101.912 |
| cascade_mask_rcnn_r50_fpn_1x | 110.344 | 106.412 | 100.367 | 301.703 | 297.739 |
| cascade_rcnn_dcn_r50_fpn_1x | 40.412 | 39.58 | 39.853 | 110.346 | 110.077 |
| cascade_mask_rcnn_r50_fpn_gn_2x | 170.092 | 168.758 | 163.298 | 527.998 | 529.59 |
| cascade_rcnn_dcn_r101_vd_fpn_1x | 48.414 | 48.849 | 48.701 | 134.9 | 134.846 |
| cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x | 90.062 | 90.218 | 90.009 | 228.67 | 228.396 |
| retinanet_r101_fpn_1x | 55.59 | 54.636 | 48.489 | 90.394 | 83.951 |
| retinanet_r50_fpn_1x | 50.048 | 47.932 | 44.385 | 73.819 | 70.282 |
| retinanet_x101_vd_64x4d_fpn_1x | 83.329 | 83.446 | 70.76 | 145.936 | 146.168 |
| yolov3_darknet | 21.427 | 20.252 | 13.856 | 55.173 | 55.692 |
| yolov3_darknet_voc | 17.58 | 16.241 | 9.473 | 51.049 | 51.249 |
| yolov3_mobilenet_v1 | 12.869 | 11.834 | 9.408 | 24.887 | 21.352 |
| yolov3_mobilenet_v1_voc | 9.118 | 8.146 | 5.575 | 20.787 | 17.169 |
| yolov3_r34 | 14.914 | 14.125 | 11.176 | 20.798 | 20.822 |
| yolov3_r34_voc | 11.288 | 10.73 | 7.7 | 25.874 | 22.399 |
| ssd_mobilenet_v1_voc | 5.763 | 5.854 | 4.589 | 11.75 | 9.485 |
| ssd_vgg16_300 | 28.722 | 29.644 | 20.399 | 73.707 | 74.531 |
| ssd_vgg16_300_voc | 18.425 | 19.288 | 11.298 | 56.297 | 56.201 |
| ssd_vgg16_512 | 27.471 | 28.328 | 19.328 | 68.685 | 69.808 |
| ssd_vgg16_512_voc | 18.721 | 19.636 | 12.004 | 54.688 | 56.174 |
1. For R-CNN models, Fluid-TensorRT shows no speed advantage over plain Fluid inference. The reason is that TensorRT only supports fixed-size inputs; for the current ResNet-based R-CNN models only the backbone runs as a TensorRT subgraph, while the time-consuming stage-5 does not. Fluid also already applies a series of fusion optimizations to CNN models. The numbers will be updated after future TensorRT upgrades or further optimizations.
2. For YOLOv3 models, Fluid-TensorRT is 5% - 10% faster than plain Fluid inference.
3. For SSD and YOLOv3 models, TensorRT-FP16 inference has a clear advantage, with speedups of roughly 20% - 40%, as shown in the figure below.
<div align="center">
<img src="images/bench_ssd_yolo_infer.png" />
</div>
# CACascade RCNN

## Introduction

CACascade RCNN is one of the best single models from Baidu VIS's winning entry in the Objects365 2019 Challenge. Objects365 is a new dataset for general object detection, designed to promote detection research on diverse objects in natural scenes. It annotates 365 object classes on 630k images, with more than 10 million bounding boxes in the training set. The model released here is one of the best single models from the Full Track task.
<div align="center">
<img src="../demo/obj365_gt.png"/>
</div>
## Method

For large-scale object detection we adopt a sampling strategy based on the number of object categories contained in each image (Class Aware Sampling); training with this sampling lets the model converge to better accuracy in less time (a sketch of the idea follows the figure below).
<div align="center">
<img src="../demo/cas.png"/>
</div>
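A minimal sketch of class-aware sampling is shown below. It is an assumed simplification (uniformly pick a category, then an image containing it); the actual `class_aware_sampling: true` option in the config may differ in detail.

```python
# Minimal sketch of class-aware sampling: rare categories are drawn as often as frequent ones.
import random
from collections import defaultdict

def build_class_aware_sampler(annotations):
    """annotations: list of (image_id, set_of_category_ids) pairs."""
    images_per_class = defaultdict(list)
    for image_id, cats in annotations:
        for c in cats:
            images_per_class[c].append(image_id)
    classes = list(images_per_class)

    def sample():
        c = random.choice(classes)                      # first pick a category uniformly
        return random.choice(images_per_class[c])       # then an image containing it
    return sample

sampler = build_class_aware_sampler([(1, {3, 5}), (2, {3}), (3, {7})])
print([sampler() for _ in range(5)])
```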
The best single model released here is a two-stage detector based on Cascade RCNN, with the backbone replaced by the stronger SENet154, Deformable Conv modules, and a more complex second-stage head. Group Normalization is added to cope with small batch sizes, and multi-scale training is used, which together yield very strong results. The pretrained weights were trained first on ImageNet and then on COCO; the COCO training adds a Mask branch while the rest of the structure is identical to CACascade RCNN, and the weights are downloaded automatically when training starts.
## Usage

1. Prepare the data

The data must be requested and downloaded from the [Objects365 website](https://www.objects365.org/download.html); after downloading, place it under the dataset directory:
```
${THIS REPO ROOT}
\--dataset
\-- objects365
\-- annotations
|-- train.json
|-- val.json
\-- train
\-- val
```
2. Start training
```bash
python tools/train.py -c configs/obj365/cascade_rcnn_dcnv2_se154_vd_fpn_gn.yml
```
3. Results

| Model | Validation mAP | Download |
| :-----------------: | :--------: | :----------------------------------------------------------: |
| CACascadeRCNN SE154 | 31.7 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcnv2_se154_vd_fpn_gn_cas_obj365.tar) |
## Visualization results
<div align="center">
<img src="../demo/obj365_pred.png"/>
</div>
English | [简体中文](CONFIG_cn.md)
# Config Pipeline
## Introduction
......
English | [简体中文](DATA_cn.md)
# Data Pipeline
## Introduction
......@@ -126,6 +128,8 @@ the corresponding data stream. Many aspect of the `Reader`, such as storage
location, preprocessing pipeline, and acceleration mode, can be configured with yaml
files.
### APIs
The main APIs are as follows:
1. Data parsing
......@@ -139,7 +143,7 @@ The main APIs are as follows:
- `source/loader.py`: Roidb dataset parser. [source](../ppdet/data/source/loader.py)
2. Operator
`transform/operators.py`: Contains a variety of data augmentation methods, including:
- `DecodeImage`: Read images in RGB format.
- `RandomFlipImage`: Horizontal flip.
- `RandomDistort`: Distort brightness, contrast, saturation, and hue.
......@@ -150,7 +154,7 @@ The main APIs are as follows:
- `NormalizeImage`: Normalize image pixel values.
- `NormalizeBox`: Normalize the bounding box.
- `Permute`: Arrange the channels of the image and optionally convert image to BGR format.
- `MixupImage`: Mixup two images with given fraction<sup>[1](#mix)</sup>.
<a name="mix">[1]</a> Please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf)
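A minimal sketch of the image part of mixup is shown below (the actual `MixupImage` operator also mixes the ground-truth boxes and per-box weights; the Beta parameters here are assumptions):

```python
# Blend two images with a Beta-distributed mixing fraction (assumes HxWx3 arrays).
import numpy as np

def mixup_images(img1, img2, alpha=1.5, beta=1.5):
    factor = np.random.beta(alpha, beta)                  # mixing fraction in (0, 1)
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    out = np.zeros((h, w, 3), dtype=np.float32)
    out[:img1.shape[0], :img1.shape[1]] += img1.astype(np.float32) * factor
    out[:img2.shape[0], :img2.shape[1]] += img2.astype(np.float32) * (1.0 - factor)
    return out
```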
......@@ -177,16 +181,18 @@ whole data pipeline is fully customizable through the yaml configuration files.
#### Custom Datasets
- Option 1: Convert the dataset to COCO format.
```sh
# a small utility (`tools/x2coco.py`) is provided to convert
# Labelme-annotated dataset or cityscape dataset to COCO format.
python ./ppdet/data/tools/x2coco.py --dataset_type labelme
--json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --dataset_type: the dataset format to convert from; currently 'labelme' and 'cityscape' are supported
# --json_input_dir: the path of the json files annotated by Labelme
# --image_input_dir: the path of the images
# --output_dir: the path of the converted COCO dataset
......
......@@ -105,9 +105,9 @@ python ./ppdet/data/tools/generate_data_for_training.py
4. Data fetching interface

To simplify data fetching during training, multiple `data.Dataset` objects are combined into a `data.Reader` that feeds data to the user; calling `Reader.[train|eval|infer]` returns the corresponding data stream. The `Reader` supports configuring the data location, preprocessing pipeline, acceleration mode, etc. via yaml files.

### APIs

The main APIs are as follows:
1. Data parsing
......@@ -165,15 +165,17 @@ coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
```
#### How to use a custom dataset?

- Option 1: Convert the dataset to COCO format.
```
# x2coco.py under ./tools/ converts Labelme-annotated datasets or the cityscape dataset to COCO format
python ./ppdet/data/tools/x2coco.py --dataset_type labelme
--json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
--train_proportion 0.8
--val_proportion 0.2
--test_proportion 0.0
# --dataset_type:需要转换的数据格式,目前支持:'labelme'和'cityscape'
# --json_input_dir:使用labelme标注的json文件所在文件夹
# --image_input_dir:图像文件所在文件夹
# --output_dir:转换后的COCO格式数据集存放位置
......
# 模型导出
训练得到一个满足要求的模型后,如果想要将该模型接入到C++预测库或者Serving服务,需要通过`tools/export_model.py`导出该模型。
## 启动参数说明
| FLAG | 用途 | 默认值 | 备注 |
|:--------------:|:--------------:|:------------:|:-----------------------------------------:|
| -c | 指定配置文件 | None | |
| --output_dir | 模型保存路径 | `./output` | 模型默认保存在`output/配置文件名/`路径下 |
## 使用示例
使用[训练/评估/推断](GETTING_STARTED_cn.md)中训练得到的模型进行试用,脚本如下
```bash
# 导出FasterRCNN模型, 模型中data层默认的shape为3x800x1333
python tools/export_model.py -c configs/faster_rcnn_r50_1x.yml \
--output_dir=./inference_model \
-o weights=output/faster_rcnn_r50_1x/model_final
```
预测模型会导出到`inference_model/faster_rcnn_r50_1x`目录下,模型名和参数名分别为`__model__`和`__params__`
## 设置导出模型的输入大小
使用Fluid-TensorRT进行预测时,由于<=TensorRT 5.1的版本仅支持定长输入,保存模型的`data`层的图片大小需要和实际输入图片大小一致。而Fluid C++预测引擎没有此限制。通过设置TestFeed的`image_shape`可以修改保存模型中的输入图片大小。示例如下:
```bash
# 导出FasterRCNN模型,输入是3x640x640
python tools/export_model.py -c configs/faster_rcnn_r50_1x.yml \
--output_dir=./inference_model \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
FasterRCNNTestFeed.image_shape=[3,640,640]
# 导出YOLOv3模型,输入是3x320x320
python tools/export_model.py -c configs/yolov3_darknet.yml \
--output_dir=./inference_model \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar \
YoloTestFeed.image_shape=[3,320,320]
# 导出SSD模型,输入是3x300x300
python tools/export_model.py -c configs/ssd/ssd_mobilenet_v1_voc.yml \
--output_dir=./inference_model \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/ssd_mobilenet_v1_voc.tar \
SSDTestFeed.image_shape=[3,300,300]
```
English | [简体中文](GETTING_STARTED_cn.md)
# Getting Started
For setting up the running environment, please refer to [installation
instructions](INSTALL.md).
## Training
#### Single-GPU Training
## Training/Evaluation/Inference
PaddleDetection provides scripts for training, evaluation and inference, with various features configurable through command-line options.
```bash
export CUDA_VISIBLE_DEVICES=0
# set PYTHONPATH
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### Multi-GPU Training
```bash
# train on a single GPU or multiple GPUs; select GPUs with CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
# GPU evaluation
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
# Inference
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
#### CPU Training
### Optional argument list
```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
```
The list below can be viewed by running with `--help`
##### Optional arguments
| FLAG | script supported | description | default | remark |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Select config file | None | **For the full configuration description, refer to [config_example](config_example)** |
| -o | ALL | Set parameters in the config file | None | `-o` has higher priority than the config file selected by `-c`. Such as `-o use_gpu=False max_iter=10000` |
| -r/--resume_checkpoint | train | Checkpoint path for resuming training | None | `-r output/faster_rcnn_r50_1x/10000` |
| --eval | train | Whether to perform evaluation in training | False | |
| --output_eval | train/eval | json path in evaluation | current path | `--output_eval ./json_result` |
| -d/--dataset_dir | train/eval | path for dataset, same as dataset_dir in configs | None | `-d dataset/coco` |
| --fp16 | train | Whether to enable mixed precision training | False | GPU training is required |
| --loss_scale | train | Loss scaling factor for mixed precision training | 8.0 | enable when `--fp16` is True |
| --json_eval | eval | Whether to evaluate with an existing bbox.json or mask.json | False | json path is set in `--output_eval` |
| --output_dir | infer | Directory for storing the output visualization files | `./output` | `--output_dir output` |
| --draw_threshold | infer | Threshold to reserve the result for visualization | 0.5 | `--draw_threshold 0.7` |
| --infer_dir | infer | Directory for images to perform inference on | None | |
| --infer_img | infer | Image path | None | higher priority over --infer_dir |
| --use_tb | train/infer | Whether to record the data with [tb-paddle](https://github.com/linshuliang/tb-paddle), so as to display in Tensorboard | False | |
| --tb\_log_dir | train/infer | tb-paddle logging directory for image | train:`tb_log_dir/scalar` infer: `tb_log_dir/image` | |
- `-r` or `--resume_checkpoint`: Checkpoint path for resuming training. Such as: `-r output/faster_rcnn_r50_1x/10000`
- `--eval`: Whether to perform evaluation in training, default is `False`
- `--output_eval`: If perform evaluation in training, this edits evaluation directory, default is current directory.
- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` of configs. Such as: `-d dataset/coco`
- `-c`: Select config file and all files are saved in `configs/`
- `-o`: Set configuration options in config file. Such as: `-o max_iters=180000`. `-o` has higher priority to file configured by `-c`
- `--use_tb`: Whether to record the data with [tb-paddle](https://github.com/linshuliang/tb-paddle), so as to display in Tensorboard, default is `False`
- `--tb_log_dir`: tb-paddle logging directory for scalar, default is `tb_log_dir/scalar`
- `--fp16`: Whether to enable mixed precision training (requires GPU), default is `False`
- `--loss_scale`: Loss scaling factor for mixed precision training, default is `8.0`
## Examples
##### Examples
### Training
- Perform evaluation in training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```
Alternating between training epochs and evaluation runs is possible: simply pass
in `--eval` to evaluate at every snapshot_iter, which can be modified via `snapshot_iter` in the configuration file. If the evaluation dataset is large and
slows down training, we suggest reducing the number of evaluations or evaluating after training. When performing evaluation during training,
the best model with the highest mAP is saved at each `snapshot_iter`. `best_model` has the same path as `model_final`.
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```
Perform training and evaluation alternately and evaluate at each snapshot_iter. Meanwhile, the best model with the highest mAP is saved at each `snapshot_iter`, in the same path as `model_final`.
- Configure dataset path
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-d dataset/coco
```
If the evaluation dataset is large, we suggest reducing the number of evaluations or evaluating after training.
- Fine-tune other task
When using a pre-trained model to fine-tune another task, the excluded pre-trained parameters can be set by finetune_exclude_pretrained_params in the YAML config or by -o finetune_exclude_pretrained_params in the arguments.
When using a pre-trained model to fine-tune another task, two methods can be used:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```
1. Set `finetune_exclude_pretrained_params` in the YAML config to exclude the specified pre-trained parameters.
2. Set `-o finetune_exclude_pretrained_params` in the command-line arguments. For example:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```
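Conceptually, `finetune_exclude_pretrained_params` is a list of name patterns: any pretrained parameter whose name matches one of them is skipped when the weights are loaded, so those layers are re-initialized for the new task. The snippet below is only an illustrative sketch of that filtering (it is not the actual PaddleDetection loading code):

```python
import fnmatch

def split_pretrained_params(param_names, exclude_patterns):
    """Return (loaded, skipped) parameter names, skipping any name that
    matches an exclusion pattern as a substring or wildcard."""
    loaded, skipped = [], []
    for name in param_names:
        hit = any(p in name or fnmatch.fnmatch(name, p) for p in exclude_patterns)
        (skipped if hit else loaded).append(name)
    return loaded, skipped

names = ['conv1_weights', 'cls_score_w', 'cls_score_b', 'bbox_pred_w']
loaded, skipped = split_pretrained_params(names, ['cls_score', 'bbox_pred'])
print(skipped)  # ['cls_score_w', 'cls_score_b', 'bbox_pred_w']
```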
##### NOTES
- `CUDA_VISIBLE_DEVICES` can specify different GPU devices. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU selection rules can be found in the [FAQ](#faq)
- Dataset is stored in `dataset/coco` by default (configurable).
- Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not found locally.
- Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- When finetuning, users could set `pretrain_weights` to the models published by PaddlePaddle. Parameters matched by fields in `finetune_exclude_pretrained_params` are skipped during loading; the fields support wildcard matching. For detailed information, please refer to [Transfer Learning](TRANSFER_LEARNING.md).
- To check out hyper parameters used, please refer to the [configs](../configs).
- Checkpoints are saved in `output` by default; the location can be changed via save_dir in the configuration files.
- Training RCNN models on CPU is not supported on PaddlePaddle<=1.5.1 and will be fixed in a later version.
### Mixed Precision Training
Mixed precision training can be enabled with the `--fp16` flag. Currently Faster-FPN, Mask-FPN and YOLOv3 have been verified to work with little to no loss of precision (less than 0.2 mAP).
## Evaluation
To speed up mixed precision training, it is recommended to train in multi-process mode, for example
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train.py --fp16 -c configs/faster_rcnn_r50_fpn_1x.yml
```
#### Optional arguments
If the loss becomes `NaN` during training, try tweaking the `--loss_scale` value. Please refer to the Nvidia [documentation](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#mptrain) on mixed precision training for details.
- `-d` or `--dataset_dir`: Dataset path, same as dataset_dir of configs. Such as: `-d dataset/coco`
- `--output_eval`: Evaluation directory, default is current directory.
- `-o`: Set configuration options in config file. Such as: `-o weights=output/faster_rcnn_r50_1x/model_final`
- `--json_eval`: Whether to evaluate with an existing bbox.json or mask.json. Default is `False`. The json file directory is assigned by the `-f` argument.
Also, please note mixed precision training currently requires changing `norm_type` from `affine_channel` to `bn`.
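For reference, static loss scaling works by multiplying the loss before backpropagation so that small FP16 gradients do not underflow, then dividing the gradients by the same factor before the optimizer step. A rough framework-agnostic sketch of the idea (this is not PaddleDetection's actual implementation; `loss`, `params`, `backward()` and `.grad` are placeholders for whatever the framework provides):

```python
def scaled_backward_step(loss, params, loss_scale=8.0):
    """Illustrative static loss scaling for FP16 training."""
    scaled_loss = loss * loss_scale   # amplify the loss so FP16 gradients don't underflow
    scaled_loss.backward()            # gradients are computed w.r.t. the scaled loss
    for p in params:
        p.grad = p.grad / loss_scale  # unscale gradients before the optimizer update
    # if gradients still overflow (inf/NaN), a smaller --loss_scale is needed
```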
#### Examples
### Evaluation
- Evaluate by specified weights path and dataset path
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=output/faster_rcnn_r50_1x/model_final \
-d dataset/coco
```
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
-d dataset/coco
```
The model to be evaluated can be given either as a local path or as a link from the [MODEL_ZOO](MODEL_ZOO_cn.md).
- Evaluate with json
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
--json_eval \
-f evaluation/
```
```
The json file must be named bbox.json or mask.json and placed in the `evaluation/` directory. If the `-f` parameter is omitted, the current directory is used by default.
The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory.
#### NOTES
- Checkpoint is loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature
## Inference
- Run inference on a single image:
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- Multi-image inference:
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
#### Optional arguments
- `--output_dir`: Directory for storing the output visualization files.
- `--draw_threshold`: Threshold to reserve the result for visualization. Default is 0.5.
- `--save_inference_model`: Save inference model in output_dir if True.
- `--use_tb`: Whether to record the data with [tb-paddle](https://github.com/linshuliang/tb-paddle), so as to display in Tensorboard, default is `False`
- `--tb_log_dir`: tb-paddle logging directory for image, default is `tb_log_dir/image`
#### Examples
### Inference
- Specify output directory && Set up visualization threshold
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--infer_img=demo/000000570688.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/faster_rcnn_r50_1x/model_final \
--use_tb=True
```
```
The visualization files are saved in `output` by default; to specify a different path, simply add the `--output_dir=` flag.
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
To run inference with a model at a custom path, set `-o weights` to that path.
`--use_tb` is an optional argument; when it is `True`, tb-paddle records data to the logging directory,
so users can see the results in Tensorboard.
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
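For reference, `--draw_threshold` is a score filter applied to the detections that survive NMS; a simplified single-class sketch of greedy NMS and the final score filtering is shown below (illustration only, not the exact operator used by the models):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS for [x1, y1, x2, y2] boxes of a single class."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the highest-scoring box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                    (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter)
        order = order[1:][iou <= iou_threshold]
    return keep

# visualization then keeps only detections with score >= draw_threshold, e.g.
# visible = [i for i in keep if scores[i] >= 0.5]
```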
- Save inference model
```bash
# run on GPU with:
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--infer_img=demo/000000570688.jpg \
--save_inference_model
```
- Export model
Save the inference model by setting `--save_inference_model`; it can then be loaded by the PaddlePaddle prediction library.
```bash
python tools/export_model.py -c configs/faster_rcnn_r50_1x.yml \
--output_dir=inference_model \
-o weights=output/faster_rcnn_r50_1x/model_final \
FasterRCNNTestFeed.image_shape=[3,800,1333]
```
Save the inference model with `tools/export_model.py`; it can then be loaded by the PaddlePaddle prediction library.
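Once exported, the `__model__`/`__params__` pair can be loaded back with the Fluid inference API. A minimal sketch is shown below (the output directory is an assumption taken from the example above, and the feed tensors you must provide depend on the model's TestFeed definition):

```python
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)  # or fluid.CPUPlace() for CPU inference
exe = fluid.Executor(place)

# load the program and parameters saved by tools/export_model.py
program, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname='inference_model/faster_rcnn_r50_1x',
    executor=exe,
    model_filename='__model__',
    params_filename='__params__')

print(feed_names)  # names of the input tensors expected by the model
```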
## FAQ
......
......@@ -3,206 +3,146 @@
关于配置运行环境,请参考[安装指南](INSTALL_cn.md)
## 训练
#### 单GPU训练
## 训练/评估/推断
PaddleDetection提供了训练/评估/推断三个功能的使用脚本,支持通过不同可选参数实现特定功能
```bash
export CUDA_VISIBLE_DEVICES=0
# 设置PYTHONPATH路径
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### 多GPU训练
```bash
# GPU训练 支持单卡,多卡训练,通过CUDA_VISIBLE_DEVICES指定卡号
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
# GPU评估
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
# 推断
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
#### CPU训练
### 可选参数列表
```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
```
以下列表可以通过`--help`查看
| FLAG | 支持脚本 | 用途 | 默认值 | 备注 |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | 指定配置文件 | None | **完整配置说明请参考[配置案例](config_example)** |
| -o | ALL | 设置配置文件里的参数内容 | None | 使用-o配置相较于-c选择的配置文件具有更高的优先级。例如:`-o use_gpu=False max_iter=10000` |
| -r/--resume_checkpoint | train | 从某一检查点恢复训练 | None | `-r output/faster_rcnn_r50_1x/10000` |
| --eval | train | 是否边训练边测试 | False | |
| --output_eval | train/eval | 编辑评测保存json路径 | 当前路径 | `--output_eval ./json_result` |
| -d/--dataset_dir | train/eval | 数据集路径, 同配置文件里的dataset_dir | None | `-d dataset/coco` |
| --fp16 | train | 是否使用混合精度训练模式 | False | 需使用GPU训练 |
| --loss_scale | train | 设置混合精度训练模式中损失值的缩放比例 | 8.0 | 需先开启`--fp16`后使用 |
| --json_eval | eval | 是否通过已存在的bbox.json或者mask.json进行评估 | False | json文件路径在`--output_eval`中设置 |
| --output_dir | infer | 输出推断后可视化文件 | `./output` | `--output_dir output` |
| --draw_threshold | infer | 可视化时分数阈值 | 0.5 | `--draw_threshold 0.7` |
| --infer_dir | infer | 用于推断的图片文件夹路径 | None | |
| --infer_img | infer | 用于推断的图片路径 | None | 相较于`--infer_dir`具有更高优先级 |
| --use_tb | train/infer | 是否使用[tb-paddle](https://github.com/linshuliang/tb-paddle)记录数据,进而在TensorBoard中显示 | False | |
| --tb\_log_dir | train/infer | 指定 tb-paddle 记录数据的存储路径 | train:`tb_log_dir/scalar` infer: `tb_log_dir/image` | |
##### 可选参数
- `-r` or `--resume_checkpoint`: 从某一检查点恢复训练,例如: `-r output/faster_rcnn_r50_1x/10000`
- `--eval`: 是否边训练边测试,默认是 `False`
- `--output_eval`: 如果边训练边测试, 这个参数可以编辑评测保存json路径, 默认是当前目录。
- `-d` or `--dataset_dir`: 数据集路径, 同配置文件里的`dataset_dir`. 例如: `-d dataset/coco`
- `-c`: 选择配置文件,所有配置文件在`configs/`
- `-o`: 设置配置文件里的参数内容。例如: `-o max_iters=180000`。使用`-o`配置相较于`-c`选择的配置文件具有更高的优先级。
- `--use_tb`: 是否使用[tb-paddle](https://github.com/linshuliang/tb-paddle)记录数据,进而在TensorBoard中显示,默认是False。
- `--tb_log_dir`: 指定 tb-paddle 记录数据的存储路径,默认是`tb_log_dir/scalar`
- `--fp16`: 是否使用混合精度训练模式(需GPU训练),默认是`False`
- `--loss_scale`: 设置混合精度训练模式中损失值的缩放比例,默认是`8.0`
## 使用示例
##### 例子
### 模型训练
- 边训练边测试
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval -d dataset/coco
```
可通过设置`--eval`在训练epoch中交替执行评估, 评估在每个snapshot\_iter时开始。可在配置文件的`snapshot_iter`处修改。
如果验证集很大,测试将会比较耗时,影响训练速度,建议减少评估次数,或训练完再进行评估。
当边训练边测试时,在每次snapshot\_iter会评测出最佳mAP模型保存到
`best_model`文件夹下,`best_model`的路径和`model_final`的路径相同。
在训练中交替执行评估, 评估在每个snapshot\_iter时开始。每次评估后还会评出最佳mAP模型保存到`best_model`文件夹下。
- 指定数据集路径
如果验证集很大,测试将会比较耗时,建议减少评估次数,或训练完再进行评估。
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-d dataset/coco
```
- Fine-tune其他任务
使用预训练模型fine-tune其他任务时,在YAML配置文件中设置`finetune_exclude_pretrained_params`或在命令行中添加`-o finetune_exclude_pretrained_params`对预训练模型进行选择性加载。
使用预训练模型fine-tune其他任务时,可采用如下两种方式:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
1. 在YAML配置文件中设置`finetune_exclude_pretrained_params`
2. 在命令行中添加-o finetune\_exclude\_pretrained_params对预训练模型进行选择性加载。
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
finetune_exclude_pretrained_params = ['cls_score','bbox_pred']
```
finetune_exclude_pretrained_params=['cls_score','bbox_pred']
```
##### 提示
详细说明请参考[Transfer Learning](TRANSFER_LEARNING_cn.md)
#### 提示
- `CUDA_VISIBLE_DEVICES` 参数可以指定不同的GPU。例如: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU计算规则可以参考 [FAQ](#faq)
- 数据集默认存储在`dataset/coco`中(可配置)。
- 若本地未找到数据集,将自动下载数据集并保存在`~/.cache/paddle/dataset`中。
- 预训练模型自动下载并保存在`〜/.cache/paddle/weights`中。
- 模型checkpoints默认保存在`output`中(可配置)。
- 进行模型fine-tune时,用户可将`pretrain_weights`配置为PaddlePaddle发布的模型,加载模型时finetune_exclude_pretrained_params中的字段匹配的参数不被加载,可以为通配符匹配方式。详细说明请参考[Transfer Learning](TRANSFER_LEARNING_cn.md)
- 更多参数配置,请参考[配置文件](../configs)
- RCNN系列模型CPU训练在PaddlePaddle 1.5.1及以下版本暂不支持,将在下个版本修复。
- 模型checkpoints默认保存在`output`中,可通过修改配置文件中save_dir进行配置。
- RCNN系列模型CPU训练在PaddlePaddle 1.5.1及以下版本暂不支持。
### 混合精度训练
## 评估
通过设置 `--fp16` 命令行选项可以启用混合精度训练。目前混合精度训练已经在Faster-FPN, Mask-FPN 及 YOLOv3 上进行验证,几乎没有精度损失(小于0.2 mAP)。
建议使用多进程方式来进一步加速混合精度训练。示例如下。
```bash
# GPU评估
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train.py --fp16 -c configs/faster_rcnn_r50_fpn_1x.yml
```
#### 可选参数
如果训练过程中loss出现`NaN`,请尝试调节`--loss_scale`选项数值,细节请参看混合精度训练相关的[Nvidia文档](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#mptrain)
- `-d` or `--dataset_dir`: 数据集路径, 同配置文件里的`dataset_dir`。例如: `-d dataset/coco`
- `--output_eval`: 这个参数可以编辑评测保存json路径, 默认是当前目录。
- `-o`: 设置配置文件里的参数内容。 例如: `-o weights=output/faster_rcnn_r50_1x/model_final`
- `--json_eval`: 是否通过已存在的bbox.json或者mask.json进行评估。默认是`False`。json文件路径通过`-f`指令来设置。
另外,请注意将配置文件中的 `norm_type` 由 `affine_channel` 改为 `bn`。
#### 例子
- 指定数据集路径
```bash
# GPU评估
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=output/faster_rcnn_r50_1x/model_final \
### 模型评估
- 指定权重和数据集路径
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
-o weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
-d dataset/coco
```
```
评估模型可以为本地路径,例如`output/faster_rcnn_r50_1x/model_final/`, 也可以为[MODEL_ZOO](MODEL_ZOO_cn.md)中给出的模型链接。
- 通过json文件评估
```bash
# GPU评估
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
--json_eval \
-f evaluation/
```
--output_eval evaluation/
```
json文件必须命名为bbox.json或者mask.json,放在`evaluation/`目录下,或者不加`-f`参数,默认为当前目录
json文件必须命名为bbox.json或者mask.json,放在`evaluation/`目录下
#### 提示
- 默认从`output`加载checkpoint(可配置)
- R-CNN和SSD模型目前暂不支持多GPU评估,将在后续版本支持
## 推断
- 单图片推断
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- 多图片推断
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
#### 可选参数
- `--output_dir`: 输出推断后可视化文件。
- `--draw_threshold`: 设置推断的阈值。默认是0.5.
- `--save_inference_model`: 设为`True`时,将预测模型保存到output\_dir中.
- `--use_tb`: 是否使用[tb-paddle](https://github.com/linshuliang/tb-paddle)记录数据,进而在TensorBoard中显示,默认是False。
- `--tb_log_dir`: 指定 tb-paddle 记录数据的存储路径,默认是`tb_log_dir/image`
#### 例子
### 模型推断
- 设置输出路径 && 设置推断阈值
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--infer_img=demo/000000570688.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/faster_rcnn_r50_1x/model_final \
--use_tb=True
```
```
可视化文件默认保存在`output`中,可通过`--output_dir=`指定不同的输出路径。
`--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,
不同阈值会产生不同的结果。如果用户需要对自定义路径的模型进行推断,可以设置`-o weights`指定模型路径。
`--use_tb`是个可选参数,当为`True`时,可使用 TensorBoard 来可视化参数的变化趋势和图片。
- 保存推断模型
```bash
# GPU推断
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg \
--save_inference_model
```
通过设置`--save_inference_model`保存可供PaddlePaddle预测库加载的推断模型。
`--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,
不同阈值会产生不同的结果。如果用户需要对自定义路径的模型进行推断,可以设置`-o weights`指定模型路径。
## FAQ
......@@ -227,3 +167,7 @@ batch size可以达到每GPU 4 (Tesla V100 16GB)。
**Q:** 如何修改数据预处理? </br>
**A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理**
例如RCNN模型中`DecodeImage`, `NormalizeImage` and `Permute`。更多详细描述请参考[配置案例](config_example)
**Q:** affine_channel和batch norm是什么关系?
**A:** 在RCNN系列模型中加载预训练模型进行初始化时,有时候会固定住batch norm的参数,使用预训练模型中的全局均值和方差,并且batch norm的scale和bias参数不更新,已发布的大多数ResNet系列的RCNN模型采用这种方式。这种情况下可以在config中设置norm_type为bn或affine_channel,freeze_norm为true(默认为true),两种方式等价。affine_channel的计算方式为`scale * x + bias`。只不过设置affine_channel时,内部对batch norm的参数自动做了融合。如果训练使用affine_channel,用保存的模型做初始化训练其他任务时,既可使用affine_channel,也可使用batch norm,参数均可正确加载。
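下面用一小段示意代码说明上述等价关系(仅为示意):推理时冻结的batch norm等价于一个按通道的仿射变换,其scale与bias可由BN参数折算得到:

```python
import numpy as np

def frozen_bn_to_affine_channel(gamma, beta, mean, var, eps=1e-5):
    """把冻结的batch norm参数折算为affine_channel的scale和bias(示意)。

    冻结BN: y = gamma * (x - mean) / sqrt(var + eps) + beta
    等价于: y = scale * x + bias
    """
    scale = gamma / np.sqrt(var + eps)
    bias = beta - mean * scale
    return scale, bias

# 简单验证两种计算方式等价
gamma, beta = np.array([1.2, 0.8]), np.array([0.1, -0.2])
mean, var = np.array([0.5, -0.3]), np.array([2.0, 0.7])
x = np.random.randn(4, 2)
scale, bias = frozen_bn_to_affine_channel(gamma, beta, mean, var)
bn_out = gamma * (x - mean) / np.sqrt(var + 1e-5) + beta
assert np.allclose(bn_out, scale * x + bias)
```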
English | [简体中文](INSTALL_cn.md)
# Installation
---
......@@ -36,7 +38,7 @@ python -c "import paddle; print(paddle.__version__)"
### Requirements:
- Python2 or Python3
- Python2 or Python3 (only Python3 is supported on Windows)
- CUDA >= 8.0
- cuDNN >= 5.0
- nccl >= 2.1.2
......@@ -58,6 +60,12 @@ COCO-API is needed for running. Installation is as follows:
# not to install the COCO API into global site-packages
python setup.py install --user
**Installation of COCO-API on Windows:**
# if cython is not installed
pip install Cython
# Because the original version of cocoapi does not support Windows, a third-party version is used which only supports Python3
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
## PaddleDetection
......
......@@ -35,7 +35,7 @@ python -c "import paddle; print(paddle.__version__)"
### 环境需求:
- Python2 or Python3
- Python2 or Python3 (windows系统仅支持Python3)
- CUDA >= 8.0
- cuDNN >= 5.0
- nccl >= 2.1.2
......@@ -56,6 +56,12 @@ python -c "import paddle; print(paddle.__version__)"
# 若您没有权限或更倾向不安装至全局site-packages
python setup.py install --user
**windows用户安装COCO-API方式:**
# 若Cython未安装,请安装Cython
pip install Cython
# 由于原版cocoapi不支持windows,采用第三方实现版本,该版本仅支持Python3
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
## PaddleDetection
......
English | [简体中文](MODEL_ZOO_cn.md)
# Model Zoo and Benchmark
## Environment
......@@ -76,6 +78,7 @@ The backbone models pretrained on ImageNet are available. All backbone models ar
| ResNet50-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 44.2 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_fpn_1x.tar) |
| ResNet101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 46.4 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_1x.tar) |
| ResNeXt101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 47.3 | - | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x.tar) |
| SENet154-vd-FPN | Cascade Mask | c3-c5 | 1 | 1.44x | - | 51.9 | 43.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.tar) |
#### Notes:
- Deformable ConvNets v2(dcn_v2) reference from [Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
......@@ -155,3 +158,8 @@ results of image size 608/416/320 above.
**NOTE**: MobileNet-SSD is trained on 2 GPUs with a total batch size of 64 for 120 epochs. VGG-SSD is trained on 4 GPUs with a total batch size of 32 for 240 epochs. SSD training data augmentations: random color distortion,
random cropping, random expansion, random flipping.
## Face Detection
Please refer to [face detection models](../configs/face_detection) for details.
......@@ -75,6 +75,7 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
| ResNet50-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 44.2 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_fpn_1x.tar) |
| ResNet101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 46.4 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r101_vd_fpn_1x.tar) |
| ResNeXt101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 47.3 | - | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_x101_vd_64x4d_fpn_1x.tar) |
| SENet154-vd-FPN | Cascade Mask | c3-c5 | 1 | 1.44x | - | 51.9 | 43.9 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.tar) |
#### 注意事项:
- Deformable卷积网络v2(dcn_v2)参考自论文[Deformable ConvNets v2](https://arxiv.org/abs/1811.11168).
......@@ -149,3 +150,7 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
| VGG16 | 512 | 8 | 240e | 65.975 | 80.2 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ssd_vgg16_512_voc.tar) |
**注意事项:** MobileNet-SSD在2卡,总batch size为64下训练120周期。VGG-SSD在总batch size为32下训练240周期。数据增强包括:随机颜色失真,随机剪裁,随机扩张,随机翻转。
## 人脸检测
详细请参考[人脸检测模型](../configs/face_detection).
......@@ -2,7 +2,7 @@ English | [简体中文](QUICK_STARTED_cn.md)
# Quick Start
This tutorial fine-tunes a pretrained detection model on a tiny dataset so that users can obtain a model and learn PaddleDetection quickly. The model can be trained in around 15min with good performance.
This tutorial fine-tunes a pretrained detection model on a tiny dataset so that users can obtain a model and learn PaddleDetection quickly. The model can be trained in around 20min with good performance.
## Data Preparation
......
......@@ -2,7 +2,7 @@
# 快速开始
为了使得用户能够在很短的时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在P40上单卡大约15min即可产出一个效果不错的模型。
为了使得用户能够在很短的时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在P40上单卡大约20min即可产出一个效果不错的模型。
## 数据准备
......
English | [简体中文](TRANSFER_LEARNING_cn.md)
# Transfer Learning
Transfer learning aims at learning new knowledge from existing knowledge. For example, a model pretrained on ImageNet can be used to initialize detection models, or a model pretrained on the COCO dataset can be used to initialize detection models trained on the PascalVOC dataset.
......@@ -6,7 +8,10 @@ In transfer learning, if different dataset and the number of classes is used, th
## Transfer Learning in PaddleDetection
In transfer learning, the pretrained model needs to be loaded selectively. Set `finetune_exclude_pretrained_params` in YAML configuration files or set `-o finetune_exclude_pretrained_params` on the command line.
In transfer learning, the pretrained model needs to be loaded selectively. The following two methods can be used:
1. Set `finetune_exclude_pretrained_params` in YAML configuration files. Please refer to the [configuration file](../configs/yolov3_mobilenet_v1_fruit.yml#L15)
2. Set -o finetune_exclude_pretrained_params on the command line. For example:
```python
export PYTHONPATH=$PYTHONPATH:.
......
......@@ -6,7 +6,10 @@
## PaddleDetection进行迁移学习
在迁移学习中,对预训练模型进行选择性加载,可通过在 YAML 配置文件中设置 finetune_exclude_pretrained_params字段,也可通过在 train.py的启动参数中设置 -o finetune_exclude_pretrained_params。
在迁移学习中,对预训练模型进行选择性加载,可通过如下两种方式实现:
1. 在 YAML 配置文件中设置`finetune_exclude_pretrained_params`字段。可参考[配置文件](../configs/yolov3_mobilenet_v1_fruit.yml#L15)
2. 在 train.py的启动参数中设置 -o finetune_exclude_pretrained_params。例如:
```python
export PYTHONPATH=$PYTHONPATH:.
......
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
message("cmake module path: ${CMAKE_MODULE_PATH}")
message("cmake root path: ${CMAKE_ROOT}")
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
include(external-cmake/yaml-cpp.cmake)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/ext-yaml-cpp/include")
include_directories("${PADDLE_DIR}/")
include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
include_directories("${PADDLE_DIR}/third_party/install/glog/include")
include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/include")
include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/include")
include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
endif()
include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
include_directories("${PADDLE_DIR}/third_party/boost")
include_directories("${PADDLE_DIR}/third_party/eigen3")
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
endif()
link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
link_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${OPENCV_DIR}/build/include")
include_directories("${OPENCV_DIR}/opencv/build/include")
link_directories("${OPENCV_DIR}/build/x64/vc14/lib")
else ()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
include_directories("${OPENCV_DIR}/include")
link_directories("${OPENCV_DIR}/lib")
endif ()
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
# TODO let users define cuda lib path
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
if (NOT WIN32)
if (USE_TENSORRT AND WITH_GPU)
include_directories("${PADDLE_DIR}/third_party/install/tensorrt/include")
link_directories("${PADDLE_DIR}/third_party/install/tensorrt/lib")
endif()
endif(NOT WIN32)
if (NOT WIN32)
set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
if(EXISTS ${NGRAPH_PATH})
include(GNUInstallDirs)
include_directories("${NGRAPH_PATH}/include")
link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_MKL)
include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif ()
set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if(WITH_STATIC_LIB)
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
else()
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf yaml-cpp z xxhash
${EXTERNAL_LIB})
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
opencv_world346 glog libyaml-cppmt gflags_static libprotobuf zlibstatic xxhash ${EXTERNAL_LIB})
set(DEPS ${DEPS} libcmt shlwapi)
if (EXISTS "${PADDLE_DIR}/third_party/install/snappy/lib")
set(DEPS ${DEPS} snappy)
endif()
if(EXISTS "${PADDLE_DIR}/third_party/install/snappystream/lib")
set(DEPS ${DEPS} snappystream)
endif()
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(OPENCV_LIB_DIR ${OPENCV_DIR}/lib)
if(EXISTS "${OPENCV_LIB_DIR}")
message("OPENCV_LIB:" ${OPENCV_LIB_DIR})
else()
set(OPENCV_LIB_DIR ${OPENCV_DIR}/lib64)
message("OPENCV_LIB:" ${OPENCV_LIB_DIR})
endif()
set(OPENCV_3RD_LIB_DIR ${OPENCV_DIR}/share/OpenCV/3rdparty/lib)
if(EXISTS "${OPENCV_3RD_LIB_DIR}")
message("OPENCV_3RD_LIB_DIR:" ${OPENCV_3RD_LIB_DIR})
else()
set(OPENCV_3RD_LIB_DIR ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64)
message("OPENCV_3RD_LIB_DIR:" ${OPENCV_3RD_LIB_DIR})
endif()
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_imgcodecs${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_imgproc${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_core${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_LIB_DIR}/libopencv_highgui${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libIlmImf${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibjasper${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibpng${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibtiff${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libittnotify${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibjpeg-turbo${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/liblibwebp${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libzlib${CMAKE_STATIC_LIBRARY_SUFFIX})
if(EXISTS "${OPENCV_3RD_LIB_DIR}/libippiw${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libippiw${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if(EXISTS "${OPENCV_3RD_LIB_DIR}/libippicv${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS ${DEPS} ${OPENCV_3RD_LIB_DIR}/libippicv${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
# message(${CMAKE_CXX_FLAGS})
# set(CMAKE_CXX_FLAGS "-g ${CMAKE_CXX_FLAGS}")
SET(PADDLESEG_INFERENCE_SRCS preprocessor/preprocessor.cpp
preprocessor/preprocessor_detection.cpp predictor/detection_predictor.cpp
utils/detection_result.pb.cc)
ADD_LIBRARY(libpaddleseg_inference STATIC ${PADDLESEG_INFERENCE_SRCS})
target_link_libraries(libpaddleseg_inference ${DEPS})
add_executable(detection_demo detection_demo.cpp)
ADD_DEPENDENCIES(libpaddleseg_inference ext-yaml-cpp)
ADD_DEPENDENCIES(detection_demo ext-yaml-cpp libpaddleseg_inference)
target_link_libraries(detection_demo ${DEPS} libpaddleseg_inference)
if (WIN32)
add_custom_command(TARGET detection_demo POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./mkldnn.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
)
endif()
execute_process(COMMAND cp -r ${CMAKE_SOURCE_DIR}/images ${CMAKE_SOURCE_DIR}/conf ${CMAKE_CURRENT_BINARY_DIR})
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# PaddleDetection C++预测部署方案
## 本文档结构
[1.说明](#1说明)
[2.主要目录和文件](#2主要目录和文件)
[3.编译](#3编译)
[4.预测并可视化结果](#4预测并可视化结果)
## 1.说明
本目录提供一个跨平台的图像检测模型的C++预测部署方案,用户通过一定的配置,加上少量的代码,即可把模型集成到自己的服务中,完成相应的图像检测任务。
主要设计的目标包括以下四点:
- 跨平台,支持在 Windows 和 Linux 完成编译、开发和部署
- 可扩展性,支持用户针对新模型开发自己特殊的数据预处理等逻辑
- 高性能,除了`PaddlePaddle`自身带来的性能优势,我们还针对图像检测的特点对关键步骤进行了性能优化
- 支持多种常见的图像检测模型,如YOLOv3, Faster-RCNN, Faster-RCNN+FPN,用户通过少量配置即可加载模型完成常见检测任务
## 2.主要目录和文件
```bash
deploy
├── detection_demo.cpp # 完成图像检测预测任务C++代码
├── conf
│ ├── detection_rcnn.yaml #示例faster rcnn 目标检测配置
│ └── detection_rcnn_fpn.yaml #示例faster rcnn + fpn目标检测配置
├── images
│ └── detection_rcnn # 示例faster rcnn + fpn目标检测测试图片目录
├── tools
│ └── vis.py # 示例图像检测结果可视化脚本
├── docs
│ ├── linux_build.md # Linux 编译指南
│ ├── windows_vs2015_build.md # windows VS2015编译指南
│ └── windows_vs2019_build.md # Windows VS2019编译指南
├── utils # 一些基础公共函数
├── preprocess # 数据预处理相关代码
├── predictor # 模型加载和预测相关代码
├── CMakeList.txt # cmake编译入口文件
└── external-cmake # 依赖的外部项目cmake(目前仅有yaml-cpp)
```
## 3.编译
支持在`Windows`和`Linux`平台编译和使用:
- [Linux 编译指南](./docs/linux_build.md)
- [Windows 使用 Visual Studio 2019 Community 编译指南](./docs/windows_vs2019_build.md)
- [Windows 使用 Visual Studio 2015 编译指南](./docs/windows_vs2015_build.md)
`Windows`上推荐使用最新的`Visual Studio 2019 Community`直接编译`CMake`项目。
## 4.预测并可视化结果
完成编译后,便生成了需要的可执行文件和链接库。这里以我们基于`faster rcnn`检测模型为例,介绍部署图像检测模型的通用流程。
### 1. 下载模型文件
我们提供faster rcnn,faster rcnn+fpn模型用于预测coco17数据集,可在以下链接下载:[faster rcnn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50.zip)
[faster rcnn + fpn示例模型下载地址](https://paddleseg.bj.bcebos.com/inference/faster_rcnn_pp50_fpn.zip)
下载并解压,解压后目录结构如下:
```
faster_rcnn_pp50/
├── __model__ # 模型文件
└── __params__ # 参数文件
```
解压后把上述目录拷贝到合适的路径:
**假设**在`Windows`系统上,我们的模型和参数文件所在路径为`D:\projects\models\faster_rcnn_pp50`。
**假设**在`Linux`上对应的路径则为`/root/projects/models/faster_rcnn_pp50/`。
### 2. 修改配置
`inference`源代码(即本目录)的`conf`目录下提供了基于faster rcnn的示例配置文件`detection_rcnn.yaml`,相关的字段含义和说明如下:
```yaml
DEPLOY:
# 是否使用GPU预测
USE_GPU: 1
# 模型和参数文件所在目录路径
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50"
# 模型文件名
MODEL_FILENAME: "__model__"
# 参数文件名
PARAMS_FILENAME: "__params__"
# 预测图片的标准输入,尺寸不一致会resize
EVAL_CROP_SIZE: (608, 608)
# resize方式,支持 UNPADDING和RANGE_SCALING
RESIZE_TYPE: "RANGE_SCALING"
# 短边对齐的长度,仅在RANGE_SCALING下有效
TARGET_SHORT_SIZE : 800
# 均值
MEAN: [0.4647, 0.4647, 0.4647]
# 方差
STD: [0.0834, 0.0834, 0.0834]
# 图片类型, rgb或者rgba
IMAGE_TYPE: "rgb"
# 像素分类数
NUM_CLASSES: 1
# 通道数
CHANNELS : 3
# 预处理器, 目前提供图像检测的通用处理类DetectionPreProcessor
PRE_PROCESSOR: "DetectionPreProcessor"
# 预测模式,支持 NATIVE 和 ANALYSIS
PREDICTOR_MODE: "ANALYSIS"
# 每次预测的 batch_size
BATCH_SIZE : 3
# 长边伸缩的最大长度,-1代表无限制。
RESIZE_MAX_SIZE: 1333
# 输入的tensor数量。
FEEDS_SIZE: 3
```
修改字段`MODEL_PATH`的值为你在**上一步**下载并解压的模型文件所放置的目录即可。更多配置文件字段介绍,请参考文档[预测部署方案配置文件说明](./docs/configuration.md)
### 3. 执行预测
在终端中切换到生成的可执行文件所在目录(Windows系统下使用`cmd`)。
`Linux` 系统中执行以下命令:
```shell
./detection_demo --conf=conf/detection_rcnn.yaml --input_dir=images/detection_rcnn
```
`Windows` 中执行以下命令:
```shell
.\detection_demo.exe --conf=conf\detection_rcnn.yaml --input_dir=images\detection_rcnn\
```
预测使用的两个命令参数说明如下:
| 参数 | 含义 |
|-------|----------|
| conf | 模型配置的Yaml文件路径 |
| input_dir | 需要预测的图片目录 |
配置文件说明请参考上一步,样例程序会扫描input_dir目录下的所有图片,并为每一张图片生成对应的预测结果,输出到屏幕,同时在与图片相同的目录下保存为`X.pb`文件(X为对应图片的文件名)。可使用工具脚本vis.py将检测结果可视化。
**检测结果可视化**
运行可视化脚本时,只需输入命令行参数图片路径、检测结果pb文件路径、目标框阈值以及类别-标签映射文件路径即可得到可视化的图片`X.png` (tools目录下提供coco17的类别标签映射文件coco17.json)。
```bash
python vis.py --img_path=../build/images/detection_rcnn/000000087038.jpg --img_result_path=../build/images/detection_rcnn/000000087038.jpg.pb --threshold=0.1 --c2l_path=coco17.json
```
检测结果(每个图片的结果用空行隔开)
```原图:```
![原图](./demo_images/000000087038.jpg)
```检测结果图:```
![检测结果](./demo_images/000000087038.jpg.png)
DEPLOY:
USE_GPU: 1
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50"
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
EVAL_CROP_SIZE: (608, 608)
RESIZE_TYPE: "RANGE_SCALING"
TARGET_SHORT_SIZE : 800
MEAN: [0.485, 0.456, 0.406]
STD: [0.229, 0.224, 0.225]
IMAGE_TYPE: "rgb"
NUM_CLASSES: 1
CHANNELS : 3
PRE_PROCESSOR: "DetectionPreProcessor"
PREDICTOR_MODE: "ANALYSIS"
BATCH_SIZE : 3
RESIZE_MAX_SIZE: 1333
FEEDS_SIZE: 3
DEPLOY:
USE_GPU: 1
MODEL_PATH: "/root/projects/models/faster_rcnn_pp50_fpn"
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
EVAL_CROP_SIZE: (608, 608)
RESIZE_TYPE: "RANGE_SCALING"
TARGET_SHORT_SIZE : 800
MEAN: [0.485, 0.456, 0.406]
STD: [0.229, 0.224, 0.225]
IMAGE_TYPE: "rgb"
NUM_CLASSES: 1
CHANNELS : 3
PRE_PROCESSOR: "DetectionPreProcessor"
PREDICTOR_MODE: "ANALYSIS"
BATCH_SIZE : 1
RESIZE_MAX_SIZE: 1333
FEEDS_SIZE: 3
COARSEST_STRIDE: 32
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <iostream>
#include <glog/logging.h>
#include <utils/utils.h>
#include <predictor/detection_predictor.h>
DEFINE_string(conf, "", "Configuration File Path");
DEFINE_string(input_dir, "", "Directory of Input Images");
int main(int argc, char** argv) {
// 0. parse args
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_conf.empty() || FLAGS_input_dir.empty()) {
std::cout << "Usage: ./predictor --conf=/config/path/to/your/model --input_dir=/directory/of/your/input/images";
return -1;
}
// 1. create a predictor and init it with conf
PaddleSolution::DetectionPredictor predictor;
if (predictor.init(FLAGS_conf) != 0) {
LOG(FATAL) << "Fail to init predictor";
return -1;
}
// 2. get all the images with extension '.jpeg' at input_dir
auto imgs = PaddleSolution::utils::get_directory_images(FLAGS_input_dir, ".jpeg|.jpg|.JPEG|.JPG|.bmp|.BMP|.png|.PNG");
// 3. predict
predictor.predict(imgs);
return 0;
}
# Configuration file reference for the inference deployment solution
## Basic concepts
The configuration file gives users an interface for customizing the inference deployment: once the meaning of each field is understood, the deployment can be customized without writing any code. To explain the fields precisely, we first introduce how they are categorized.
### Field types
- **required**: the field must be defined explicitly, otherwise the deployment program cannot start.
- **optional**: the field may be omitted; the deployment system supplies a default value, documented below.
### Field value types
- **int**: the field must be assigned an integer value.
- **string**: the field must be assigned a string value.
- **list**: the field must be assigned a list.
- **tuple**: the field must be assigned a two-element tuple.
## Field reference
```yaml
# All deployment configuration fields must be placed under the DEPLOY key
DEPLOY:
    # Type: required int
    # Meaning: whether to run inference on the GPU. 0: no, 1: yes
    USE_GPU: 1
    # Type: required string
    # Meaning: directory containing the model and parameter files
    MODEL_PATH: "/path/to/model_directory"
    # Type: required string
    # Meaning: model file name
    MODEL_FILENAME: "__model__"
    # Type: required string
    # Meaning: parameter file name
    PARAMS_FILENAME: "__params__"
    # Type: optional string
    # Meaning: image resize mode; UNPADDING and RANGE_SCALING are supported. Defaults to UNPADDING.
    RESIZE_TYPE: "UNPADDING"
    # Type: required tuple
    # Meaning: with UNPADDING, images are resized directly to this size.
    EVAL_CROP_SIZE: (513, 513)
    # Type: optional int
    # Meaning: with RANGE_SCALING, the short side of the image is scaled to this value and the
    # long side is scaled proportionally, so the aspect ratio is preserved. Defaults to 0.
    TARGET_SHORT_SIZE: 800
    # Type: optional int
    # Meaning: with RANGE_SCALING, the long side may not be scaled beyond this value. Defaults to 0.
    RESIZE_MAX_SIZE: 1333
    # Type: required list
    # Meaning: per-channel mean used to normalize the image
    MEAN: [104.008, 116.669, 122.675]
    # Type: required list
    # Meaning: per-channel standard deviation used to normalize the image
    STD: [1.0, 1.0, 1.0]
    # Type: string
    # Meaning: image type, rgb or rgba
    IMAGE_TYPE: "rgb"
    # Type: required int
    # Meaning: number of classes
    NUM_CLASSES: 2
    # Type: required int
    # Meaning: number of image channels
    CHANNELS : 3
    # Type: required string
    # Meaning: pre-processor; DetectionPreProcessor is the generic pre-processing class for detection.
    PRE_PROCESSOR: "DetectionPreProcessor"
    # Type: required string
    # Meaning: prediction mode; NATIVE and ANALYSIS are supported
    PREDICTOR_MODE: "ANALYSIS"
    # Type: required int
    # Meaning: batch_size used for each prediction
    BATCH_SIZE : 3
    # Type: optional int
    # Meaning: number of input tensors. Most models do not need to set this. Defaults to 1.
    FEEDS_SIZE: 2
    # Type: optional int
    # Meaning: image sides are padded up to a multiple of this value. Defaults to 1.
    COARSEST_STRIDE: 32
```
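To make the interplay between `TARGET_SHORT_SIZE`, `RESIZE_MAX_SIZE` and `COARSEST_STRIDE` concrete, here is a small Python sketch of the RANGE_SCALING arithmetic as described above; it is for illustration only, and the function name is hypothetical rather than part of the deployment code.
```python
import math

def range_scaling_shape(ori_h, ori_w, target_short_size=800,
                        resize_max_size=1333, coarsest_stride=32):
    """Sketch of RANGE_SCALING resize followed by COARSEST_STRIDE padding."""
    # scale the short side to target_short_size, but never let the long side exceed resize_max_size
    scale = target_short_size / float(min(ori_h, ori_w))
    if resize_max_size > 0:
        scale = min(scale, resize_max_size / float(max(ori_h, ori_w)))
    resize_h = int(round(ori_h * scale))
    resize_w = int(round(ori_w * scale))
    # pad both sides up to a multiple of coarsest_stride (needed by FPN-style models)
    pad_h = int(math.ceil(resize_h / float(coarsest_stride)) * coarsest_stride)
    pad_w = int(math.ceil(resize_w / float(coarsest_stride)) * coarsest_stride)
    return scale, (resize_h, resize_w), (pad_h, pad_w)

# e.g. a 480x640 image is scaled by ~1.667 to (800, 1067) and padded to (800, 1088)
print(range_scaling_shape(480, 640))
```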
# Compilation guide for Linux
## Notes
This document has been tested on `Linux` with `GCC 4.8.5` and `GCC 4.9.4`. If you need to build with a newer G++ version, you must rebuild the Paddle inference library; see: [Build the Paddle inference library from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_usage/deploy/inference/build_and_install_lib_cn.html#id15)
## Prerequisites
* G++ 4.8.2 ~ 4.9.4
* CUDA 8.0/ CUDA 9.0
* CMake 3.0+
Please make sure the above software is installed. **All examples below use `/root/projects/` as the working directory.**
### Step 1: Download the code
1. `mkdir -p /root/projects/paddle_models && cd /root/projects/paddle_models`
2. `git clone https://github.com/PaddlePaddle/models.git`
The `C++` inference code is in the `/root/projects/paddle_models/models/PaddleCV/PaddleDetection/inference` directory, which does not depend on any other directory under `PaddleDetection`.
### Step 2: Download the PaddlePaddle C++ inference library fluid_inference
Currently only `CUDA 8` and `CUDA 9` are supported. Please download the matching (develop) version from the [PaddlePaddle inference library download page](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_usage/deploy/inference/build_and_install_lib_cn.html).
After downloading and extracting, the `/root/projects/fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and headers
|
├── third_party # third-party dependency libraries and headers
|
└── version.txt # version and build information
```
### Step 3: Install and configure OpenCV
```shell
# 0. change to the /root/projects directory
cd /root/projects
# 1. download the OpenCV 3.4.6 source code
wget -c https://paddleseg.bj.bcebos.com/inference/opencv-3.4.6.zip
# 2. unzip it
unzip opencv-3.4.6.zip && cd opencv-3.4.6
# 3. create a build directory and compile; OpenCV is installed to /root/projects/opencv3 here
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/root/projects/opencv3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DWITH_IPP=OFF -DBUILD_IPP_IW=OFF -DWITH_LAPACK=OFF -DWITH_EIGEN=OFF -DCMAKE_INSTALL_LIBDIR=lib64 -DWITH_ZLIB=ON -DBUILD_ZLIB=ON -DWITH_JPEG=ON -DBUILD_JPEG=ON -DWITH_PNG=ON -DBUILD_PNG=ON -DWITH_TIFF=ON -DBUILD_TIFF=ON
make -j4
make install
```
**Note:** after the steps above, `opencv` is installed under the `/root/projects/opencv3` directory.
### Step 4: Build
The `CMake` build uses four parameters to specify the paths of the core dependencies; they are defined as follows:
| Parameter | Meaning |
| ---- | ---- |
| CUDA_LIB | CUDA library path |
| CUDNN_LIB | cuDNN library path |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_DIR | Paddle inference library path |
When running the commands below, **make sure** to replace these parameters with the actual paths of the dependencies on your system:
```shell
cd /root/projects/paddle_models/models/PaddleCV/PaddleDetection/inference
mkdir build && cd build
cmake .. -DWITH_GPU=ON -DPADDLE_DIR=/root/projects/fluid_inference -DCUDA_LIB=/usr/local/cuda/lib64/ -DOPENCV_DIR=/root/projects/opencv3/ -DCUDNN_LIB=/usr/local/cuda/lib64/
make
```
### Step 5: Inference and visualization
Run:
```
./detection_demo --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
For more details, please refer to the ReadMe: [inference and visualization](../README.md)
# Compilation guide for Windows with Visual Studio 2015
The steps in this document have been tested with both `Visual Studio 2015` and `Visual Studio 2019 Community`. We recommend [building the `CMake` project directly with `Visual Studio 2019`](./windows_vs2019_build.md).
## Prerequisites
* Visual Studio 2015
* CUDA 8.0/ CUDA 9.0
* CMake 3.0+
Please make sure the above software is installed. **All examples below use `D:\projects` as the working directory.**
### Step 1: Download the code
1. Open `cmd` and run `cd D:\projects\paddle_models`
2. `git clone https://github.com/PaddlePaddle/models.git`
The `C++` inference code is in the `D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference` directory, which does not depend on any other directory under `PaddleDetection`.
### Step 2: Download the PaddlePaddle C++ inference library fluid_inference
Download the PaddlePaddle inference library matching your Windows environment and extract it to the `D:\projects\` directory.
| CUDA | GPU | Download link |
|------|------|--------|
| 8.0 | Yes | [fluid_inference.zip](https://bj.bcebos.com/v1/paddleseg/fluid_inference_win.zip) |
| 9.0 | Yes | [fluid_inference_cuda90.zip](https://paddleseg.bj.bcebos.com/fluid_inference_cuda9_cudnn7.zip) |
After extraction, the `D:\projects\fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and headers
|
├── third_party # third-party dependency libraries and headers
|
└── version.txt # version and build information
```
### Step 3: Install and configure OpenCV
1. Download OpenCV 3.4.6 for Windows from the official site: [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\opencv`
3. Configure the environment variables as follows
    - My Computer -> Properties -> Advanced system settings -> Environment Variables
    - Find Path among the system variables (create it if it does not exist) and double-click to edit it
    - Add a new entry with the opencv path and save it, e.g. `D:\projects\opencv\build\x64\vc14\bin`
### Step 4: Build the code (using VS2015 as an example)
The commands below must be adjusted to the actual paths of the dependencies on your system.
* Set up the VS2015 environment: adjust the path to your actual VS installation, open a `cmd` window, and run the following command
* For other VS versions (e.g. VS2019), locate the `vcvarsall.bat` of the corresponding version and substitute it in this command
```
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64
```
* Build the project with CMake
    * PADDLE_DIR: path to the fluid_inference library
    * CUDA_LIB: CUDA dynamic library directory; adjust it to your actual installation
    * OPENCV_DIR: directory where OpenCV was extracted
```
# change to the inference code directory
cd /d D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference
# create the build directory; to rebuild from scratch, simply delete this directory
mkdir build
cd build
# generate the VS project with cmake
D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build> cmake .. -G "Visual Studio 14 2015 Win64" -DWITH_GPU=ON -DPADDLE_DIR=D:\projects\fluid_inference -DCUDA_LIB=D:\projects\cudalib\v9.0\lib\x64 -DOPENCV_DIR=D:\projects\opencv -T host=x64
```
The `cmake` parameter `-G` specifies which VS version the generated project targets; adjust it to your `VS` version. See the [cmake documentation](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html) for details.
* Build the executable
```
D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build> msbuild /m /p:Configuration=Release cpp_inference_demo.sln
```
### Step 5: Inference and visualization
The executable produced by the `Visual Studio 2015` build is in the `build\release` directory. Change to that directory:
```
cd /d D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build\release
```
Then run:
```
detection_demo.exe --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
For more details, please refer to the ReadMe: [inference and visualization](../README.md)
# Compilation guide for Visual Studio 2019 Community with CMake
On Windows we have tested the build with `Visual Studio 2015` and `Visual Studio 2019 Community`. Microsoft has supported managing `CMake` cross-platform projects directly since `Visual Studio 2017`, but stable and complete support only arrived with `2019`, so if you want to manage the build with CMake we recommend doing so under `Visual Studio 2019`.
You can also build by converting the `CMake` project into a `VS` project, as with `VS2015`; the **differences** are noted in that document: [compilation guide for Visual Studio 2015](./windows_vs2015_build.md)
## Prerequisites
* Visual Studio 2019
* CUDA 8.0/ CUDA 9.0
* CMake 3.0+
Please make sure the above software is installed; we use the Community edition of `VS2019`.
**All examples below use `D:\projects` as the working directory.**
### Step 1: Download the code
1. Download the source code: [download link](https://github.com/PaddlePaddle/models/archive/develop.zip)
2. Extract the archive and rename the extracted directory to `paddle_models`
The examples below assume the code directory is `D:\projects\paddle_models`.
### Step 2: Download the PaddlePaddle C++ inference library fluid_inference
Download the PaddlePaddle inference library matching your Windows environment and extract it to the `D:\projects\` directory.
| CUDA | GPU | Download link |
|------|------|--------|
| 8.0 | Yes | [fluid_inference.zip](https://bj.bcebos.com/v1/paddleseg/fluid_inference_win.zip) |
| 9.0 | Yes | [fluid_inference_cuda90.zip](https://paddleseg.bj.bcebos.com/fluid_inference_cuda9_cudnn7.zip) |
After extraction, the `D:\projects\fluid_inference` directory contains:
```
fluid_inference
├── paddle # paddle core libraries and headers
|
├── third_party # third-party dependency libraries and headers
|
└── version.txt # version and build information
```
**Note:** the `CUDA90` package extracts to a directory named `fluid_inference_cuda90`.
### Step 3: Install and configure OpenCV
1. Download OpenCV 3.4.6 for Windows from the official site: [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\projects\opencv`
3. Configure the environment variables as follows
    - My Computer -> Properties -> Advanced system settings -> Environment Variables
    - Find Path among the system variables (create it if it does not exist) and double-click to edit it
    - Add a new entry with the opencv path and save it, e.g. `D:\projects\opencv\build\x64\vc14\bin`
### Step 4: Build the CMake project directly with Visual Studio 2019
1. Open Visual Studio 2019 Community and click `Continue without code`
![step2](https://paddleseg.bj.bcebos.com/inference/vs2019_step1.png)
2. Click `File` -> `Open` -> `CMake`
![step2.1](https://paddleseg.bj.bcebos.com/inference/vs2019_step2.png)
Select the path containing the project code and open `CMakeList.txt`:
![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step3.png)
3. Click `Project` -> `CMake settings for cpp_inference_demo`
![step3](https://paddleseg.bj.bcebos.com/inference/vs2019_step4.png)
4. Click `Browse` and set the build options that specify the paths of `CUDA`, `OpenCV`, and the `Paddle inference library`
![step4](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)
The three build parameters are described below:
| Parameter | Meaning |
| ---- | ---- |
| CUDA_LIB | CUDA library path |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_DIR | Paddle inference library path |
**Once they are set**, click `Save and generate CMake cache to load variables` as shown above.
5. Click `Build` -> `Build All`
![step6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)
### Step 5: Inference and visualization
The executable produced by the `Visual Studio 2019` build is in the `out\build\x64-Release` directory. Open `cmd` and change to that directory:
```
cd D:\projects\paddle_models\models\PaddleCV\PaddleDetection\inference\build\x64-Release
```
Then run:
```
detection_demo.exe --conf=/path/to/your/conf --input_dir=/path/to/your/input/data/directory
```
For more details, please refer to the ReadMe: [inference and visualization](../README.md)
find_package(Git REQUIRED)
include(ExternalProject)
message("${CMAKE_BUILD_TYPE}")
ExternalProject_Add(
ext-yaml-cpp
GIT_REPOSITORY https://github.com/jbeder/yaml-cpp.git
GIT_TAG e0e01d53c27ffee6c86153fa41e7f5e57d3e5c90
CMAKE_ARGS
-DYAML_CPP_BUILD_TESTS=OFF
-DYAML_CPP_BUILD_TOOLS=OFF
-DYAML_CPP_INSTALL=OFF
-DYAML_CPP_BUILD_CONTRIB=OFF
-DMSVC_SHARED_RT=OFF
-DBUILD_SHARED_LIBS=OFF
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
-DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
-DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp"
# Disable install step
INSTALL_COMMAND ""
LOG_DOWNLOAD ON
)
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "detection_predictor.h"
#include <cstring>
#include <cmath>
#include <fstream>
#include "utils/detection_result.pb.h"
namespace PaddleSolution {
/* lod_buffer: every item in lod_buffer is an image matrix after preprocessing
* input_buffer: same data with lod_buffer after flattening to 1-D vector and padding, needed to be empty before using this function
*/
void padding_minibatch(const std::vector<std::vector<float>> &lod_buffer, std::vector<float> &input_buffer,
std::vector<int> &resize_heights, std::vector<int> &resize_widths, int channels, int coarsest_stride = 1) {
int batch_size = lod_buffer.size();
int max_h = -1;
int max_w = -1;
for(int i = 0; i < batch_size; ++i) {
max_h = (max_h > resize_heights[i])? max_h:resize_heights[i];
max_w = (max_w > resize_widths[i])? max_w:resize_widths[i];
}
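    // round the batch's maximum height/width up to the nearest multiple of coarsest_stride (e.g. 32 for FPN models)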
max_h = static_cast<int>(ceil(static_cast<float>(max_h) / static_cast<float>(coarsest_stride)) * coarsest_stride);
max_w = static_cast<int>(ceil(static_cast<float>(max_w) / static_cast<float>(coarsest_stride)) * coarsest_stride);
std::cout << "max_w: " << max_w << " max_h: " << max_h << std::endl;
input_buffer.insert(input_buffer.end(), batch_size * channels * max_h * max_w, 0);
// flatten tensor and padding
for(int i = 0; i < lod_buffer.size(); ++i) {
float *input_buffer_ptr = input_buffer.data() + i * channels * max_h * max_w;
const float *lod_ptr = lod_buffer[i].data();
for(int c = 0; c < channels; ++c) {
for(int h = 0; h < resize_heights[i]; ++h) {
memcpy(input_buffer_ptr, lod_ptr, resize_widths[i] * sizeof(float));
lod_ptr += resize_widths[i];
input_buffer_ptr += max_w;
}
input_buffer_ptr += (max_h - resize_heights[i]) * max_w;
}
}
// change resize w, h
for(int i = 0; i < batch_size; ++i){
resize_widths[i] = max_w;
resize_heights[i] = max_h;
}
}
void output_detection_result(const float* out_addr, const std::vector<std::vector<size_t>> &lod_vector, const std::vector<std::string> &imgs_batch){
for(int i = 0; i < lod_vector[0].size() - 1; ++i) {
DetectionResult detection_result;
detection_result.set_filename(imgs_batch[i]);
std::cout << imgs_batch[i] << ":" << std::endl;
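        // each detection occupies 6 consecutive floats: [class, score, left_top_x, left_top_y, right_bottom_x, right_bottom_y]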
for (int j = lod_vector[0][i]; j < lod_vector[0][i+1]; ++j) {
DetectionBox *box_ptr = detection_result.add_detection_boxes();
box_ptr->set_class_(static_cast<int>(round(out_addr[0 + j * 6])));
box_ptr->set_score(out_addr[1 + j * 6]);
box_ptr->set_left_top_x(out_addr[2 + j * 6]);
box_ptr->set_left_top_y(out_addr[3 + j * 6]);
box_ptr->set_right_bottom_x(out_addr[4 + j * 6]);
box_ptr->set_right_bottom_y(out_addr[5 + j * 6]);
printf("Class %d, score = %f, left top = [%f, %f], right bottom = [%f, %f]\n",
static_cast<int>(round(out_addr[0 + j * 6])), out_addr[1 + j * 6], out_addr[2 + j * 6],
out_addr[3 + j * 6], out_addr[4 + j * 6], out_addr[5 + j * 6]);
}
printf("\n");
std::ofstream output(imgs_batch[i] + ".pb", std::ios::out | std::ios::trunc | std::ios::binary);
detection_result.SerializeToOstream(&output);
output.close();
}
}
int DetectionPredictor::init(const std::string& conf) {
if (!_model_config.load_config(conf)) {
LOG(FATAL) << "Fail to load config file: [" << conf << "]";
return -1;
}
_preprocessor = PaddleSolution::create_processor(conf);
if (_preprocessor == nullptr) {
LOG(FATAL) << "Failed to create_processor";
return -1;
}
bool use_gpu = _model_config._use_gpu;
const auto& model_dir = _model_config._model_path;
const auto& model_filename = _model_config._model_file_name;
const auto& params_filename = _model_config._param_file_name;
// load paddle model file
if (_model_config._predictor_mode == "NATIVE") {
paddle::NativeConfig config;
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.prog_file = prog_file;
config.param_file = param_file;
config.fraction_of_gpu_memory = 0;
config.use_gpu = use_gpu;
config.device = 0;
_main_predictor = paddle::CreatePaddlePredictor(config);
} else if (_model_config._predictor_mode == "ANALYSIS") {
paddle::AnalysisConfig config;
if (use_gpu) {
config.EnableUseGpu(100, 0);
}
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.SetModel(prog_file, param_file);
config.SwitchUseFeedFetchOps(false);
config.SwitchSpecifyInputNames(true);
config.EnableMemoryOptim();
_main_predictor = paddle::CreatePaddlePredictor(config);
} else {
return -1;
}
return 0;
}
int DetectionPredictor::predict(const std::vector<std::string>& imgs) {
if (_model_config._predictor_mode == "NATIVE") {
return native_predict(imgs);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
return analysis_predict(imgs);
}
return -1;
}
int DetectionPredictor::native_predict(const std::vector<std::string>& imgs) {
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
std::size_t total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& imgs_batch = _imgs_batch;
float sr;
// DetectionResultsContainer result_container;
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.clear();
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
std::vector<int> ori_widths;
std::vector<int> ori_heights;
std::vector<int> resize_widths;
std::vector<int> resize_heights;
std::vector<float> scale_ratios;
ori_widths.resize(batch_size);
ori_heights.resize(batch_size);
resize_widths.resize(batch_size);
resize_heights.resize(batch_size);
scale_ratios.resize(batch_size);
std::vector<std::vector<float>> lod_buffer(batch_size);
if (!_preprocessor->batch_process(imgs_batch, lod_buffer, ori_widths.data(), ori_heights.data(),
resize_widths.data(), resize_heights.data(), scale_ratios.data())) {
return -1;
}
// flatten and padding
padding_minibatch(lod_buffer, input_buffer, resize_heights, resize_widths, channels, _model_config._coarsest_stride);
paddle::PaddleTensor im_tensor, im_size_tensor, im_info_tensor;
im_tensor.name = "image";
im_tensor.shape = std::vector<int>({ batch_size, channels, resize_heights[0], resize_widths[0] });
im_tensor.data.Reset(input_buffer.data(), input_buffer.size() * sizeof(float));
im_tensor.dtype = paddle::PaddleDType::FLOAT32;
std::vector<float> image_infos;
for(int i = 0; i < batch_size; ++i) {
image_infos.push_back(resize_heights[i]);
image_infos.push_back(resize_widths[i]);
image_infos.push_back(scale_ratios[i]);
}
im_info_tensor.name = "info";
im_info_tensor.shape = std::vector<int>({batch_size, 3});
im_info_tensor.data.Reset(image_infos.data(), batch_size * 3 * sizeof(float));
im_info_tensor.dtype = paddle::PaddleDType::FLOAT32;
std::vector<int> image_size;
for(int i = 0; i < batch_size; ++i) {
image_size.push_back(ori_heights[i]);
image_size.push_back(ori_widths[i]);
}
std::vector<float> image_size_f;
for(int i = 0; i < batch_size; ++i) {
image_size_f.push_back(ori_heights[i]);
image_size_f.push_back(ori_widths[i]);
image_size_f.push_back(1.0);
}
int feeds_size = _model_config._feeds_size;
im_size_tensor.name = "im_size";
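        // YOLOv3-style models (FEEDS_SIZE == 2) feed the original size as int32 [h, w];
        // Faster R-CNN-style models (FEEDS_SIZE == 3) feed it as float32 [h, w, 1.0]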
if(feeds_size == 2) {
im_size_tensor.shape = std::vector<int>({ batch_size, 2});
im_size_tensor.data.Reset(image_size.data(), batch_size * 2 * sizeof(int));
im_size_tensor.dtype = paddle::PaddleDType::INT32;
}
else if(feeds_size == 3) {
im_size_tensor.shape = std::vector<int>({ batch_size, 3});
im_size_tensor.data.Reset(image_size_f.data(), batch_size * 3 * sizeof(float));
im_size_tensor.dtype = paddle::PaddleDType::FLOAT32;
}
std::cout << "Feed size = " << feeds_size << std::endl;
feeds.push_back(im_tensor);
if(_model_config._feeds_size > 2) {
feeds.push_back(im_info_tensor);
}
feeds.push_back(im_size_tensor);
_outputs.clear();
auto t1 = std::chrono::high_resolution_clock::now();
if (!_main_predictor->Run(feeds, &_outputs, batch_size)) {
LOG(ERROR) << "Failed: NativePredictor->Run() return false at batch: " << u;
continue;
}
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
std::cout << "Number of outputs:" << _outputs.size() << std::endl;
int out_num = 1;
// print shape of first output tensor for debugging
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < _outputs[0].shape.size(); ++j) {
out_num *= _outputs[0].shape[j];
std::cout << _outputs[0].shape[j] << ",";
}
std::cout << ")" << std::endl;
// const size_t nums = _outputs.front().data.length() / sizeof(float);
// if (out_num % batch_size != 0 || out_num != nums) {
// LOG(ERROR) << "outputs data size mismatch with shape size.";
// return -1;
// }
float* out_addr = (float *)(_outputs[0].data.data());
output_detection_result(out_addr, _outputs[0].lod, imgs_batch);
}
return 0;
}
int DetectionPredictor::analysis_predict(const std::vector<std::string>& imgs) {
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
auto total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& imgs_batch = _imgs_batch;
//DetectionResultsContainer result_container;
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
//input_buffer.resize(real_buffer_size);
input_buffer.clear();
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
std::vector<int> ori_widths;
std::vector<int> ori_heights;
std::vector<int> resize_widths;
std::vector<int> resize_heights;
std::vector<float> scale_ratios;
ori_widths.resize(batch_size);
ori_heights.resize(batch_size);
resize_widths.resize(batch_size);
resize_heights.resize(batch_size);
scale_ratios.resize(batch_size);
std::vector<std::vector<float>> lod_buffer(batch_size);
if (!_preprocessor->batch_process(imgs_batch, lod_buffer, ori_widths.data(), ori_heights.data(),
resize_widths.data(), resize_heights.data(), scale_ratios.data())){
std::cout << "Failed to preprocess!" << std::endl;
return -1;
}
//flatten tensor
padding_minibatch(lod_buffer, input_buffer, resize_heights, resize_widths, channels, _model_config._coarsest_stride);
std::vector<std::string> input_names = _main_predictor->GetInputNames();
auto im_tensor = _main_predictor->GetInputTensor(input_names.front());
im_tensor->Reshape({ batch_size, channels, resize_heights[0], resize_widths[0] });
im_tensor->copy_from_cpu(input_buffer.data());
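        // RCNN-style models (FEEDS_SIZE == 3) take an extra "im_info" feed: [resized_h, resized_w, scale_ratio] per image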
if(input_names.size() > 2){
std::vector<float> image_infos;
for(int i = 0; i < batch_size; ++i) {
image_infos.push_back(resize_heights[i]);
image_infos.push_back(resize_widths[i]);
image_infos.push_back(scale_ratios[i]);
}
auto im_info_tensor = _main_predictor->GetInputTensor(input_names[1]);
im_info_tensor->Reshape({batch_size, 3});
im_info_tensor->copy_from_cpu(image_infos.data());
}
std::vector<int> image_size;
for(int i = 0; i < batch_size; ++i) {
image_size.push_back(ori_heights[i]);
image_size.push_back(ori_widths[i]);
}
std::vector<float> image_size_f;
for(int i = 0; i < batch_size; ++i) {
image_size_f.push_back(static_cast<float>(ori_heights[i]));
image_size_f.push_back(static_cast<float>(ori_widths[i]));
image_size_f.push_back(1.0);
}
auto im_size_tensor = _main_predictor->GetInputTensor(input_names.back());
if(input_names.size() > 2) {
im_size_tensor->Reshape({batch_size, 3});
im_size_tensor->copy_from_cpu(image_size_f.data());
}
else{
im_size_tensor->Reshape({batch_size, 2});
im_size_tensor->copy_from_cpu(image_size.data());
}
auto t1 = std::chrono::high_resolution_clock::now();
_main_predictor->ZeroCopyRun();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
auto output_names = _main_predictor->GetOutputNames();
auto output_t = _main_predictor->GetOutputTensor(output_names[0]);
std::vector<float> out_data;
std::vector<int> output_shape = output_t->shape();
int out_num = 1;
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < output_shape.size(); ++j) {
out_num *= output_shape[j];
std::cout << output_shape[j] << ",";
}
std::cout << ")" << std::endl;
out_data.resize(out_num);
output_t->copy_to_cpu(out_data.data());
float* out_addr = (float *)(out_data.data());
auto lod_vector = output_t->lod();
output_detection_result(out_addr, lod_vector, imgs_batch);
}
return 0;
}
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <memory>
#include <string>
#include <vector>
#include <thread>
#include <chrono>
#include <algorithm>
#include <glog/logging.h>
#include <yaml-cpp/yaml.h>
#include <opencv2/opencv.hpp>
#include <paddle_inference_api.h>
#include <utils/conf_parser.h>
#include <utils/utils.h>
#include <preprocessor/preprocessor.h>
namespace PaddleSolution {
class DetectionPredictor {
public:
// init a predictor with a yaml config file
int init(const std::string& conf);
// predict api
int predict(const std::vector<std::string>& imgs);
private:
int native_predict(const std::vector<std::string>& imgs);
int analysis_predict(const std::vector<std::string>& imgs);
private:
std::vector<float> _buffer;
std::vector<std::string> _imgs_batch;
std::vector<paddle::PaddleTensor> _outputs;
PaddleSolution::PaddleModelConfigPaser _model_config;
std::shared_ptr<PaddleSolution::ImagePreProcessor> _preprocessor;
std::unique_ptr<paddle::PaddlePredictor> _main_predictor;
};
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <glog/logging.h>
#include "preprocessor.h"
#include "preprocessor_detection.h"
namespace PaddleSolution {
std::shared_ptr<ImagePreProcessor> create_processor(const std::string& conf_file) {
auto config = std::make_shared<PaddleSolution::PaddleModelConfigPaser>();
if (!config->load_config(conf_file)) {
LOG(FATAL) << "fail to laod conf file [" << conf_file << "]";
return nullptr;
}
if (config->_pre_processor == "DetectionPreProcessor") {
auto p = std::make_shared<DetectionPreProcessor>();
if (!p->init(config)) {
return nullptr;
}
return p;
}
LOG(FATAL) << "unknown processor_name [" << config->_pre_processor << "]";
return nullptr;
}
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <vector>
#include <string>
#include <memory>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "utils/conf_parser.h"
namespace PaddleSolution {
class ImagePreProcessor {
protected:
ImagePreProcessor() {};
public:
virtual ~ImagePreProcessor() {}
virtual bool single_process(const std::string& fname, float* data, int* ori_w, int* ori_h) {
return true;
}
virtual bool batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h) {
return true;
}
virtual bool single_process(const std::string& fname, float* data) {
return true;
}
virtual bool batch_process(const std::vector<std::string>& imgs, float* data) {
return true;
}
virtual bool single_process(const std::string& fname, std::vector<float> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
return true;
}
virtual bool batch_process(const std::vector<std::string>& imgs, std::vector<std::vector<float>> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
return true;
}
}; // end of class ImagePreProcessor
std::shared_ptr<ImagePreProcessor> create_processor(const std::string &config_file);
} // end of namespace PaddleSolution
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <thread>
#include <mutex>
#include <glog/logging.h>
#include "preprocessor_detection.h"
#include "utils/utils.h"
namespace PaddleSolution {
bool DetectionPreProcessor::single_process(const std::string& fname, std::vector<float> &vec_data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
cv::Mat im1 = cv::imread(fname, -1);
cv::Mat im;
if(_config->_feeds_size == 3) { // faster rcnn
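            // Faster R-CNN configs use mean/std on a [0, 1] scale, so scale raw pixels by 1/255 here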
im1.convertTo(im, CV_32FC3, 1/255.0);
}
else if(_config->_feeds_size == 2){ //yolo v3
im = im1;
}
if (im.data == nullptr || im.empty()) {
LOG(ERROR) << "Failed to open image: " << fname;
return false;
}
int channels = im.channels();
if (channels == 1) {
cv::cvtColor(im, im, cv::COLOR_GRAY2BGR);
}
channels = im.channels();
if (channels != 3 && channels != 4) {
LOG(ERROR) << "Only support rgb(gray) and rgba image.";
return false;
}
*ori_w = im.cols;
*ori_h = im.rows;
cv::cvtColor(im, im, cv::COLOR_BGR2RGB);
//channels = im.channels();
//resize
int rw = im.cols;
int rh = im.rows;
float im_scale_ratio;
utils::scaling(_config->_resize_type, rw, rh, _config->_resize[0], _config->_resize[1], _config->_target_short_size, _config->_resize_max_size, im_scale_ratio);
cv::Size resize_size(rw, rh);
*resize_w = rw;
*resize_h = rh;
*scale_ratio = im_scale_ratio;
if (*ori_h != rh || *ori_w != rw) {
cv::Mat im_temp;
if(_config->_resize_type == utils::SCALE_TYPE::UNPADDING) {
cv::resize(im, im_temp, resize_size, 0, 0, cv::INTER_LINEAR);
}
else if(_config->_resize_type == utils::SCALE_TYPE::RANGE_SCALING) {
cv::resize(im, im_temp, cv::Size(), im_scale_ratio, im_scale_ratio, cv::INTER_LINEAR);
}
im = im_temp;
}
vec_data.resize(channels * rw * rh);
float *data = vec_data.data();
float* pmean = _config->_mean.data();
float* pscale = _config->_std.data();
for (int h = 0; h < rh; ++h) {
const uchar* uptr = im.ptr<uchar>(h);
const float* fptr = im.ptr<float>(h);
int im_index = 0;
for (int w = 0; w < rw; ++w) {
for (int c = 0; c < channels; ++c) {
int top_index = (c * rh + h) * rw + w;
float pixel;
if(_config->_feeds_size == 2){ //yolo v3
pixel = static_cast<float>(uptr[im_index++]) / 255.0;
}
else if(_config->_feeds_size == 3){
pixel = fptr[im_index++];
}
pixel = (pixel - pmean[c]) / pscale[c];
data[top_index] = pixel;
}
}
}
return true;
}
bool DetectionPreProcessor::batch_process(const std::vector<std::string>& imgs, std::vector<std::vector<float>> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio) {
auto ic = _config->_channels;
auto iw = _config->_resize[0];
auto ih = _config->_resize[1];
std::vector<std::thread> threads;
for (int i = 0; i < imgs.size(); ++i) {
std::string path = imgs[i];
int* width = &ori_w[i];
int* height = &ori_h[i];
int* resize_width = &resize_w[i];
int* resize_height = &resize_h[i];
float* sr = &scale_ratio[i];
threads.emplace_back([this, &data, i, path, width, height, resize_width, resize_height, sr] {
std::vector<float> buffer;
single_process(path, buffer, width, height, resize_width, resize_height, sr);
data[i] = buffer;
});
}
for (auto& t : threads) {
if (t.joinable()) {
t.join();
}
}
return true;
}
bool DetectionPreProcessor::init(std::shared_ptr<PaddleSolution::PaddleModelConfigPaser> config) {
_config = config;
return true;
}
}
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "preprocessor.h"
namespace PaddleSolution {
class DetectionPreProcessor : public ImagePreProcessor {
public:
DetectionPreProcessor() : _config(nullptr) {
};
bool init(std::shared_ptr<PaddleSolution::PaddleModelConfigPaser> config);
bool single_process(const std::string& fname, std::vector<float> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio);
bool batch_process(const std::vector<std::string>& imgs, std::vector<std::vector<float>> &data, int* ori_w, int* ori_h, int* resize_w, int* resize_h, float* scale_ratio);
private:
std::shared_ptr<PaddleSolution::PaddleModelConfigPaser> _config;
};
}
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: detection_result.proto
import sys
_b = sys.version_info[0] < 3 and (lambda x: x) or (lambda x: x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
from google.protobuf import descriptor_pb2
# @@protoc_insertion_point(imports)
_sym_db = _symbol_database.Default()
DESCRIPTOR = _descriptor.FileDescriptor(
name='detection_result.proto',
package='PaddleSolution',
syntax='proto2',
serialized_pb=_b(
'\n\x16\x64\x65tection_result.proto\x12\x0ePaddleSolution\"\x84\x01\n\x0c\x44\x65tectionBox\x12\r\n\x05\x63lass\x18\x01 \x01(\x05\x12\r\n\x05score\x18\x02 \x01(\x02\x12\x12\n\nleft_top_x\x18\x03 \x01(\x02\x12\x12\n\nleft_top_y\x18\x04 \x01(\x02\x12\x16\n\x0eright_bottom_x\x18\x05 \x01(\x02\x12\x16\n\x0eright_bottom_y\x18\x06 \x01(\x02\"Z\n\x0f\x44\x65tectionResult\x12\x10\n\x08\x66ilename\x18\x01 \x01(\t\x12\x35\n\x0f\x64\x65tection_boxes\x18\x02 \x03(\x0b\x32\x1c.PaddleSolution.DetectionBox'
))
_sym_db.RegisterFileDescriptor(DESCRIPTOR)
_DETECTIONBOX = _descriptor.Descriptor(
name='DetectionBox',
full_name='PaddleSolution.DetectionBox',
filename=None,
file=DESCRIPTOR,
containing_type=None,
fields=[
_descriptor.FieldDescriptor(
name='class',
full_name='PaddleSolution.DetectionBox.class',
index=0,
number=1,
type=5,
cpp_type=1,
label=1,
has_default_value=False,
default_value=0,
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='score',
full_name='PaddleSolution.DetectionBox.score',
index=1,
number=2,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='left_top_x',
full_name='PaddleSolution.DetectionBox.left_top_x',
index=2,
number=3,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='left_top_y',
full_name='PaddleSolution.DetectionBox.left_top_y',
index=3,
number=4,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='right_bottom_x',
full_name='PaddleSolution.DetectionBox.right_bottom_x',
index=4,
number=5,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='right_bottom_y',
full_name='PaddleSolution.DetectionBox.right_bottom_y',
index=5,
number=6,
type=2,
cpp_type=6,
label=1,
has_default_value=False,
default_value=float(0),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
],
extensions=[],
nested_types=[],
enum_types=[],
options=None,
is_extendable=False,
syntax='proto2',
extension_ranges=[],
oneofs=[],
serialized_start=43,
serialized_end=175, )
_DETECTIONRESULT = _descriptor.Descriptor(
name='DetectionResult',
full_name='PaddleSolution.DetectionResult',
filename=None,
file=DESCRIPTOR,
containing_type=None,
fields=[
_descriptor.FieldDescriptor(
name='filename',
full_name='PaddleSolution.DetectionResult.filename',
index=0,
number=1,
type=9,
cpp_type=9,
label=1,
has_default_value=False,
default_value=_b("").decode('utf-8'),
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
_descriptor.FieldDescriptor(
name='detection_boxes',
full_name='PaddleSolution.DetectionResult.detection_boxes',
index=1,
number=2,
type=11,
cpp_type=10,
label=3,
has_default_value=False,
default_value=[],
message_type=None,
enum_type=None,
containing_type=None,
is_extension=False,
extension_scope=None,
options=None),
],
extensions=[],
nested_types=[],
enum_types=[],
options=None,
is_extendable=False,
syntax='proto2',
extension_ranges=[],
oneofs=[],
serialized_start=177,
serialized_end=267, )
_DETECTIONRESULT.fields_by_name['detection_boxes'].message_type = _DETECTIONBOX
DESCRIPTOR.message_types_by_name['DetectionBox'] = _DETECTIONBOX
DESCRIPTOR.message_types_by_name['DetectionResult'] = _DETECTIONRESULT
DetectionBox = _reflection.GeneratedProtocolMessageType(
'DetectionBox',
(_message.Message, ),
dict(
DESCRIPTOR=_DETECTIONBOX,
__module__='detection_result_pb2'
# @@protoc_insertion_point(class_scope:PaddleSolution.DetectionBox)
))
_sym_db.RegisterMessage(DetectionBox)
DetectionResult = _reflection.GeneratedProtocolMessageType(
'DetectionResult',
(_message.Message, ),
dict(
DESCRIPTOR=_DETECTIONRESULT,
__module__='detection_result_pb2'
# @@protoc_insertion_point(class_scope:PaddleSolution.DetectionResult)
))
_sym_db.RegisterMessage(DetectionResult)
# @@protoc_insertion_point(module_scope)
......@@ -101,7 +101,8 @@ def load(anno_path, sample_num=-1, with_background=True):
gt_class[i][0] = catid2clsid[catid]
gt_bbox[i, :] = box['clean_bbox']
is_crowd[i][0] = box['iscrowd']
gt_poly[i] = box['segmentation']
if 'segmentation' in box:
gt_poly[i] = box['segmentation']
coco_rec = {
'im_file': im_fname,
......