Merge branch 'master' of https://github.com/PaddlePaddle/PaddleDetection into add_ppyolo_mbv3

d95d69b6 · dengkaipeng · 16583703 · 4bb142cf · 16583703 · d95d69b6
34 changed files
--- a/README.md
+++ b/README.md
-English | [简体中文](README_cn.md)
-Documentation:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
-# PaddleDetection
-PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle. It
-aims to help developers through the whole workflow of training models, optimizing performance and
-inference speed, and deploying models. PaddleDetection provides a variety of object detection architectures
-in a modular design, along with rich data augmentation methods, network components, loss functions, etc.
-PaddleDetection supports practical projects such as industrial quality inspection, remote sensing
-image object detection, and automatic inspection, with practical features such as model compression
-and multi-platform deployment.
-[PP-YOLO](https://arxiv.org/abs/2007.12099), which is faster and has higher performance than YOLOv4,
-has been released. It reaches 45.2% mAP(0.5:0.95) on the COCO test2019 dataset and 72.9 FPS on a single
-Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
-**All models in PaddleDetection now require PaddlePaddle 1.8 or higher, or a suitable develop version.**
-<div align="center">
-  <img src="docs/images/000000570688.jpg" />
-</div>
-## Introduction
-Features:
- Rich models:
-  PaddleDetection provides a rich set of models, including 100+ pre-trained models
-for object detection, instance segmentation, face detection, etc. It covers
-champion models and practical detection models for cloud and edge devices.
- Production Ready:
-  Key operations are implemented in C++ and CUDA, which, together with PaddlePaddle's
-highly efficient inference engine, enables easy deployment in server environments.
- Highly Flexible:
-  Components are designed to be modular. Model architectures, as well as data
-preprocess pipelines, can be easily customized with simple configuration
-changes.
- Performance Optimized:
-  With the help of the underlying PaddlePaddle framework, faster training and
-a reduced GPU memory footprint are achieved. Notably, YOLOv3 training is
-much faster compared to other frameworks. As another example, with Mask-RCNN
-(ResNet50) we managed to fit up to 4 images per GPU (Tesla V100 16GB) during
-multi-GPU training.
-Supported Architectures:
-|                     | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet |  HRNet | Res2Net |
-| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: |:------:|:-----:  |
-| Faster R-CNN        |   ✓    |                             ✓ |     ✗      |   ✓   |     ✗     |   ✗    |  ✗      |
-| Faster R-CNN + FPN  |   ✓    |                             ✓ |     ✓      |   ✓   |     ✗     |   ✓    |  ✓      |
-| Mask R-CNN          |   ✓    |                             ✓ |     ✗      |   ✓   |     ✗     |   ✗    |  ✗      |
-| Mask R-CNN + FPN    |   ✓    |                             ✓ |     ✓      |   ✓   |     ✗     |   ✗    |  ✓      |
-| Cascade Faster-RCNN |   ✓    |                             ✓ |     ✓      |   ✗   |     ✗     |   ✗    |  ✗      |
-| Cascade Mask-RCNN   |   ✓    |                             ✗ |     ✗      |   ✓   |     ✗     |   ✗    |  ✗      |
-| Libra R-CNN         |   ✗    |                             ✓ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
-| RetinaNet           |   ✓    |                             ✗ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
-| YOLOv3              |   ✓    |                             ✓ |     ✗      |   ✗   |     ✓     |   ✗    |  ✗      |
-| SSD                 |   ✗    |                             ✗ |     ✗      |   ✗   |     ✓     |   ✗    |  ✗      |
-| BlazeFace           |   ✗    |                             ✗ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
-| Faceboxes           |   ✗    |                             ✗ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
-<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
-**NOTE:** ✓ means a config file and pretrained model are provided in [Model Zoo](docs/MODEL_ZOO.md); ✗ means they are not provided but the combination is generally supported.
-More models:
- EfficientDet
- FCOS
- CornerNet-Squeeze
- YOLOv4
- PP-YOLO
-More Backbones:
- DarkNet
- VGG
- GCNet
- CBNet
-Advanced Features:
- [x] **Synchronized Batch Norm**
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
- [x] **Non-local and GCNet**
-**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices; it cannot be used on CPU devices or a single GPU device.
-The following figure shows the relationship between COCO mAP and FPS on Tesla V100 for representative models of each architecture and backbone.
-<div align="center">
-  <img src="docs/images/map_fps.png" width=800 />
-</div>
-**NOTE:**
- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has the highest COCO mAP (53.3%) among PaddleDetection models
- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized to reach 20 FPS inference speed at 47.8% COCO mAP
- The enhanced `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher than the paper on COCO mAP, and its inference speed is nearly 70% faster than the darknet framework
- All these models are available in the [Model Zoo](#Model-Zoo)
-The following figure shows the relationship between COCO mAP and FPS on Tesla V100 for SOTA object detectors and PP-YOLO. PP-YOLO is faster and has better performance than YOLOv4, reaching 45.2% mAP(0.5:0.95) on the COCO test2019 dataset and 72.9 FPS on a single Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
-<div align="center">
-  <img src="docs/images/ppyolo_map_fps.png" width=600 />
-</div>
-## Tutorials
-### Get Started
- [Installation guide](docs/tutorials/INSTALL.md)
- [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md)
- [Train/Evaluation/Inference](docs/tutorials/GETTING_STARTED.md)
- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md)
- [FAQ](docs/FAQ.md)
-### Advanced Tutorial
- [Guide to preprocess pipeline and dataset definition](docs/advanced_tutorials/READER.md)
- [Models technical](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [Transfer learning document](docs/advanced_tutorials/TRANSFER_LEARNING.md)
- [Parameter configuration](docs/advanced_tutorials/config_doc):
-  - [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG.md)
-  - [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Model compression](slim)
-    - [Model compression benchmark](slim)
-    - [Quantization](slim/quantization)
-    - [Model pruning](slim/prune)
-    - [Model distillation](slim/distillation)
-    - [Neural Architecture Search](slim/nas)
- [Deployment](deploy)
-    - [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
-    - [Python inference](deploy/python)
-    - [C++ inference](deploy/cpp)
-    - [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
-## Model Zoo
- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
- [Mobile models](configs/mobile/README.md)
- [Anchor free models](configs/anchor_free/README.md)
- [Face detection models](docs/featured_model/FACE_DETECTION_en.md)
- [Pretrained models for pedestrian detection](docs/featured_model/CONTRIB.md)
- [Pretrained models for vehicle detection](docs/featured_model/CONTRIB.md)
- [YOLOv3 enhanced model](docs/featured_model/YOLOv3_ENHANCEMENT.md): Compared to the mAP of 33.0% in the paper, the enhanced YOLOv3 reaches an mAP of 43.6%, and inference speed is improved as well
- [PP-YOLO](configs/ppyolo/README.md): PP-YOLO reaches 45.3% mAP on the COCO dataset and 72.9 FPS on a single Tesla V100
- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of Open Images 2019-Object Detection](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [Practical Server-side detection method](configs/rcnn_enhance/README_en.md): Inference speed on a single V100 GPU can reach 20 FPS at 47.8% COCO mAP.
- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL_en.md): Large-scale practical server-side detection pretrained models with 676 categories are provided for most application scenarios. They can be used for direct inference and for fine-tuning on other datasets.
-## License
-PaddleDetection is released under the [Apache 2.0 license](LICENSE).
-## Updates
-v0.4.0 was released in `05/2020`. It adds PP-YOLO, TTFNet, HTC, ACFPN, etc., as well as the BlazeFace face landmark detection model, a series of optimized SSDLite models for mobile, the GridMask and RandomErasing data augmentations, and Matrix NMS and EMA training. It also improves ease of use and fixes many known bugs.
-Please refer to the [changelog](docs/CHANGELOG.md) for details.
-## Contributing
-Contributions are highly welcomed and we would really appreciate your feedback!!
--- a/README.md
+++ b/README.md
+README_cn.md
\ No newline at end of file
--- a/README_cn.md
+++ b/README_cn.md
-简体中文 | [English](README.md)
+简体中文 | [English](README_en.md)
 Documentation: [https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)

--- a/README_en.md
+++ b/README_en.md
+English | [简体中文](README_cn.md)
+Documentation:[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
+# PaddleDetection
+PaddleDetection is an end-to-end object detection development kit based on PaddlePaddle. It
+aims to help developers through the whole workflow of training models, optimizing performance and
+inference speed, and deploying models. PaddleDetection provides a variety of object detection architectures
+in a modular design, along with rich data augmentation methods, network components, loss functions, etc.
+PaddleDetection supports practical projects such as industrial quality inspection, remote sensing
+image object detection, and automatic inspection, with practical features such as model compression
+and multi-platform deployment.
+[PP-YOLO](https://arxiv.org/abs/2007.12099), which is faster and has higher performance than YOLOv4,
+has been released. It reaches 45.2% mAP(0.5:0.95) on the COCO test2019 dataset and 72.9 FPS on a single
+Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
+**All models in PaddleDetection now require PaddlePaddle 1.8 or higher, or a suitable develop version.**
+<div align="center">
+  <img src="docs/images/000000570688.jpg" />
+</div>
+## Introduction
+Features:
+- Rich models:
+  PaddleDetection provides a rich set of models, including 100+ pre-trained models
+for object detection, instance segmentation, face detection, etc. It covers
+champion models and practical detection models for cloud and edge devices.
+- Production Ready:
+  Key operations are implemented in C++ and CUDA, which, together with PaddlePaddle's
+highly efficient inference engine, enables easy deployment in server environments.
+- Highly Flexible:
+  Components are designed to be modular. Model architectures, as well as data
+preprocess pipelines, can be easily customized with simple configuration
+changes.
+- Performance Optimized:
+  With the help of the underlying PaddlePaddle framework, faster training and
+a reduced GPU memory footprint are achieved. Notably, YOLOv3 training is
+much faster compared to other frameworks. As another example, with Mask-RCNN
+(ResNet50) we managed to fit up to 4 images per GPU (Tesla V100 16GB) during
+multi-GPU training.
+Supported Architectures:
+|                     | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet |  HRNet | Res2Net |
+| ------------------- | :----: | ----------------------------: | :--------: | :---: | :-------: |:------:|:-----:  |
+| Faster R-CNN        |   ✓    |                             ✓ |     ✗      |   ✓   |     ✗     |   ✗    |  ✗      |
+| Faster R-CNN + FPN  |   ✓    |                             ✓ |     ✓      |   ✓   |     ✗     |   ✓    |  ✓      |
+| Mask R-CNN          |   ✓    |                             ✓ |     ✗      |   ✓   |     ✗     |   ✗    |  ✗      |
+| Mask R-CNN + FPN    |   ✓    |                             ✓ |     ✓      |   ✓   |     ✗     |   ✗    |  ✓      |
+| Cascade Faster-RCNN |   ✓    |                             ✓ |     ✓      |   ✗   |     ✗     |   ✗    |  ✗      |
+| Cascade Mask-RCNN   |   ✓    |                             ✗ |     ✗      |   ✓   |     ✗     |   ✗    |  ✗      |
+| Libra R-CNN         |   ✗    |                             ✓ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
+| RetinaNet           |   ✓    |                             ✗ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
+| YOLOv3              |   ✓    |                             ✓ |     ✗      |   ✗   |     ✓     |   ✗    |  ✗      |
+| SSD                 |   ✗    |                             ✗ |     ✗      |   ✗   |     ✓     |   ✗    |  ✗      |
+| BlazeFace           |   ✗    |                             ✗ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
+| Faceboxes           |   ✗    |                             ✗ |     ✗      |   ✗   |     ✗     |   ✗    |  ✗      |
+<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
+**NOTE:** ✓ means a config file and pretrained model are provided in [Model Zoo](docs/MODEL_ZOO.md); ✗ means they are not provided but the combination is generally supported.
+More models:
+- EfficientDet
+- FCOS
+- CornerNet-Squeeze
+- YOLOv4
+- PP-YOLO
+More Backbones:
+- DarkNet
+- VGG
+- GCNet
+- CBNet
+Advanced Features:
+- [x] **Synchronized Batch Norm**
+- [x] **Group Norm**
+- [x] **Modulated Deformable Convolution**
+- [x] **Deformable PSRoI Pooling**
+- [x] **Non-local and GCNet**
+**NOTE:** Synchronized batch normalization can only be used on multiple GPU devices; it cannot be used on CPU devices or a single GPU device.
+The following figure shows the relationship between COCO mAP and FPS on Tesla V100 for representative models of each architecture and backbone.
+<div align="center">
+  <img src="docs/images/map_fps.png" width=800 />
+</div>
+**NOTE:**
+- `CBResNet` stands for `Cascade-Faster-RCNN-CBResNet200vd-FPN`, which has the highest COCO mAP (53.3%) among PaddleDetection models
+- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized to reach 20 FPS inference speed at 47.8% COCO mAP
+- The enhanced `YOLOv3-ResNet50vd-DCN` is 10.6 absolute percentage points higher than the paper on COCO mAP, and its inference speed is nearly 70% faster than the darknet framework
+- All these models are available in the [Model Zoo](#Model-Zoo)
+The following figure shows the relationship between COCO mAP and FPS on Tesla V100 for SOTA object detectors and PP-YOLO. PP-YOLO is faster and has better performance than YOLOv4, reaching 45.2% mAP(0.5:0.95) on the COCO test2019 dataset and 72.9 FPS on a single Tesla V100. Please refer to [PP-YOLO](configs/ppyolo/README.md) for details.
+<div align="center">
+  <img src="docs/images/ppyolo_map_fps.png" width=600 />
+</div>
+## Tutorials
+### Get Started
+- [Installation guide](docs/tutorials/INSTALL.md)
+- [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md)
+- [Train/Evaluation/Inference](docs/tutorials/GETTING_STARTED.md)
+- [How to train a custom dataset](docs/tutorials/Custom_DataSet.md)
+- [FAQ](docs/FAQ.md)
+### Advanced Tutorial
+- [Guide to preprocess pipeline and dataset definition](docs/advanced_tutorials/READER.md)
+- [Models technical](docs/advanced_tutorials/MODEL_TECHNICAL.md)
+- [Transfer learning document](docs/advanced_tutorials/TRANSFER_LEARNING.md)
+- [Parameter configuration](docs/advanced_tutorials/config_doc):
+  - [Introduction to the configuration workflow](docs/advanced_tutorials/config_doc/CONFIG.md)
+  - [Parameter configuration for RCNN model](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
+- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
+- [Model compression](slim)
+    - [Model compression benchmark](slim)
+    - [Quantization](slim/quantization)
+    - [Model pruning](slim/prune)
+    - [Model distillation](slim/distillation)
+    - [Neural Architecture Search](slim/nas)
+- [Deployment](deploy)
+    - [Export model for inference](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
+    - [Python inference](deploy/python)
+    - [C++ inference](deploy/cpp)
+    - [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
+## Model Zoo
+- Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md).
+- [Mobile models](configs/mobile/README.md)
+- [Anchor free models](configs/anchor_free/README.md)
+- [Face detection models](docs/featured_model/FACE_DETECTION_en.md)
+- [Pretrained models for pedestrian detection](docs/featured_model/CONTRIB.md)
+- [Pretrained models for vehicle detection](docs/featured_model/CONTRIB.md)
+- [YOLOv3 enhanced model](docs/featured_model/YOLOv3_ENHANCEMENT.md): Compared to the mAP of 33.0% in the paper, the enhanced YOLOv3 reaches an mAP of 43.6%, and inference speed is improved as well
+- [PP-YOLO](configs/ppyolo/README.md): PP-YOLO reaches 45.3% mAP on the COCO dataset and 72.9 FPS on a single Tesla V100
+- [Objects365 2019 Challenge champion model](docs/featured_model/champion_model/CACascadeRCNN.md)
+- [Best single model of Open Images 2019-Object Detection](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
+- [Practical Server-side detection method](configs/rcnn_enhance/README_en.md): Inference speed on a single V100 GPU can reach 20 FPS at 47.8% COCO mAP.
+- [Large-scale practical object detection models](docs/featured_model/LARGE_SCALE_DET_MODEL_en.md): Large-scale practical server-side detection pretrained models with 676 categories are provided for most application scenarios. They can be used for direct inference and for fine-tuning on other datasets.
+## License
+PaddleDetection is released under the [Apache 2.0 license](LICENSE).
+## Updates
+v0.4.0 was released in `05/2020`. It adds PP-YOLO, TTFNet, HTC, ACFPN, etc., as well as the BlazeFace face landmark detection model, a series of optimized SSDLite models for mobile, the GridMask and RandomErasing data augmentations, and Matrix NMS and EMA training. It also improves ease of use and fixes many known bugs.
+Please refer to the [changelog](docs/CHANGELOG.md) for details.
+## Contributing
+Contributions are highly welcomed and we would really appreciate your feedback!!
--- a/configs/dcn/yolov3_r50vd_dcn.yml
+++ b/configs/dcn/yolov3_r50vd_dcn.yml
@@ -40,11 +40,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false

--- a/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml
+++ b/configs/dcn/yolov3_r50vd_dcn_db_iouaware_obj365_pretrained_coco.yml
@@ -44,7 +44,6 @@ YOLOv3Head:
  drop_block: true
 YOLOv3Loss:
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false
  use_fine_grained_loss: true

--- a/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml
+++ b/configs/dcn/yolov3_r50vd_dcn_db_iouloss_obj365_pretrained_coco.yml
@@ -42,11 +42,6 @@ YOLOv3Head:
  drop_block: true
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false
  use_fine_grained_loss: true

--- a/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml
+++ b/configs/dcn/yolov3_r50vd_dcn_db_obj365_pretrained_coco.yml
@@ -43,11 +43,6 @@ YOLOv3Head:
  keep_prob: 0.94
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false
  use_fine_grained_loss: true

--- a/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml
+++ b/configs/dcn/yolov3_r50vd_dcn_obj365_pretrained_coco.yml
@@ -41,11 +41,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false
  use_fine_grained_loss: true

--- a/configs/ppyolo/README.md
+++ b/configs/ppyolo/README.md
@@ -88,7 +88,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppy
 optional: Run `tools/anchor_cluster.py` to get anchors suitable for your dataset, and modify the anchor setting in `configs/ppyolo/ppyolo.yml`.
 ``` bash
-python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -m v2 -i 1000
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
 ```
 ### 2. Evaluation

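Review note on the `anchor_cluster.py` command above: the new `-s 608` flag sits alongside `-n 9` and `-i 1000`. Reading these as input size, number of anchors, and iteration cap is an assumption drawn from the flag values, since the tool's source is not part of this diff. The sketch below illustrates IoU-based k-means over box sizes under that reading. It is a stand-in for explanation only, not the tool's implementation, and `iou_wh`, `cluster_anchors`, and the synthetic `demo_boxes` are hypothetical names.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (w, h), assuming they share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def cluster_anchors(boxes_wh, n_anchors=9, input_size=608, iters=1000, seed=0):
    """K-means over normalized (w, h) pairs, assigning each box to its highest-IoU center.

    boxes_wh holds box sizes normalized to [0, 1] by image width and height.
    Returns n_anchors (w, h) pairs in pixels at input_size, sorted by area.
    """
    rng = random.Random(seed)
    centers = rng.sample(boxes_wh, n_anchors)
    for _ in range(iters):
        # Assign each box to the center it overlaps most.
        groups = [[] for _ in centers]
        for box in boxes_wh:
            best = max(range(n_anchors), key=lambda k: iou_wh(box, centers[k]))
            groups[best].append(box)
        # Move each center to the mean size of its group (keep it if the group is empty).
        new_centers = [
            (sum(b[0] for b in g) / len(g), sum(b[1] for b in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # assignments stopped changing before the cap
            break
        centers = new_centers
    anchors = [(round(w * input_size), round(h * input_size)) for w, h in centers]
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Synthetic box sizes standing in for a real dataset's annotations.
_rng = random.Random(42)
demo_boxes = [(_rng.uniform(0.02, 0.9), _rng.uniform(0.02, 0.9)) for _ in range(500)]
anchors = cluster_anchors(demo_boxes, n_anchors=9, input_size=608, iters=1000)
print(anchors)
```

Under this reading, scaling the cluster centers by the input size is what makes the anchors comparable to the 608x608 network input, which would explain why the README command now passes `-s 608` explicitly.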
--- a/configs/ppyolo/README_cn.md
+++ b/configs/ppyolo/README_cn.md
@@ -85,9 +85,9 @@ PP-YOLO optimizes and improves the accuracy and speed of the YOLOv3 model in the following aspects:
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval
 ```
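-Optional: before training, run tools/anchor_cluster.py to get anchors suited to your dataset, and update the anchor settings in configs/ppyolo/ppyolo.yml
+Optional: before training, run `tools/anchor_cluster.py` to get anchors suited to your dataset, and update the anchor settings in `configs/ppyolo/ppyolo.yml`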
 ```bash
-python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -m v2 -i 1000
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
 ```
 ### 2. Evaluation

--- a/configs/ppyolo/ppyolo.yml
+++ b/configs/ppyolo/ppyolo.yml
@@ -44,7 +44,6 @@ YOLOv3Head:
  drop_block: true
 YOLOv3Loss:
-  batch_size: 24
  ignore_thresh: 0.7
  scale_x_y: 1.05
  label_smooth: false

--- a/configs/ppyolo/ppyolo_2x.yml
+++ b/configs/ppyolo/ppyolo_2x.yml
@@ -44,7 +44,6 @@ YOLOv3Head:
  drop_block: true
 YOLOv3Loss:
-  batch_size: 24
  ignore_thresh: 0.7
  scale_x_y: 1.05
  label_smooth: false

--- a/configs/ppyolo/ppyolo_r18vd.yml
+++ b/configs/ppyolo/ppyolo_r18vd.yml
@@ -39,7 +39,6 @@ YOLOv3Head:
  drop_block: true
 YOLOv3Loss:
-  batch_size: 32
  ignore_thresh: 0.7
  scale_x_y: 1.05
  label_smooth: false

--- a/configs/ppyolo/ppyolo_test.yml
+++ b/configs/ppyolo/ppyolo_test.yml
@@ -47,7 +47,6 @@ YOLOv3Head:
  drop_block: true
 YOLOv3Loss:
-  batch_size: 24
  ignore_thresh: 0.7
  scale_x_y: 1.05
  label_smooth: false

--- a/configs/yolov3_darknet.yml
+++ b/configs/yolov3_darknet.yml
@@ -35,11 +35,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: true

--- a/configs/yolov3_darknet_voc.yml
+++ b/configs/yolov3_darknet_voc.yml
@@ -36,11 +36,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false

--- a/configs/yolov3_darknet_voc_diouloss.yml
+++ b/configs/yolov3_darknet_voc_diouloss.yml
@@ -36,7 +36,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false
  iou_loss: DiouLossYolo

--- a/configs/yolov3_mobilenet_v1.yml
+++ b/configs/yolov3_mobilenet_v1.yml
@@ -36,11 +36,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: true

--- a/configs/yolov3_mobilenet_v1_fruit.yml
+++ b/configs/yolov3_mobilenet_v1_fruit.yml
@@ -38,11 +38,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: true

--- a/configs/yolov3_mobilenet_v1_voc.yml
+++ b/configs/yolov3_mobilenet_v1_voc.yml
@@ -37,11 +37,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false

--- a/configs/yolov3_mobilenet_v3.yml
+++ b/configs/yolov3_mobilenet_v3.yml
@@ -38,11 +38,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false

--- a/configs/yolov3_r34.yml
+++ b/configs/yolov3_r34.yml
@@ -38,11 +38,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: true

--- a/configs/yolov3_r34_voc.yml
+++ b/configs/yolov3_r34_voc.yml
@@ -39,11 +39,6 @@ YOLOv3Head:
    score_threshold: 0.01
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false

--- a/configs/yolov4/yolov4_cspdarknet.yml
+++ b/configs/yolov4/yolov4_cspdarknet.yml
@@ -35,11 +35,6 @@ YOLOv4Head:
  scale_x_y: [1.2, 1.1, 1.05]
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 4
  ignore_thresh: 0.7
  label_smooth: true
  downsample: [8,16,32]

--- a/configs/yolov4/yolov4_cspdarknet_coco.yml
+++ b/configs/yolov4/yolov4_cspdarknet_coco.yml
@@ -34,11 +34,6 @@ YOLOv4Head:
  scale_x_y: [1.2, 1.1, 1.05]
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: true
  downsample: [8,16,32]

--- a/configs/yolov4/yolov4_cspdarknet_voc.yml
+++ b/configs/yolov4/yolov4_cspdarknet_voc.yml
@@ -34,11 +34,6 @@ YOLOv4Head:
  scale_x_y: [1.2, 1.1, 1.05]
 YOLOv3Loss:
-  # batch_size here is only used for fine grained loss, not used
-  # for training batch_size setting, training batch_size setting
-  # is in configs/yolov3_reader.yml TrainReader.batch_size, batch
-  # size here should be set as same value as TrainReader.batch_size
-  batch_size: 4
  ignore_thresh: 0.7
  label_smooth: true
  downsample: [8,16,32]

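Review note on the config hunks above: every `YOLOv3Loss` block loses its `batch_size` key together with the comment that tied it to `TrainReader.batch_size`. Assuming the loss no longer reads this key after the merge (the loss code is not shown in this part of the diff), a custom config written against the old layout could be cleaned up the same way. A minimal sketch follows; `strip_loss_batch_size` is a hypothetical helper, not part of PaddleDetection, and it works on plain text rather than parsed YAML so that the remaining comments and ordering are preserved.

```python
def strip_loss_batch_size(config_text):
    """Remove `batch_size` (and the comment lines directly above it) from the YOLOv3Loss block."""
    out = []
    in_loss = False
    for line in config_text.splitlines():
        stripped = line.strip()
        # A non-indented, non-comment line starts a new top-level block.
        if stripped and not stripped.startswith("#") and not line[0].isspace():
            in_loss = stripped.startswith("YOLOv3Loss:")
        if in_loss and stripped.startswith("batch_size:"):
            # Drop the comment lines that immediately precede the key.
            while out and out[-1].strip().startswith("#"):
                out.pop()
            continue
        out.append(line)
    return "\n".join(out)


old = """YOLOv3Head:
  drop_block: true

YOLOv3Loss:
  # batch_size here is only used for fine grained loss
  batch_size: 8
  ignore_thresh: 0.7
  label_smooth: false

TrainReader:
  batch_size: 8
"""
new = strip_loss_batch_size(old)
print(new)
```

The reader's `batch_size` is left untouched on purpose: only the copy under `YOLOv3Loss` is removed by this merge.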
--- a/deploy/README.md
+++ b/deploy/README.md
 # PaddleDetection Inference Deployment
-`PaddleDetection` currently supports deployment with `Python` and `C++` on `Windows` and `Linux`.
+`PaddleDetection` currently supports:
+- Deployment with `Python` and `C++` on `Windows` and `Linux`
+- [Online serving deployment](./serving/README.md)
+- [Mobile deployment](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
 ## Model Export
 After training a model that meets your requirements, export it with `tools/export_model.py` if you want to plug it into the C++ server-side inference library or the mobile inference library.
@@ -20,4 +23,5 @@ yolov3_darknet # model directory
 ## Inference Deployment
 - [1. Python inference (supports Linux and Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/python)
 - [2. C++ inference (supports Linux and Windows)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/cpp)
- [3. Mobile deployment: see the Paddle-Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/)
+- [3. Online serving deployment](./serving/README.md)
+- [4. Mobile deployment](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
--- a/deploy/serving/README.md
+++ b/deploy/serving/README.md
+# Server-side Inference Deployment
+Models trained with `PaddleDetection` can be deployed on the server side with [Serving](https://github.com/PaddlePaddle/Serving).  
+This tutorial deploys a model trained on the road sign dataset [roadsign_voc](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) with the `configs/yolov3_mobilenet_v1_roadsign.yml` config.  
+The pretrained model weights are [yolov3_mobilenet_v1_roadsign.pdparams](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams).
+## 1. Verify the model first
+```
+python tools/infer.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams --infer_img=demo/road554.png
+```
+## 2. Install Paddle Serving
+```
+# Install paddle-serving-client
+pip install paddle-serving-client -i https://mirror.baidu.com/pypi/simple
+# Install paddle-serving-server
+pip install paddle-serving-server -i https://mirror.baidu.com/pypi/simple
+# Install paddle-serving-server-gpu
+pip install paddle-serving-server-gpu -i https://mirror.baidu.com/pypi/simple
+```
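+## 3. Export the model
+During training, PaddleDetection keeps both the network's forward-pass parameters and the optimizer-related parameters, while deployment needs only the forward-pass parameters. For details, see [Export model](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/advanced_tutorials/deploy/EXPORT_MODEL.md).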
+```
+python tools/export_serving_model.py -c configs/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1_roadsign.pdparams --output_dir=./inference_model
+```
+The command above generates a `yolov3_mobilenet_v1_roadsign` folder under `./inference_model`:
+```
+inference_model
+│   ├── yolov3_mobilenet_v1_roadsign
+│   │   ├── infer_cfg.yml
+│   │   ├── serving_client
+│   │   │   ├── serving_client_conf.prototxt
+│   │   │   ├── serving_client_conf.stream.prototxt
+│   │   ├── serving_server
+│   │   │   ├── conv1_bn_mean
+│   │   │   ├── conv1_bn_offset
+│   │   │   ├── conv1_bn_scale
+│   │   │   ├── ...
+```
+`serving_client_conf.prototxt` in the `serving_client` folder describes the model's inputs and outputs in detail.
+The content of `serving_client_conf.prototxt` is:
+```
+feed_var {
+  name: "image"
+  alias_name: "image"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 608
+  shape: 608
+}
+feed_var {
+  name: "im_size"
+  alias_name: "im_size"
+  is_lod_tensor: false
+  feed_type: 2
+  shape: 2
+}
+fetch_var {
+  name: "multiclass_nms_0.tmp_0"
+  alias_name: "multiclass_nms_0.tmp_0"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+```
+## 4. Start the PaddleServing service
+```
+cd inference_model/yolov3_mobilenet_v1_roadsign/
+# GPU
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_ids 0
+# CPU
+python -m paddle_serving_server.serve --model serving_server --port 9393
+```
+## 5. Test the deployed service
+Prepare the `label_list.txt` file:
+```
+# Enter the exported model folder
+cd inference_model/yolov3_mobilenet_v1_roadsign/
+# Copy the dataset's label_list.txt file into the current folder
+cp ../../dataset/roadsign_voc/label_list.txt .
+```
+Set the `prototxt` file path to `serving_client/serving_client_conf.prototxt`.  
+Set `fetch` to `fetch=["multiclass_nms_0.tmp_0"]`.
+Run the test:
+```
+# Enter the directory
+cd inference_model/yolov3_mobilenet_v1_roadsign/
+# test_client.py automatically creates an output folder and writes two files into it: `bbox.json` and `road554.png`
+python ../../deploy/serving/test_client.py ../../demo/road554.png
+```
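The `serving_client_conf.prototxt` shown in the README above is what determines the `feed` and `fetch` names the client must use. As a rough illustration only, the sketch below pulls those names and shapes out of the text with regular expressions. The helper `parse_vars` is hypothetical and not part of Paddle Serving; the embedded text is copied from the README.

```python
import re

# Contents of serving_client_conf.prototxt, copied from the README above.
PROTOTXT = '''
feed_var {
  name: "image"
  alias_name: "image"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 608
  shape: 608
}
feed_var {
  name: "im_size"
  alias_name: "im_size"
  is_lod_tensor: false
  feed_type: 2
  shape: 2
}
fetch_var {
  name: "multiclass_nms_0.tmp_0"
  alias_name: "multiclass_nms_0.tmp_0"
  is_lod_tensor: true
  fetch_type: 1
  shape: -1
}
'''


def parse_vars(text):
    """Hypothetical helper: collect (alias_name, shape) per feed/fetch block."""
    result = {"feed_var": [], "fetch_var": []}
    # Blocks contain no nested braces, so a non-greedy match is sufficient.
    for kind, body in re.findall(r"(feed_var|fetch_var)\s*\{(.*?)\}", text, flags=re.S):
        name = re.search(r'alias_name:\s*"([^"]+)"', body).group(1)
        shape = [int(s) for s in re.findall(r"shape:\s*(-?\d+)", body)]
        result[kind].append((name, shape))
    return result


variables = parse_vars(PROTOTXT)
print(variables["feed_var"])
print(variables["fetch_var"])
```

The feed names (`image`, `im_size`) and the fetch name (`multiclass_nms_0.tmp_0`) are the same ones `test_client.py` passes to `client.predict`.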
--- a/deploy/serving/test_client.py
+++ b/deploy/serving/test_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import sys
+import numpy as np
+from paddle_serving_client import Client
+from paddle_serving_app.reader import *
+import cv2
+preprocess = Sequential([
+    File2Image(), BGR2RGB(), Resize(
+        (608, 608), interpolation=cv2.INTER_LINEAR), Div(255.0), Transpose(
+            (2, 0, 1))
+])
+postprocess = RCNNPostprocess("label_list.txt", "output", [608, 608])
+client = Client()
+client.load_client_config("serving_client/serving_client_conf.prototxt")
+client.connect(['127.0.0.1:9393'])
+im = preprocess(sys.argv[1])
+fetch_map = client.predict(
+    feed={
+        "image": im,
+        "im_size": np.array(list(im.shape[1:])),
+    },
+    fetch=["multiclass_nms_0.tmp_0"])
+fetch_map["image"] = sys.argv[1]
+postprocess(fetch_map)
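To make the shapes fed to the server concrete, here is a NumPy-only sketch of the steps the `preprocess` pipeline names (BGR to RGB, divide by 255, HWC to CHW). It is an approximation under stated assumptions, not the `paddle_serving_app` implementation: the function `to_feed` is hypothetical, the resize step is skipped by assuming the input is already 608x608, and the cast to `float32` is an assumption.

```python
import numpy as np


def to_feed(bgr_image):
    """Hypothetical stand-in for the preprocess pipeline, minus File2Image/Resize."""
    rgb = bgr_image[:, :, ::-1]                # BGR2RGB: reverse the channel axis
    scaled = rgb.astype("float32") / 255.0     # Div(255.0); float32 is an assumption
    chw = scaled.transpose((2, 0, 1))          # Transpose((2, 0, 1)): HWC -> CHW
    return {
        "image": chw,
        "im_size": np.array(list(chw.shape[1:])),  # same expression as test_client.py
    }


# Dummy 608x608 image whose blue channel (index 0 in BGR) is fully saturated.
dummy = np.zeros((608, 608, 3), dtype="uint8")
dummy[..., 0] = 255
feed = to_feed(dummy)
print(feed["image"].shape, feed["im_size"])
```

The resulting `image` shape matches the `shape: 3, 608, 608` entry of the `image` feed variable, and `im_size` matches the `shape: 2` entry.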
--- a/docs/tutorials/Custom_DataSet.md
+++ b/docs/tutorials/Custom_DataSet.md
@@ -6,8 +6,9 @@
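    - [Convert the dataset to VOC format](#方式二将数据集转换为VOC格式)
    - [Add a new data source](#方式三添加新数据源)
 - [2. Select a model](#2选择模型)
- [3. Modify the parameter configuration](#3修改参数配置)
+- [3. Generate anchors](#3生成Anchor)
- [4. Start training and deployment](#4开始训练与部署)
+- [4. Modify the parameter configuration](#4修改参数配置)
+- [5. Start training and deployment](#5开始训练与部署)
 - [Appendix: a custom dataset demo](#附一个自定义数据集demo)
 ## 1. Prepare the data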
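@@ -97,8 +98,23 @@ PaddleDetection provides a rich model zoo; for details see the [model zoo](../MODEL
 You can also try the [enhanced YOLOv3 model](../featured_model/YOLOv3_ENHANCEMENT.md), the [YOLOv4 model](../featured_model/YOLO_V4.md), and the [anchor-free models](../featured_model/ANCHOR_FREE_DETECTION.md) developed in PaddleDetection.
+## 3. Generate anchors
-## 3. Modify the parameter configuration
+For YOLO-series models, you can run `tools/anchor_cluster.py` to obtain anchors suited to your dataset. Usage: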
+``` bash
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
+```
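+The main parameters currently supported by `tools/anchor_cluster.py` are listed in the table below:
+|    Parameter    |    Purpose    |    Default    |    Notes    |
+|:------:|:------:|:------:|:------:|
+| -c/--config | Model configuration file | No default | Required |
+| -n/--n | Number of clusters | 9 | Number of anchors |
+| -s/--size | Input image size | None | If specified, the given size is used; otherwise the tool tries to read the image size from the configuration file |
+|  -m/--method  |  Anchor clustering method  |  v2  |  Currently only the yolov2/v5 clustering algorithms are supported  |
+|  -i/--iters  |  Number of iterations of the k-means clustering algorithm  |  1000  | Stops when k-means converges or the iteration count is reached |
+| -gi/--gen_iters |  Number of iterations of the genetic algorithm  | 1000 |  Used only by the yolov5 anchor clustering algorithm  |
+| -t/--thresh|  Anchor scale threshold  | 0.25 | Used only by the yolov5 anchor clustering algorithm |
+## 4. Modify the parameter configuration
 After selecting a model, find the corresponding configuration file in the `configs` directory. To train on a custom dataset, some parameters need to be modified: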
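@@ -133,7 +149,7 @@ PaddleDetection provides a rich model zoo; for details see the [model zoo](../MODEL
 - Pretrained model configuration: the `pretrain_weights: path/to/weights` parameter in the YAML configuration file sets the path, which can be a URL or a path to a weights file. You can reuse the ImageNet-pretrained model given in the configuration file as-is. Model weights trained on the COCO or Obj365 datasets can also be used as the pretrained model for transfer learning; see the [transfer learning documentation](../advanced_tutorials/TRANSFER_LEARNING_cn.md) for details.
-## 4. Start training and deployment
+## 5. Start training and deployment
 - Once the parameters are configured, you can start training the model; see the [Training/Evaluation/Inference](GETTING_STARTED_cn.md) getting-started guide.
 - After training and testing, deploy the model as needed: first export an inference-ready model (see the [export model tutorial](../advanced_tutorials/deploy/EXPORT_MODEL.md)), then use either [C++ inference deployment](../advanced_tutorials/deploy/DEPLOY_CPP.md) or [Python inference deployment](../advanced_tutorials/deploy/DEPLOY_PY.md).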
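The anchor-generation section refers to a YOLOv2-style clustering method. As a minimal sketch of that general idea (k-means over box widths and heights, assigning each box to the anchor with the highest IoU), the code below may help. It is not the implementation in `tools/anchor_cluster.py`: the function names, the deterministic initialization, and the sample boxes are all assumptions made for illustration.

```python
def wh_iou(box, anchor):
    """IoU of two (w, h) boxes, assuming they share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, n, iters=1000):
    """Hypothetical sketch: cluster (w, h) pairs into n anchors by IoU."""
    boxes = sorted(boxes, key=lambda b: b[0] * b[1])
    # Deterministic init (an assumption): pick n boxes evenly spaced by area.
    step = len(boxes) / n
    anchors = [boxes[int(i * step)] for i in range(n)]
    assign = None
    for _ in range(iters):
        # Assign every box to the anchor it overlaps most.
        new_assign = [
            max(range(n), key=lambda k: wh_iou(b, anchors[k])) for b in boxes
        ]
        if new_assign == assign:  # converged: assignments no longer change
            break
        assign = new_assign
        # Move each anchor to the mean width/height of its members.
        for k in range(n):
            members = [b for b, a in zip(boxes, assign) if a == k]
            if members:
                anchors[k] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Made-up box sizes forming two obvious groups.
sample = [(10, 10), (12, 12), (11, 11), (100, 50), (110, 60), (105, 55)]
anchors = kmeans_anchors(sample, n=2)
print(anchors)  # [(11.0, 11.0), (105.0, 55.0)]
```

The loop stops either on convergence or after `iters` rounds, which mirrors the termination rule described for `-i/--iters` in the table.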

--- a/ppdet/core/workspace.py
+++ b/ppdet/core/workspace.py
@@ -97,6 +97,15 @@ def load_config(file_path):
        del cfg[READER_KEY]
    merge_config(cfg)
+    # NOTE: the training batch size is defined only in TrainReader; synchronize
+    #       it to the global config so that models can read the batch size
+    #       from the global config when they are built.
+    #       The batch size for evaluation or inference can also be added here.
+    if 'TrainReader' in global_config:
+        global_config['train_batch_size'] = global_config['TrainReader'][
+            'batch_size']
    return global_config
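The effect of the added lines can be seen in isolation with a plain dictionary. This is a sketch, not ppdet code: `sync_train_batch_size` is a hypothetical wrapper around the same three lines, and the config values are made up.

```python
def sync_train_batch_size(global_config):
    """Hypothetical wrapper around the lines added to load_config."""
    if 'TrainReader' in global_config:
        global_config['train_batch_size'] = global_config['TrainReader'][
            'batch_size']
    return global_config


# Made-up config: the batch size is set only on the reader.
with_reader = sync_train_batch_size({'TrainReader': {'batch_size': 24}})
without_reader = sync_train_batch_size({'architecture': 'YOLOv3'})
print(with_reader['train_batch_size'])        # 24
print('train_batch_size' in without_reader)   # False
```

When no `TrainReader` section exists, no key is added, so a consumer would fall back to its own default.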

--- a/ppdet/modeling/losses/yolo_loss.py
+++ b/ppdet/modeling/losses/yolo_loss.py
@@ -32,17 +32,17 @@ class YOLOv3Loss(object):
    Combined loss for YOLOv3 network
    Args:
-        batch_size (int): training batch size
+        train_batch_size (int): training batch size
        ignore_thresh (float): threshold to ignore confidence loss
        label_smooth (bool): whether to use label smoothing
        use_fine_grained_loss (bool): whether use fine grained YOLOv3 loss
                                      instead of fluid.layers.yolov3_loss
    """
    __inject__ = ['iou_loss', 'iou_aware_loss']
-    __shared__ = ['use_fine_grained_loss']
+    __shared__ = ['use_fine_grained_loss', 'train_batch_size']
    def __init__(self,
-                 batch_size=8,
+                 train_batch_size=8,
                 ignore_thresh=0.7,
                 label_smooth=True,
                 use_fine_grained_loss=False,
@@ -51,7 +51,7 @@ class YOLOv3Loss(object):
                 downsample=[32, 16, 8],
                 scale_x_y=1.,
                 match_score=False):
-        self._batch_size = batch_size
+        self._train_batch_size = train_batch_size
        self._ignore_thresh = ignore_thresh
        self._label_smooth = label_smooth
        self._use_fine_grained_loss = use_fine_grained_loss
@@ -65,7 +65,7 @@ class YOLOv3Loss(object):
                 anchor_masks, mask_anchors, num_classes, prefix_name):
        if self._use_fine_grained_loss:
            return self._get_fine_grained_loss(
-                outputs, targets, gt_box, self._batch_size, num_classes,
+                outputs, targets, gt_box, self._train_batch_size, num_classes,
                mask_anchors, self._ignore_thresh)
        else:
            losses = []
@@ -95,7 +95,7 @@ class YOLOv3Loss(object):
                               outputs,
                               targets,
                               gt_box,
-                               batch_size,
+                               train_batch_size,
                               num_classes,
                               mask_anchors,
                               ignore_thresh,
@@ -108,7 +108,7 @@ class YOLOv3Loss(object):
            targets ([Variables]): List of Variables, The targets for yolo
                                   loss calculation.
            gt_box (Variable): The ground-truth bounding boxes.
-            batch_size (int): The training batch size
+            train_batch_size (int): The training batch size
            num_classes (int): class num of dataset
            mask_anchors ([[float]]): list of anchors in each output layer
            ignore_thresh (float): prediction bbox overlap any gt_box greater
@@ -171,7 +171,7 @@ class YOLOv3Loss(object):
            loss_h = fluid.layers.reduce_sum(loss_h, dim=[1, 2, 3])
            if self._iou_loss is not None:
                loss_iou = self._iou_loss(x, y, w, h, tx, ty, tw, th, anchors,
-                                          downsample, self._batch_size,
+                                          downsample, self._train_batch_size,
                                          scale_x_y)
                loss_iou = loss_iou * tscale_tobj
                loss_iou = fluid.layers.reduce_sum(loss_iou, dim=[1, 2, 3])
@@ -180,14 +180,14 @@ class YOLOv3Loss(object):
            if self._iou_aware_loss is not None:
                loss_iou_aware = self._iou_aware_loss(
                    ioup, x, y, w, h, tx, ty, tw, th, anchors, downsample,
-                    self._batch_size, scale_x_y)
+                    self._train_batch_size, scale_x_y)
                loss_iou_aware = loss_iou_aware * tobj
                loss_iou_aware = fluid.layers.reduce_sum(
                    loss_iou_aware, dim=[1, 2, 3])
                loss_iou_awares.append(fluid.layers.reduce_mean(loss_iou_aware))
            loss_obj_pos, loss_obj_neg = self._calc_obj_loss(
-                output, obj, tobj, gt_box, self._batch_size, anchors,
+                output, obj, tobj, gt_box, self._train_batch_size, anchors,
                num_classes, downsample, self._ignore_thresh, scale_x_y)
            loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(cls, tcls)
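Listing `train_batch_size` in `__shared__` is what lets the loss pick the value up from the global config instead of its default of 8. The sketch below is a simplified stand-in for that kind of lookup and should not be read as ppdet's actual workspace machinery: `ToyLoss` and `create` are hypothetical, and the config values are made up.

```python
class ToyLoss(object):
    """Hypothetical class mirroring only the constructor shape of YOLOv3Loss."""
    __shared__ = ['use_fine_grained_loss', 'train_batch_size']

    def __init__(self,
                 train_batch_size=8,
                 ignore_thresh=0.7,
                 use_fine_grained_loss=False):
        self._train_batch_size = train_batch_size
        self._ignore_thresh = ignore_thresh
        self._use_fine_grained_loss = use_fine_grained_loss


def create(cls, global_config, **overrides):
    """Hypothetical factory: fill shared constructor args from the global config."""
    kwargs = {
        key: global_config[key]
        for key in getattr(cls, '__shared__', [])
        if key in global_config
    }
    kwargs.update(overrides)  # explicit arguments win over shared values
    return cls(**kwargs)


# Made-up global config, as it might look after load_config has run.
global_config = {'train_batch_size': 24, 'use_fine_grained_loss': True}
loss = create(ToyLoss, global_config)
fallback = create(ToyLoss, {})
print(loss._train_batch_size, fallback._train_batch_size)  # 24 8
```

Renaming the argument from `batch_size` to `train_batch_size` matters in such a scheme because the shared key and the constructor argument must have the same name.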

--- a/ppdet/utils/voc_eval.py
+++ b/ppdet/utils/voc_eval.py
@@ -107,7 +107,7 @@ def bbox_eval(results,
    logger.info("Accumulating evaluation results...")
    detection_map.accumulate()
    map_stat = 100. * detection_map.get_map()
-    logger.info("mAP({:.2f}, {}) = {:.2f}".format(overlap_thresh, map_type,
+    logger.info("mAP({:.2f}, {}) = {:.2f}%".format(overlap_thresh, map_type,
                                                   map_stat))
    return map_stat
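Because `map_stat` is already multiplied by 100, the added `%` makes the logged unit explicit. A quick check of the new format string, using made-up values for the threshold, map type, and mAP:

```python
# Made-up inputs; only the format string comes from the diff.
overlap_thresh = 0.5
map_type = '11point'
map_stat = 100. * 0.7256

message = "mAP({:.2f}, {}) = {:.2f}%".format(overlap_thresh, map_type,
                                             map_stat)
print(message)  # mAP(0.50, 11point) = 72.56%
```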