Update SSD documentation. (#968)

* update wiki * refine wiki * train.py default args * update wiki cn * update wiki

3aa16d52 · Xingyuan Bu · qingqing01 · 55849d4e · 3aa16d52 · 3aa16d52
9 changed files
--- a/fluid/object_detection/README.md
+++ b/fluid/object_detection/README.md
-The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
+The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
 ---
@@ -6,7 +6,14 @@ The minimum PaddlePaddle version needed for the code sample in this directory is
 ### Introduction
-[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) framework for object detection is based on a feed-forward convolutional network. The early network is a standard convolutional architecture for image classification, such as VGG, ResNet, or MobileNet, which is also called base network. In this tutorial we used [MobileNet](https://arxiv.org/abs/1704.04861).
+[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage object detection framework. A single-stage detector treats object detection as a regression problem and directly predicts bounding boxes and class probabilities without a region proposal step. SSD improves on this by making predictions at different scales from different layers, as shown below. Predictions are made at six levels, on six feature maps of different scales. Each feature map has two 3x3 convolutional layers, which predict the category and the shape offset relative to the prior box (also called anchor), respectively. Thus, we get 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class.
+<p align="center">
+<img src="images/SSD_paper_figure.jpg" height=300 width=900 hspace='10'/> <br />
+The Single Shot MultiBox Detector (SSD)
+</p>
+SSD can be plugged into a wide variety of standard convolutional networks, such as VGG, ResNet, or MobileNet; this network is called the base network or backbone. In this tutorial we use [MobileNet](https://arxiv.org/abs/1704.04861).
 ### Data Preparation
@@ -14,7 +21,7 @@ You can use [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or [MS
 #### PASCAL VOC Dataset
-If you want to train model on PASCAL VOC dataset, please download datset at first, skip this step if you already have one.
+If you want to train a model on the PASCAL VOC dataset, please download the dataset first; skip this step if you already have it.
 ```bash
 cd data/pascalvoc
@@ -25,7 +32,7 @@ The command `download.sh` also will create training and testing file lists.
 #### MS-COCO Dataset
-If you want to train model on MS-COCO dataset, please download datset at first, skip this step if you already have one.
+If you want to train a model on the MS-COCO dataset, please download the dataset first; skip this step if you already have it.
 ```
 cd data/coco
@@ -36,45 +43,46 @@ cd data/coco
 #### Download the Pre-trained Model.
-We provide two pre-trained models. The one is MobileNet-v1 SSD trained on COCO dataset, but removed the convolutional predictors for COCO dataset. This model can be used to initialize the models when training other dataset, like PASCAL VOC. Then other pre-trained model is MobileNet v1 trained on ImageNet 2012 dataset, but removed the last weights and bias in Fully-Connected layer.
+We provide two pre-trained models. The first is MobileNet-v1 SSD trained on the COCO dataset, with the COCO-specific convolutional predictors removed. This model can be used to initialize training on other datasets, such as PASCAL VOC. The other pre-trained model is MobileNet-v1 trained on the ImageNet 2012 dataset, with the weights and bias of the last fully connected layer removed.
-Declaration: the MobileNet-v1 SSD model is converted by [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet v1 model is converted [Caffe](https://github.com/shicai/MobileNet-Caffe).
+Declaration: the MobileNet-v1 SSD model is converted from a [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet-v1 model is converted from [Caffe](https://github.com/shicai/MobileNet-Caffe).
+We will release our own pre-trained models soon.
  - Download MobileNet-v1 SSD:
-    ```
+    ```bash
    ./pretrained/download_coco.sh
    ```
  - Download MobileNet-v1:
-    ```
+    ```bash
    ./pretrained/download_imagenet.sh
    ```
 #### Train on PASCAL VOC
-  - Train on one device (/GPU).
-  ```python
-  env CUDA_VISIBLE_DEVICES=0 python -u train.py --parallel=False --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
-  ```
-  - Train on multi devices (/GPUs).
-  ```python
+`train.py` is the main caller of the training module. Examples of usage are shown below.
-  env CUDA_VISIBLE_DEVICES=0,1 python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
+  ```bash
+  python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
  ```
+   - Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify the GPUs you want to use.
+   - Set ```--dataset='coco2014'``` or ```--dataset='coco2017'``` to train the model on the MS-COCO dataset.
+   - For more help on arguments:
-#### Train on MS-COCO
+  ```bash
-  - Train on one device (/GPU).
+  python train.py --help
-  ```python
-  env CUDA_VISIBLE_DEVICES=0 python -u train.py --parallel=False --dataset='coco2014' --pretrained_model='pretrained/mobilenet_v1_imagenet/'
-  ```
-  - Train on multi devices (/GPUs).
-  ```python
-  env CUDA_VISIBLE_DEVICES=0,1 python -u train.py --batch_size=64 --dataset='coco2014' --pretrained_model='pretrained/mobilenet_v1_imagenet/'
  ```
-TBD
+We used the RMSProp optimizer with a mini-batch size of 64 to train MobileNet-SSD. The initial learning rate is 0.001 and is decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achieve XXX% mAP under the 11point metric.
 ### Evaluate
-You can evaluate your trained model in different metric like 11point, integral on both PASCAL VOC and COCO dataset. Moreover, we provide eval_coco_map.py which uses a COCO-specific mAP metric defined by [COCO committee](http://cocodataset.org/#detections-eval). To use this eval_coco_map.py, [cocoapi](https://github.com/cocodataset/cocoapi) is needed.
+You can evaluate your trained model with different metrics, such as 11point and integral, on both the PASCAL VOC and COCO datasets. Note that the default test list is the dataset's test/val list; you can use your own test list by setting the ```--test_list``` argument.
+`eval.py` is the main entry point of the evaluation module. An example of usage is shown below.
+```bash
+python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45
+```
+You can set ```--dataset``` to ```coco2014``` or ```coco2017``` to evaluate on the COCO dataset. Moreover, we provide `eval_coco_map.py`, which uses the COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval). To use `eval_coco_map.py`, [cocoapi](https://github.com/cocodataset/cocoapi) is needed.
 Install the cocoapi:
 ```
 # COCOAPI=/path/to/clone/cocoapi
@@ -86,44 +94,25 @@ make install
 # not to install the COCO API into global site-packages
 python2 setup.py install --user
 ```
-Note we set the defualt test list to the dataset's test/val list, you can use your own test list by setting test_list args.
-#### Evaluate on PASCAL VOC
-```python
-env CUDA_VISIBLE_DEVICES=0 python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/90' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point'
-```
-#### Evaluate on MS-COCO
-```python
-env CUDA_VISIBLE_DEVICES=0 python eval.py --dataset='coco2014' --nms_threshold=0.5 --model_dir='train_coco_model/40' --test_list='annotations/instances_minival2014.json' --ap_version='integral'
-env CUDA_VISIBLE_DEVICES=0 python eval_coco_map.py --dataset='coco2017' --nms_threshold=0.5 --model_dir='train_coco_model/40' --test_list='annotations/instances_minival2017.json'
-```
-TBD
 ### Infer and Visualize
+`infer.py` is the main entry point of the inference module. An example of usage is shown below.
-```python
+```bash
-env CUDA_VISIBLE_DEVICES=0 python infer.py --dataset='coco' --nms_threshold=0.5 --model_dir='train_coco_model/20' --image_path='./data/coco/val2014/COCO_val2014_000000000139.jpg'
+python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg'
 ```
-Below is the examples after running python infer.py to inference and visualize the model result.
+Below are examples of the visualized inference results.
 <p align="center">
-<img src="images/COCO_val2014_000000000139.jpg" height=300 width=400 hspace='10'/>
+<img src="images/009943.jpg" height=300 width=400 hspace='10'/>
-<img src="images/COCO_val2014_000000000785.jpg" height=300 width=400 hspace='10'/>
+<img src="images/009956.jpg" height=300 width=400 hspace='10'/>
-<img src="images/COCO_val2014_000000142324.jpg" height=300 width=400 hspace='10'/>
+<img src="images/009960.jpg" height=300 width=400 hspace='10'/>
-<img src="images/COCO_val2014_000000144003.jpg" height=300 width=400 hspace='10'/> <br />
+<img src="images/009962.jpg" height=300 width=400 hspace='10'/> <br />
-MobileNet-SSD300x300 Visualization Examples
+MobileNet-v1-SSD 300x300 Visualization Examples
 </p>
-TBD
 ### Released Model
 | Model                    | Pre-trained Model  | Training data    | Test data    | mAP |
 |:------------------------:|:------------------:|:----------------:|:------------:|:----:|
-|MobileNet-v1-SSD 300x300  | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | xx%  |
+|MobileNet-v1-SSD 300x300  | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | XXX%  |
-|MobileNet-v1-SSD 300x300  | ImageNet MobileNet | VOC07+12 trainval| VOC07 test   | xx%  |
-|MobileNet-v1-SSD 300x300  | ImageNet MobileNet | MS-COCO trainval | MS-COCO test | xx%  |
-TBD
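As a quick check of the detection count quoted in the new introduction, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: the feature-map sizes and priors-per-location values are copied from the README sentence, while the names `FEATURE_MAPS` and `count_prior_boxes` are hypothetical and do not appear in the repository.

```python
# (feature-map side length, prior boxes per location), as listed in the README.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_prior_boxes(feature_maps):
    """Sum side * side * priors over all prediction levels."""
    return sum(side * side * priors for side, priors in feature_maps)


total = count_prior_boxes(FEATURE_MAPS)
print(total)  # 8732
```

Each level contributes one prediction per prior box per spatial location, so the largest (38x38) map accounts for 5776 of the 8732 detections.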
--- a/fluid/object_detection/README_cn.md
+++ b/fluid/object_detection/README_cn.md
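+Running the sample programs in this directory requires the latest develop branch of PaddlePaddle. If your installed PaddlePaddle version is older than this, please update it by following the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).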
+---
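+## SSD Object Detection
+### Introduction
+[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage object detector. Unlike two-stage detection methods, a single-stage detector performs no region proposal and instead regresses the bounding boxes and class probabilities of objects directly from the feature maps. SSD adopts this single-stage idea and improves on it by detecting objects of the corresponding scale on feature maps of different scales. As shown in the figure below, SSD makes predictions at different levels on feature maps of six scales. At each level, two 3x3 convolutions regress the object category and the bounding-box offset, respectively. Thus, for each class, the six levels of SSD produce a total of 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections.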
+<p align="center">
+<img src="images/SSD_paper_figure.jpg" height=300 width=900 hspace='10'/> <br />
+The SSD object detection model
+</p>
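+SSD can easily be plugged into any standard convolutional network, such as VGG, ResNet, or MobileNet; these networks are called the base network of the detector. In this example we use [MobileNet](https://arxiv.org/abs/1704.04861).
+### Data Preparation
+You can use the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or the [MS-COCO dataset](http://cocodataset.org/#download).
+#### PASCAL VOC Dataset
+If you want to train on the PASCAL VOC dataset, first download the dataset with the commands below.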
+```bash
+cd data/pascalvoc
+./download.sh
+```
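+The `download.sh` command automatically creates the file lists for training and testing.
+#### MS-COCO Dataset
+If you want to train on the MS-COCO dataset, first download the dataset with the commands below.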
+```
+cd data/coco
+./download.sh
+```
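+### Train
+#### Download the Pre-trained Model
+We provide two pre-trained models. The first is MobileNet-v1 SSD pre-trained on the COCO dataset; we removed its prediction heads so that it can be trained on datasets other than COCO. The second is MobileNet-v1 pre-trained on the ImageNet 2012 dataset; we likewise removed the final fully connected layer so that it can be used for object detection training.
+Declaration: the MobileNet-v1 SSD model is converted from a [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet-v1 model is converted from [Caffe](https://github.com/shicai/MobileNet-Caffe). We will also release our own pre-trained models soon.
+  - Download MobileNet-v1 SSD: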
+    ```bash
+    ./pretrained/download_coco.sh
+    ```
+  - Download MobileNet-v1:
+    ```bash
+    ./pretrained/download_imagenet.sh
+    ```
+#### Train on PASCAL VOC
+`train.py` is the main entry point of the training module. An example invocation is shown below:
+  ```bash
+  python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
+  ```
+   - Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify the GPUs you want to use.
+   - Set ```--dataset='coco2014'``` or ```--dataset='coco2017'``` to train on the MS-COCO dataset.
+   - For more optional arguments, see:
+  ```bash
+  python train.py --help
+  ```
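+We trained MobileNet-SSD with the RMSProp optimizer, using a batch size of 64, a weight decay of 0.00005, and an initial learning rate of 0.001, which is decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01. After 120 epochs of training, the mAP under the 11point metric is XXX%.
+### Evaluate
+You can evaluate a trained model on the PASCAL VOC and COCO datasets with metrics such as 11point and integral. Without loss of generality, the sample code uses the test list of the corresponding dataset as its default list; you can also specify your own test list by setting ```--test_list```.
+`eval.py` is the main entry point of the evaluation module. An example invocation is shown below: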
+```bash
+python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45
+```
+You can set ```--dataset``` to ```coco2014``` or ```coco2017``` to evaluate on the COCO dataset. We also provide `eval_coco_map.py` for the [official COCO evaluation](http://cocodataset.org/#detections-eval). To use eval_coco_map.py, you first need to download [cocoapi](https://github.com/cocodataset/cocoapi):
+```
+# COCOAPI=/path/to/clone/cocoapi
+git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
+cd $COCOAPI/PythonAPI
+# Install into global site-packages
+make install
+# Alternatively, if you do not have permissions or prefer
+# not to install the COCO API into global site-packages
+python2 setup.py install --user
+```
+### Infer and Visualize
+`infer.py` is the main entry point of the inference and visualization module. An example invocation is shown below:
+```bash
+python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg'
+```
+The figure below visualizes the model's predictions:
+<p align="center">
+<img src="images/009943.jpg" height=300 width=400 hspace='10'/>
+<img src="images/009956.jpg" height=300 width=400 hspace='10'/>
+<img src="images/009960.jpg" height=300 width=400 hspace='10'/>
+<img src="images/009962.jpg" height=300 width=400 hspace='10'/> <br />
+MobileNet-v1-SSD 300x300 prediction visualization
+</p>
+### Released Model
+| Model                    | Pre-trained Model  | Training data    | Test data    | mAP |
+|:------------------------:|:------------------:|:----------------:|:------------:|:----:|
+|MobileNet-v1-SSD 300x300  | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | XXX%  |
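The training recipe stated in both READMEs (initial learning rate 0.001, decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01) can be read as a piecewise-constant schedule. The sketch below is one possible reading, not the code used in `train.py`: it assumes the multipliers apply to the initial rate rather than cumulatively, and that epochs are counted from 0. The names `BASE_LR`, `BOUNDARIES`, `MULTIPLIERS`, and `learning_rate` are hypothetical.

```python
import bisect

BASE_LR = 0.001                           # initial learning rate from the README
BOUNDARIES = [40, 60, 80, 100]            # epochs at which the rate changes
MULTIPLIERS = [1.0, 0.5, 0.25, 0.1, 0.01] # assumed relative to BASE_LR


def learning_rate(epoch):
    """Return the learning rate for a 0-based epoch index (assumed convention)."""
    # bisect_right counts how many boundaries have been reached at this epoch.
    stage = bisect.bisect_right(BOUNDARIES, epoch)
    return BASE_LR * MULTIPLIERS[stage]


for epoch in (0, 39, 40, 60, 80, 100, 119):
    print(epoch, learning_rate(epoch))
```

Under this reading the rate drops to half at epoch 40 and ends at one hundredth of the initial value for the last 20 of the 120 epochs.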
--- a/fluid/object_detection/images/009943.jpg
+++ b/fluid/object_detection/images/009943.jpg
--- a/fluid/object_detection/images/009956.jpg
+++ b/fluid/object_detection/images/009956.jpg
--- a/fluid/object_detection/images/009960.jpg
+++ b/fluid/object_detection/images/009960.jpg
--- a/fluid/object_detection/images/009962.jpg
+++ b/fluid/object_detection/images/009962.jpg
--- a/fluid/object_detection/images/SSD_paper_figure.jpg
+++ b/fluid/object_detection/images/SSD_paper_figure.jpg
--- a/fluid/object_detection/infer.py
+++ b/fluid/object_detection/infer.py
@@ -5,6 +5,7 @@ import argparse
 import functools
 from PIL import Image
 from PIL import ImageDraw
+from PIL import ImageFont
 import paddle
 import paddle.fluid as fluid
@@ -20,7 +21,7 @@ add_arg('use_gpu',          bool,  True,      "Whether use GPU.")
 add_arg('image_path',       str,   '',        "The image used to inference and visualize.")
 add_arg('model_dir',        str,   '',     "The model path.")
 add_arg('nms_threshold',    float, 0.45,   "NMS threshold.")
-add_arg('confs_threshold',  float, 0.2,    "Confidence threshold to draw bbox.")
+add_arg('confs_threshold',  float, 0.5,    "Confidence threshold to draw bbox.")
 add_arg('resize_h',         int,   300,    "The resized image height.")
 add_arg('resize_w',         int,   300,    "The resized image height.")
 add_arg('mean_value_B',     float, 127.5,  "Mean value for B channel which will be subtracted.")  #123.68
@@ -52,22 +53,18 @@ def infer(args, data_args, image_path, model_dir):
    infer_reader = reader.infer(data_args, image_path)
    feeder = fluid.DataFeeder(place=place, feed_list=[image])
-    def infer():
+    data = infer_reader()
-        data = infer_reader()
+    nmsed_out_v = exe.run(fluid.default_main_program(),
-        nmsed_out_v = exe.run(fluid.default_main_program(),
+                          feed=feeder.feed([[data]]),
-                              feed=feeder.feed([[data]]),
+                          fetch_list=[nmsed_out],
-                              fetch_list=[nmsed_out],
+                          return_numpy=False)
-                              return_numpy=False)
+    nmsed_out_v = np.array(nmsed_out_v[0])
-        nmsed_out_v = np.array(nmsed_out_v[0])
+    draw_bounding_box_on_image(image_path, nmsed_out_v, args.confs_threshold,
-        draw_bounding_box_on_image(image_path, nmsed_out_v,
+                               data_args.label_list)
-                                   args.confs_threshold)
-        for dt in nmsed_out_v:
-            category_id, score, xmin, ymin, xmax, ymax = dt.tolist()
-    infer()
+def draw_bounding_box_on_image(image_path, nms_out, confs_threshold,
-def draw_bounding_box_on_image(image_path, nms_out, confs_threshold):
+                               label_list):
    image = Image.open(image_path)
    draw = ImageDraw.Draw(image)
    im_width, im_height = image.size
@@ -85,6 +82,8 @@ def draw_bounding_box_on_image(image_path, nms_out, confs_threshold):
             (left, top)],
            width=4,
            fill='red')
+        if image.mode == 'RGB':
+            draw.text((left, top), label_list[int(category_id)], (255, 255, 0))
    image_name = image_path.split('/')[-1]
    print("image with bbox drawed saved as {}".format(image_name))
    image.save(image_name)
@@ -96,8 +95,8 @@ if __name__ == '__main__':
    data_args = reader.Settings(
        dataset=args.dataset,
-        data_dir='',
+        data_dir='data/pascalvoc',
-        label_file='',
+        label_file='label_list',
        resize_h=args.resize_h,
        resize_w=args.resize_w,
        mean_value=[args.mean_value_B, args.mean_value_G, args.mean_value_R],
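The `infer.py` change passes `label_list` into `draw_bounding_box_on_image` so that each drawn box can carry a class name, and it raises the default `confs_threshold` to 0.5. The filtering and label lookup can be sketched without any drawing code. This is an illustration under stated assumptions: each NMS output row is `(category_id, score, xmin, ymin, xmax, ymax)` as unpacked in the removed lines, box coordinates are assumed to be normalized to [0, 1], and detections scoring below the threshold are assumed to be skipped. The function name `select_detections` is hypothetical.

```python
def select_detections(nms_out, confs_threshold, label_list, im_width, im_height):
    """Keep confident detections and attach a class name and pixel-space box."""
    selected = []
    for category_id, score, xmin, ymin, xmax, ymax in nms_out:
        if score < confs_threshold:
            continue  # assumed: low-confidence boxes are not drawn
        # Assumed: coordinates are normalized, so scale them to pixels.
        box = (xmin * im_width, ymin * im_height,
               xmax * im_width, ymax * im_height)
        selected.append((label_list[int(category_id)], score, box))
    return selected


labels = ['background', 'aeroplane', 'bicycle']
detections = [
    [1.0, 0.9, 0.25, 0.5, 0.75, 1.0],  # confident aeroplane
    [2.0, 0.3, 0.0, 0.0, 0.5, 0.5],    # below the 0.5 threshold
]
kept = select_detections(detections, 0.5, labels, im_width=400, im_height=200)
print(kept)
```

With the threshold at 0.5 only the first detection survives, which mirrors the effect of the new default on how many boxes end up in the saved image.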

--- a/fluid/object_detection/train.py
+++ b/fluid/object_detection/train.py
@@ -15,7 +15,7 @@ parser = argparse.ArgumentParser(description=__doc__)
 add_arg = functools.partial(add_arguments, argparser=parser)
 # yapf: disable
 add_arg('learning_rate',    float, 0.001,     "Learning rate.")
-add_arg('batch_size',       int,   32,        "Minibatch size.")
+add_arg('batch_size',       int,   64,        "Minibatch size.")
 add_arg('num_passes',       int,   120,       "Epoch number.")
 add_arg('use_gpu',          bool,  True,      "Whether use GPU.")
 add_arg('parallel',         bool,  True,      "Parallel.")
@@ -24,9 +24,9 @@ add_arg('dataset',          str,   'pascalvoc', "coco2014, coco2017, and pascalv
 add_arg('model_save_dir',   str,   'model',     "The path to save model.")
 add_arg('pretrained_model', str,   'pretrained/ssd_mobilenet_v1_coco/', "The init model path.")
 add_arg('apply_distort',    bool,  True,   "Whether apply distort.")
-add_arg('apply_expand',     bool,  False,  "Whether appley expand.")
+add_arg('apply_expand',     bool,  True,  "Whether appley expand.")
 add_arg('nms_threshold',    float, 0.45,   "NMS threshold.")
-add_arg('ap_version',       str,   'integral',   "integral, 11point.")
+add_arg('ap_version',       str,   '11point',   "integral, 11point.")
 add_arg('resize_h',         int,   300,    "The resized image height.")
 add_arg('resize_w',         int,   300,    "The resized image height.")
 add_arg('mean_value_B',     float, 127.5,  "Mean value for B channel which will be subtracted.")  #123.68
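The `add_arg` calls above rely on an `add_arguments` helper that this diff does not show. A minimal stand-in, assuming it simply forwards to `argparse` and parses booleans from strings such as `True` or `False`, might look like the following. The bodies of `add_arguments` and `str2bool` here are assumptions for illustration and may differ from the repository's utility module.

```python
import argparse
import functools


def str2bool(value):
    """Parse common true/false spellings; plain bool() would treat 'False' as True."""
    lowered = value.lower()
    if lowered in ('true', 't', 'yes', 'y', '1'):
        return True
    if lowered in ('false', 'f', 'no', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('expected a boolean, got %r' % value)


def add_arguments(argname, type, default, help, argparser):
    """Hypothetical helper: register --argname with the given type and default."""
    arg_type = str2bool if type is bool else type
    argparser.add_argument('--' + argname, type=arg_type, default=default, help=help)


parser = argparse.ArgumentParser()
add_arg = functools.partial(add_arguments, argparser=parser)
add_arg('batch_size',   int,  64,        "Minibatch size.")
add_arg('apply_expand', bool, True,      "Whether to apply expand.")
add_arg('ap_version',   str,  '11point', "integral, 11point.")

# Defaults follow the values set in this commit; flags override them.
args = parser.parse_args(['--apply_expand=False'])
print(args.batch_size, args.apply_expand, args.ap_version)
```

Because the new defaults are baked into the parser, running `python train.py` with no flags would pick up batch size 64, expansion enabled, and the 11point metric.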