Update SSD documentation (#987)

* Refine documentation and code.

3257b64c · Xingyuan Bu · qingqing01 · 894e7ac0
10 changed files
--- a/fluid/object_detection/README.md
+++ b/fluid/object_detection/README.md
@@ -4,6 +4,14 @@ The minimum PaddlePaddle version needed for the code sample in this directory is

 ## SSD Object Detection

+## Table of Contents
+- [Introduction](#introduction)
+- [Data Preparation](#data-preparation)
+- [Train](#train)
+- [Evaluate](#evaluate)
+- [Infer and Visualize](#infer-and-visualize)
+- [Released Model](#released-model)
+
 ### Introduction

 [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage framework for object detection. A single-stage detector treats object detection as a regression problem and directly predicts bounding boxes and class probabilities without a region proposal step. SSD improves on this by producing predictions at different scales from different layers, as shown below. Predictions are made at six levels, on six feature maps of different scales. Each feature map has two 3x3 convolutional layers, which predict the category and the shape offset relative to the prior box (also called an anchor), respectively. Thus, we get 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class.
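
The box count quoted in the README paragraph above can be checked with a few lines of Python. This is a minimal sketch: the `(size, boxes)` pairs are read off the sum in that paragraph, and the names `FEATURE_MAPS` and `count_prior_boxes` are illustrative, not part of the repository.

```python
# (feature map side length, prior boxes per location) for the six prediction
# layers, taken from the sum quoted in the README paragraph.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_prior_boxes(feature_maps):
    """Return the total number of prior boxes over all prediction layers."""
    return sum(size * size * boxes for size, boxes in feature_maps)


# Per-layer contribution, then the grand total.
per_layer = [size * size * boxes for size, boxes in FEATURE_MAPS]
total = count_prior_boxes(FEATURE_MAPS)
print(per_layer)  # [5776, 2166, 600, 150, 36, 4]
print(total)      # 8732
```

The first (38x38) layer alone contributes about two thirds of all boxes.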
@@ -19,8 +27,6 @@ SSD is readily pluggable into a wide variant standard convolutional network, suc

 You can use [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or [MS-COCO dataset](http://cocodataset.org/#download).

-#### PASCAL VOC Dataset
-
 If you want to train a model on the PASCAL VOC dataset, download the dataset first; skip this step if you already have it.

 ```bash
@@ -30,8 +36,6 @@ cd data/pascalvoc

 The `download.sh` script also creates the training and testing file lists.

-#### MS-COCO Dataset
-
 If you want to train a model on the MS-COCO dataset, download the dataset first; skip this step if you already have it.

 ```
@@ -71,7 +75,13 @@ We will release the pre-trained models by ourself in the upcoming soon.
  python train.py --help
  ```

-We used RMSProp optimizer with mini-batch size 64 to train the MobileNet-SSD. The initial learning rate is 0.001, and was decayed at 40, 60, 80, 100 epochs with multiplier 0.5, 0.25, 0.1, 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achive XXX% mAP under 11point metric.
+The data reader is defined in `reader.py`. All images are resized to 300x300. During training, images are randomly distorted, expanded, cropped, and flipped:
+   - distort: distort brightness, contrast, saturation, and hue.
+   - expand: place the original image into a larger canvas that is initialized with the image mean.
+   - crop: crop the image with respect to different scales, aspect ratios, and overlaps.
+   - flip: flip horizontally.
+
+We used the RMSProp optimizer with a mini-batch size of 64 to train MobileNet-SSD. The initial learning rate is 0.001 and is decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achieve 73.32% mAP under the 11point metric.

 ### Evaluate

@@ -115,4 +125,4 @@ MobileNet-v1-SSD 300x300 Visualization Examples

 | Model                    | Pre-trained Model  | Training data    | Test data    | mAP |
 |:------------------------:|:------------------:|:----------------:|:------------:|:----:|
-|MobileNet-v1-SSD 300x300  | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | XXX%  |
+|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | 73.32%  |
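
The learning-rate schedule stated in the README hunk above (initial rate 0.001, decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01) can be sketched as a step function. Two assumptions here are not confirmed by the diff: epochs are counted from 0 with each decay taking effect at the start of its boundary epoch, and each multiplier applies to the initial rate rather than to the previous value. The function name `learning_rate_at` is hypothetical; this is not the repository's `train.py` code.

```python
import bisect

BASE_LR = 0.001                            # initial learning rate from the README
BOUNDARY_EPOCHS = [40, 60, 80, 100]        # epochs at which the rate changes
MULTIPLIERS = [1.0, 0.5, 0.25, 0.1, 0.01]  # one multiplier per interval


def learning_rate_at(epoch):
    """Return the learning rate for a zero-based epoch index (assumed convention)."""
    # bisect_right counts how many boundaries have been reached so far.
    interval = bisect.bisect_right(BOUNDARY_EPOCHS, epoch)
    return BASE_LR * MULTIPLIERS[interval]


for epoch in (0, 39, 40, 60, 80, 100, 119):
    print(epoch, learning_rate_at(epoch))
```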
--- a/fluid/object_detection/README_cn.md
+++ b/fluid/object_detection/README_cn.md
@@ -4,6 +4,14 @@

 ## SSD Object Detection

+## Table of Contents
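+- [Introduction](#introduction)
+- [Data Preparation](#data-preparation)
+- [Model Training](#model-training)
+- [Model Evaluation](#model-evaluation)
+- [Model Inference and Visualization](#model-inference-and-visualization)
+- [Released Model](#released-model)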
+
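 ### Introduction

 [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage object detector. Unlike two-stage detection methods, a single-stage detector does not generate region proposals; it regresses bounding boxes and class probabilities directly from the feature maps. SSD builds on this single-stage idea and improves it by detecting objects of the corresponding scale on feature maps of different scales. As shown in the figure below, SSD makes predictions at six levels, on feature maps of six scales. At each level, two 3x3 convolutions regress the object category and the bounding-box offset, respectively. Thus, for each class, the six levels of SSD produce 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections in total.
@@ -19,8 +27,6 @@ SSD can be readily plugged into any standard convolutional network, such as VGG, Res

 You can use the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or the [MS-COCO dataset](http://cocodataset.org/#download).

-#### PASCAL VOC Dataset
-
 To train on the PASCAL VOC dataset, first download the dataset with the commands below.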

 ```bash
@@ -30,8 +36,6 @@ cd data/pascalvoc

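 The `download.sh` command automatically creates the list files used for training and testing.

-#### MS-COCO Dataset
-
 To train on the MS-COCO dataset, first download the dataset with the commands below.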

 ```
@@ -70,7 +74,13 @@ cd data/coco
  python train.py --help
  ```

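-We used the RMSProp optimizer to train MobileNet-SSD with a batch size of 64, a weight decay of 0.00005, and an initial learning rate of 0.001; the learning rate was decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01. After 120 epochs of training, the mAP under the 11point metric is XXX%.
+The data reading behavior is defined in `reader.py`, and all images are resized to 300x300. During training, the data also undergo image augmentation and label augmentation: image augmentation includes random distortion, expansion, and flipping of the image itself, and label augmentation includes random cropping:
+   - distort: perturb image brightness, contrast, saturation, and hue.
+   - expand: place the original image into an expanded image filled with the pixel mean (which is later removed by the mean-subtraction step), then crop, resize, and flip this image.
+   - flip: flip horizontally.
+   - crop: generate several candidate boxes from two parameters, scale and aspect ratio, then select the crops that meet the requirements according to the intersection over union (IoU) between these candidate boxes and the ground-truth boxes.
+
+We used the RMSProp optimizer to train MobileNet-SSD with a batch size of 64, a weight decay of 0.00005, and an initial learning rate of 0.001; the learning rate was decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01. After 120 epochs of training, the mAP under the 11point metric is 73.32%.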

 ### Model Evaluation

@@ -114,4 +124,4 @@ MobileNet-v1-SSD 300x300 Prediction Visualization

 | Model                    | Pre-trained Model  | Training data    | Test data    | mAP |
 |:------------------------:|:------------------:|:----------------:|:------------:|:----:|
-|MobileNet-v1-SSD 300x300  | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | XXX%  |
+|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | 73.32%  |
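
The mAP figures in the README hunks above are reported under an 11-point metric. As a rough illustration, the sketch below computes 11-point interpolated average precision in the PASCAL VOC 2007 style for a single class. The diff itself does not define the metric, so treating "11point" as this interpolation is an assumption; the function name `eleven_point_ap` is hypothetical and this is not the `DetectionMAP` evaluator used in `eval.py`.

```python
def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP from paired recall/precision samples.

    For each recall threshold t in {0.0, 0.1, ..., 1.0}, take the highest
    precision among points whose recall is at least t (0 if there is none),
    then average the eleven values.
    """
    total = 0.0
    for i in range(11):
        threshold = i / 10.0
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


# Toy precision/recall curve with two operating points (made-up numbers).
ap = eleven_point_ap(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
print(round(ap, 4))
```

mAP is then the mean of this per-class value over all object classes.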
--- a/fluid/object_detection/eval.py
+++ b/fluid/object_detection/eval.py
@@ -64,6 +64,7 @@ def eval(args, data_args, test_list, batch_size, model_dir=None):
        place=place, feed_list=[image, gt_box, gt_label, difficult])

    def test():
+        # switch network to test mode (i.e. batch norm test mode)
        test_program = fluid.default_main_program().clone(for_test=True)
        with fluid.program_guard(test_program):
            map_eval = fluid.evaluator.DetectionMAP(
@@ -79,12 +80,12 @@ def eval(args, data_args, test_list, batch_size, model_dir=None):
        _, accum_map = map_eval.get_map_var()
        map_eval.reset(exe)
        for batch_id, data in enumerate(test_reader()):
-            test_map = exe.run(test_program,
+            test_map, = exe.run(test_program,
                                feed=feeder.feed(data),
                                fetch_list=[accum_map])
            if batch_id % 20 == 0:
-                print("Batch {0}, map {1}".format(batch_id, test_map[0]))
-        print("Test model {0}, map {1}".format(model_dir, test_map[0]))
+                print("Batch {0}, map {1}".format(batch_id, test_map))
+        print("Test model {0}, map {1}".format(model_dir, test_map))

    test()
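
The change from `test_map = exe.run(...)` to `test_map, = exe.run(...)` in the hunk above relies on Python's single-element tuple unpacking. The sketch below shows the idiom in isolation; `run` is a hypothetical stand-in that returns one value per fetched name, not the real Fluid executor, and the value 0.7332 is only a placeholder.

```python
def run(fetch_list):
    """Hypothetical stand-in for an executor call: one result per fetched name."""
    return [0.7332 for _ in fetch_list]


# Old style: keep the returned list and index it at every use.
result_list = run(fetch_list=["accum_map"])
old_map = result_list[0]

# New style: the trailing comma unpacks the single element at the call site.
test_map, = run(fetch_list=["accum_map"])
print(old_map == test_map)

# Unpacking also fails loudly if the call does not return exactly one result.
try:
    only_one, = run(fetch_list=["accum_map", "loss"])
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)
```

Unpacking at the call site keeps later uses such as `format(batch_id, test_map)` free of `[0]` indexing.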

@@ -101,9 +102,9 @@ if __name__ == '__main__':
        raise ValueError("The model path [%s] does not exist." %
                         (args.model_dir))
    if 'coco' in args.dataset:
-        data_dir = './data/coco'
+        data_dir = 'data/coco'
        if '2014' in args.dataset:
-            test_list = 'annotations/instances_minival2014.json'
+            test_list = 'annotations/instances_val2014.json'
        elif '2017' in args.dataset:
            test_list = 'annotations/instances_val2017.json'
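
The dataset-name dispatch in the hunk above can be read as a small lookup. The sketch below restates it as a standalone function under stated assumptions: the function name `resolve_coco_eval_files` is hypothetical, returning `None` for non-COCO names is a simplification, and the `ValueError` for an unrecognized year is an addition that the diff does not contain.

```python
def resolve_coco_eval_files(dataset):
    """Return (data_dir, test_list) for a COCO dataset name, or None otherwise."""
    if 'coco' not in dataset:
        return None
    data_dir = 'data/coco'
    if '2014' in dataset:
        test_list = 'annotations/instances_val2014.json'
    elif '2017' in dataset:
        test_list = 'annotations/instances_val2017.json'
    else:
        # Assumption: fail early rather than fall through silently.
        raise ValueError("Unsupported COCO dataset name: %s" % dataset)
    return data_dir, test_list


print(resolve_coco_eval_files('coco2014'))
print(resolve_coco_eval_files('coco2017'))
print(resolve_coco_eval_files('pascalvoc'))
```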


--- a/fluid/object_detection/eval_coco_map.py
+++ b/fluid/object_detection/eval_coco_map.py
@@ -133,7 +133,7 @@ if __name__ == '__main__':

    data_dir = './data/coco'
    if '2014' in args.dataset:
-        test_list = 'annotations/instances_minival2014.json'
+        test_list = 'annotations/instances_val2014.json'
    elif '2017' in args.dataset:
        test_list = 'annotations/instances_val2017.json'


--- a/fluid/object_detection/images/009943.jpg
+++ b/fluid/object_detection/images/009943.jpg
--- a/fluid/object_detection/images/009956.jpg
+++ b/fluid/object_detection/images/009956.jpg
--- a/fluid/object_detection/images/009960.jpg
+++ b/fluid/object_detection/images/009960.jpg
--- a/fluid/object_detection/images/009962.jpg
+++ b/fluid/object_detection/images/009962.jpg
--- a/fluid/object_detection/infer.py
+++ b/fluid/object_detection/infer.py
@@ -34,8 +34,20 @@ def infer(args, data_args, image_path, model_dir):
    image_shape = [3, data_args.resize_h, data_args.resize_w]
    if 'coco' in data_args.dataset:
        num_classes = 91
+        # cocoapi
+        from pycocotools.coco import COCO
+        from pycocotools.cocoeval import COCOeval
+        label_fpath = os.path.join(data_dir, label_file)
+        coco = COCO(label_fpath)
+        category_ids = coco.getCatIds()
+        label_list = {
+            item['id']: item['name']
+            for item in coco.loadCats(category_ids)
+        }
+        label_list[0] = ['background']
    elif 'pascalvoc' in data_args.dataset:
        num_classes = 21
+        label_list = data_args.label_list

    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
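
The COCO branch in the hunk above builds an id-to-name mapping from the category records and then adds an entry for id 0. The sketch below reproduces that pattern without `pycocotools`: `sample_categories` is hand-written data that imitates the list of dicts (with `id` and `name` keys) that the hunk iterates over, and `build_label_map` is a hypothetical helper name. One deliberate difference: the hunk assigns the one-element list `['background']`, while this sketch stores the plain string so that every value has the same type; that choice is an assumption about intent.

```python
def build_label_map(categories):
    """Map category id to category name, with id 0 reserved for background."""
    label_map = {item['id']: item['name'] for item in categories}
    label_map[0] = 'background'  # assumption: plain string, not a list
    return label_map


# Hand-written records imitating COCO category entries (illustrative only).
sample_categories = [
    {'id': 1, 'name': 'person'},
    {'id': 2, 'name': 'bicycle'},
    {'id': 3, 'name': 'car'},
]

label_map = build_label_map(sample_categories)
print(label_map[1])
print(label_map[0])
```

A dict keyed by category id avoids relying on ids being contiguous, which a positional list lookup would require.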
@@ -54,13 +66,16 @@ def infer(args, data_args, image_path, model_dir):
    feeder = fluid.DataFeeder(place=place, feed_list=[image])

    data = infer_reader()
-    nmsed_out_v = exe.run(fluid.default_main_program(),
+
+    # switch network to test mode (i.e. batch norm test mode)
+    test_program = fluid.default_main_program().clone(for_test=True)
+    nmsed_out_v, = exe.run(test_program,
                           feed=feeder.feed([[data]]),
                           fetch_list=[nmsed_out],
                           return_numpy=False)
-    nmsed_out_v = np.array(nmsed_out_v[0])
+    nmsed_out_v = np.array(nmsed_out_v)
    draw_bounding_box_on_image(image_path, nmsed_out_v, args.confs_threshold,
-                               data_args.label_list)
+                               label_list)


 def draw_bounding_box_on_image(image_path, nms_out, confs_threshold,
@@ -93,10 +108,20 @@ if __name__ == '__main__':
    args = parser.parse_args()
    print_arguments(args)

+    data_dir = 'data/pascalvoc'
+    label_file = 'label_list'
+
+    if not os.path.exists(args.model_dir):
+        raise ValueError("The model path [%s] does not exist." %
+                         (args.model_dir))
+    if 'coco' in args.dataset:
+        data_dir = 'data/coco'
+        label_file = 'annotations/instances_val2014.json'
+
    data_args = reader.Settings(
        dataset=args.dataset,
-        data_dir='data/pascalvoc',
-        label_file='label_list',
+        data_dir=data_dir,
+        label_file=label_file,
        resize_h=args.resize_h,
        resize_w=args.resize_w,
        mean_value=[args.mean_value_B, args.mean_value_G, args.mean_value_R],

--- a/fluid/object_detection/train.py
+++ b/fluid/object_detection/train.py
@@ -19,7 +19,6 @@ add_arg('batch_size',       int,   64,        "Minibatch size.")
 add_arg('num_passes',       int,   120,       "Epoch number.")
 add_arg('use_gpu',          bool,  True,      "Whether use GPU.")
 add_arg('parallel',         bool,  True,      "Parallel.")
-add_arg('use_nccl',         bool,  True,      "NCCL.")
 add_arg('dataset',          str,   'pascalvoc', "coco2014, coco2017, and pascalvoc.")
 add_arg('model_save_dir',   str,   'model',     "The path to save model.")
 add_arg('pretrained_model', str,   'pretrained/ssd_mobilenet_v1_coco/', "The init model path.")
@@ -35,133 +34,8 @@ add_arg('mean_value_R',     float, 127.5,  "Mean value for R channel which will
 add_arg('is_toy',           int,   0, "Toy for quick debug, 0 means using all data, while n means using only n sample.")
 #yapf: enable

-def parallel_do(args,
-                train_file_list,
-                val_file_list,
-                data_args,
-                learning_rate,
-                batch_size,
-                num_passes,
-                model_save_dir,
-                pretrained_model=None):
-    image_shape = [3, data_args.resize_h, data_args.resize_w]
-    if data_args.dataset == 'coco':
-        num_classes = 81
-    elif data_args.dataset == 'pascalvoc':
-        num_classes = 21
-
-    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
-    gt_box = fluid.layers.data(
-        name='gt_box', shape=[4], dtype='float32', lod_level=1)
-    gt_label = fluid.layers.data(
-        name='gt_label', shape=[1], dtype='int32', lod_level=1)
-    difficult = fluid.layers.data(
-        name='gt_difficult', shape=[1], dtype='int32', lod_level=1)
-
-    if args.parallel:
-        places = fluid.layers.get_places()
-        pd = fluid.layers.ParallelDo(places, use_nccl=args.use_nccl)
-        with pd.do():
-            image_ = pd.read_input(image)
-            gt_box_ = pd.read_input(gt_box)
-            gt_label_ = pd.read_input(gt_label)
-            difficult_ = pd.read_input(difficult)
-            locs, confs, box, box_var = mobile_net(num_classes, image_,
-                                                   image_shape)
-            loss = fluid.layers.ssd_loss(locs, confs, gt_box_, gt_label_, box,
-                                         box_var)
-            nmsed_out = fluid.layers.detection_output(
-                locs, confs, box, box_var, nms_threshold=0.45)
-            loss = fluid.layers.reduce_sum(loss)
-            pd.write_output(loss)
-            pd.write_output(nmsed_out)
-
-        loss, nmsed_out = pd()
-        loss = fluid.layers.mean(loss)
-    else:
-        locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
-        nmsed_out = fluid.layers.detection_output(
-            locs, confs, box, box_var, nms_threshold=0.45)
-        loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
-                                     box_var)
-        loss = fluid.layers.reduce_sum(loss)
-
-    test_program = fluid.default_main_program().clone(for_test=True)
-    with fluid.program_guard(test_program):
-        map_eval = fluid.evaluator.DetectionMAP(
-            nmsed_out,
-            gt_label,
-            gt_box,
-            difficult,
-            num_classes,
-            overlap_threshold=0.5,
-            evaluate_difficult=False,
-            ap_version=args.ap_version)
-
-    if data_args.dataset == 'coco':
-        # learning rate decay in 12, 19 pass, respectively
-        if '2014' in train_file_list:
-            boundaries = [82783 / batch_size * 12, 82783 / batch_size * 19]
-        elif '2017' in train_file_list:
-            boundaries = [118287 / batch_size * 12, 118287 / batch_size * 19]
-    elif data_args.dataset == 'pascalvoc':
-        boundaries = [40000, 60000]
-    values = [learning_rate, learning_rate * 0.5, learning_rate * 0.25]
-    optimizer = fluid.optimizer.RMSProp(
-        learning_rate=fluid.layers.piecewise_decay(boundaries, values),
-        regularization=fluid.regularizer.L2Decay(0.00005), )
-
-    optimizer.minimize(loss)
-
-    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-
-    if pretrained_model:
-        def if_exist(var):
-            return os.path.exists(os.path.join(pretrained_model, var.name))
-        fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)
-
-    train_reader = paddle.batch(
-        reader.train(data_args, train_file_list), batch_size=batch_size)
-    test_reader = paddle.batch(
-        reader.test(data_args, val_file_list), batch_size=batch_size)
-    feeder = fluid.DataFeeder(
-        place=place, feed_list=[image, gt_box, gt_label, difficult])
-
-    def test(pass_id):
-        _, accum_map = map_eval.get_map_var()
-        map_eval.reset(exe)
-        test_map = None
-        for data in test_reader():
-            test_map = exe.run(test_program,
-                               feed=feeder.feed(data),
-                               fetch_list=[accum_map])
-        print("Pass {0}, test map {1}".format(pass_id, test_map[0]))
-
-    for pass_id in range(num_passes):
-        start_time = time.time()
-        prev_start_time = start_time
-        end_time = 0
-        for batch_id, data in enumerate(train_reader()):
-            prev_start_time = start_time
-            start_time = time.time()
-            loss_v = exe.run(fluid.default_main_program(),
-                             feed=feeder.feed(data),
-                             fetch_list=[loss])
-            end_time = time.time()
-            if batch_id % 20 == 0:
-                print("Pass {0}, batch {1}, loss {2}, time {3}".format(
-                    pass_id, batch_id, loss_v[0], start_time - prev_start_time))
-        test(pass_id)
-
-        if pass_id % 10 == 0 or pass_id == num_passes - 1:
-            model_path = os.path.join(model_save_dir, str(pass_id))
-            print 'save models to %s' % (model_path)
-            fluid.io.save_persistables(exe, model_path)
-

-def parallel_exe(args,
+def train(args,
          train_file_list,
          val_file_list,
          data_args,
@@ -186,10 +60,6 @@ def parallel_exe(args,
        name='gt_label', shape=[1], dtype='int32', lod_level=1)
    difficult = fluid.layers.data(
        name='gt_difficult', shape=[1], dtype='int32', lod_level=1)
-    gt_iscrowd = fluid.layers.data(
-        name='gt_iscrowd', shape=[1], dtype='int32', lod_level=1)
-    gt_image_info = fluid.layers.data(
-        name='gt_image_id', shape=[3], dtype='int32', lod_level=1)

    locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
    nmsed_out = fluid.layers.detection_output(
@@ -267,15 +137,15 @@ def parallel_exe(args,
        _, accum_map = map_eval.get_map_var()
        map_eval.reset(exe)
        for batch_id, data in enumerate(test_reader()):
-            test_map = exe.run(test_program,
+            test_map, = exe.run(test_program,
                               feed=feeder.feed(data),
                               fetch_list=[accum_map])
            if batch_id % 20 == 0:
-                print("Batch {0}, map {1}".format(batch_id, test_map[0]))
+                print("Batch {0}, map {1}".format(batch_id, test_map))
        if test_map[0] > best_map:
            best_map = test_map[0]
            save_model('best_model')
-        print("Pass {0}, test map {1}".format(pass_id, test_map[0]))
+        print("Pass {0}, test map {1}".format(pass_id, test_map))
        return best_map

    for pass_id in range(num_passes):
@@ -285,7 +155,9 @@ def parallel_exe(args,
        for batch_id, data in enumerate(train_reader()):
            prev_start_time = start_time
            start_time = time.time()
-            if len(data) < devices_num: continue
+            if len(data) < (devices_num * 2):
+                print("There are too few data to train on all devices.")
+                continue
            if args.parallel:
                loss_v, = train_exe.run(fetch_list=[loss.name],
                                        feed=feeder.feed(data))
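
The hunk above tightens the small-batch guard from `len(data) < devices_num` to `len(data) < (devices_num * 2)` and logs a message before skipping. The sketch below isolates that check so its effect on a short final batch is easy to see. The helper names `should_skip_batch` and `usable_batches` are hypothetical, the batch contents are placeholder integers, and the diff does not state why the factor of 2 was chosen.

```python
def should_skip_batch(batch, devices_num):
    """Mirror of the check in the hunk: skip batches smaller than 2 * devices_num."""
    return len(batch) < devices_num * 2


def usable_batches(batches, devices_num):
    """Return the batches that pass the guard, logging each skipped one."""
    kept = []
    for batch in batches:
        if should_skip_batch(batch, devices_num):
            print("There are too few data to train on all devices.")
            continue
        kept.append(batch)
    return kept


# Two full batches of 64 samples and a short final batch of 5 samples.
batches = [list(range(64)), list(range(64)), list(range(5))]
kept = usable_batches(batches, devices_num=4)
print(len(kept))
```

With 4 devices the old check would have kept the 5-sample batch (5 is not less than 4), while the new check drops it (5 is less than 8).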
@@ -314,10 +186,10 @@ if __name__ == '__main__':
    label_file = 'label_list'
    model_save_dir = args.model_save_dir
    if 'coco' in args.dataset:
-        data_dir = './data/coco'
+        data_dir = 'data/coco'
        if '2014' in args.dataset:
            train_file_list = 'annotations/instances_train2014.json'
-            val_file_list = 'annotations/instances_minival2014.json'
+            val_file_list = 'annotations/instances_val2014.json'
        elif '2017' in args.dataset:
            train_file_list = 'annotations/instances_train2017.json'
            val_file_list = 'annotations/instances_val2017.json'
@@ -333,8 +205,7 @@ if __name__ == '__main__':
        apply_expand=args.apply_expand,
        ap_version = args.ap_version,
        toy=args.is_toy)
-    method = parallel_exe
-    method(
+    train(
        args,
        train_file_list=train_file_list,
        val_file_list=val_file_list,