diff --git a/fluid/object_detection/README.md b/fluid/object_detection/README.md
index b0b4d4477e658d49770746417920efa5e0d80caf..ec93f153e085401fd9d89b257b5ba45a700db08c 100644
--- a/fluid/object_detection/README.md
+++ b/fluid/object_detection/README.md
@@ -4,6 +4,14 @@ The minimum PaddlePaddle version needed for the code sample in this directory is

 ## SSD Object Detection

+## Table of Contents
+- [Introduction](#introduction)
+- [Data Preparation](#data-preparation)
+- [Train](#train)
+- [Evaluate](#evaluate)
+- [Infer and Visualize](#infer-and-visualize)
+- [Released Model](#released-model)
+
 ### Introduction

 [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) framework for object detection can be categorized as a single stage detector. A single stage detector simplifies object detection as a regression problem, which directly predicts the bounding boxes and class probabilities without region proposal. SSD further makes improves by producing these predictions of different scales from different layers, as shown below. Six levels predictions are made in six different scale feature maps. And there are two 3x3 convolutional layers in each feature map, which predict category or a shape offset relative to the prior box(also called anchor), respectively. Thus, we get 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class.
@@ -19,8 +27,6 @@ SSD is readily pluggable into a wide variant standard convolutional network, suc

 You can use [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or [MS-COCO dataset](http://cocodataset.org/#download).

-#### PASCAL VOC Dataset
-
 If you want to train a model on PASCAL VOC dataset, please download dataset at first, skip this step if you already have one.

 ```bash
@@ -30,8 +36,6 @@ cd data/pascalvoc

 The command `download.sh` also will create training and testing file lists.

-#### MS-COCO Dataset
-
 If you want to train a model on MS-COCO dataset, please download dataset at first, skip this step if you already have one.

 ```
@@ -71,7 +75,13 @@ We will release the pre-trained models by ourself in the upcoming soon.
 python train.py --help
 ```

-We used RMSProp optimizer with mini-batch size 64 to train the MobileNet-SSD. The initial learning rate is 0.001, and was decayed at 40, 60, 80, 100 epochs with multiplier 0.5, 0.25, 0.1, 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achive XXX% mAP under 11point metric.
+Data reader is defined in `reader.py`. All images will be resized to 300x300. In training stage, images are randomly distorted, expanded, cropped and flipped:
+  - distort: distort brightness, contrast, saturation, and hue.
+  - expand: put the original image into a larger expanded image which is initialized using image mean.
+  - crop: crop image with respect to different scale, aspect ratio, and overlap.
+  - flip: flip horizontally.
+
+We used RMSProp optimizer with mini-batch size 64 to train the MobileNet-SSD. The initial learning rate is 0.001, and was decayed at 40, 60, 80, 100 epochs with multiplier 0.5, 0.25, 0.1, 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achieve 73.32% mAP under 11point metric.

 ### Evaluate

@@ -115,4 +125,4 @@ MobileNet-v1-SSD 300x300 Visualization Examples

 | Model | Pre-trained Model | Training data | Test data | mAP |
 |:------------------------:|:------------------:|:----------------:|:------------:|:----:|
-|MobileNet-v1-SSD 300x300 | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | XXX% |
+|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% |
diff --git a/fluid/object_detection/README_cn.md b/fluid/object_detection/README_cn.md
index 328fa35313dd21c032f89f00ed79785130462c4a..57c4e275d8d7bcc7c38d17af09c6b84329df9b68 100644
--- a/fluid/object_detection/README_cn.md
+++ b/fluid/object_detection/README_cn.md
@@ -4,6 +4,14 @@

 ## SSD 目标检测

+## Table of Contents
+- [简介](#简介)
+- [数据准备](#数据准备)
+- [模型训练](#模型训练)
+- [模型评估](#模型评估)
+- [模型预测以及可视化](#模型预测以及可视化)
+- [模型发布](#模型发布)
+
 ### 简介

 [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) 是一种单阶段的目标检测器。与两阶段的检测方法不同,单阶段目标检测并不进行区域推荐,而是直接从特征图回归出目标的边界框和分类概率。SSD 运用了这种单阶段检测的思想,并且对其进行改进:在不同尺度的特征图上检测对应尺度的目标。如下图所示,SSD 在六个尺度的特征图上进行了不同层级的预测。每个层级由两个3x3卷积分别对目标类别和边界框偏移进行回归。因此对于每个类别,SSD 的六个层级一共会产生 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 个检测结果。
@@ -19,8 +27,6 @@ SSD 可以方便地插入到任何一种标准卷积网络中,比如 VGG、Res

 你可以使用 [PASCAL VOC 数据集](http://host.robots.ox.ac.uk/pascal/VOC/) 或者 [MS-COCO 数据集](http://cocodataset.org/#download)。

-#### PASCAL VOC 数据集
-
 如果你想在 PASCAL VOC 数据集上进行训练,请先使用下面的命令下载数据集。

 ```bash
@@ -30,8 +36,6 @@ cd data/pascalvoc

 `download.sh` 命令会自动创建训练和测试用的列表文件。

-#### MS-COCO 数据集
-
 如果你想在 MS-COCO 数据集上进行训练,请先使用下面的命令下载数据集。

 ```
@@ -70,7 +74,13 @@ cd data/coco
 python train.py --help
 ```

-我们使用了 RMSProp 优化算法来训练 MobileNet-SSD,batch大小为64,权重衰减系数为0.00005,初始学习率为 0.001,并且在第40、60、80、100 轮时使用 0.5, 0.25, 0.1, 0.01乘子进行学习率衰减。在120轮训练后,11point评价标准下的mAP为XXX%。
+数据的读取行为定义在 `reader.py` 中,所有的图片都会被缩放到300x300。在训练时,数据还会进行图片增强和标签增强,图片增强包括对图片本身的随机扰动、扩张和翻转,标签增强包括随机裁剪:
+  - 扰动: 扰动图片亮度、对比度、饱和度和色相。
+  - 扩张: 将原始图片放进一张使用像素均值填充(随后会在减均值操作中减掉)的扩张图中,再对此图进行裁剪、缩放和翻转。
+  - 翻转: 水平翻转。
+  - 裁剪: 根据缩放比例、长宽比例两个参数生成若干候选框,再依据这些候选框和标注框的面积交并比(IoU)挑选出符合要求的裁剪结果。
+
+我们使用了 RMSProp 优化算法来训练 MobileNet-SSD,batch大小为64,权重衰减系数为0.00005,初始学习率为 0.001,并且在第40、60、80、100 轮时使用 0.5, 0.25, 0.1, 0.01乘子进行学习率衰减。在120轮训练后,11point评价标准下的mAP为73.32%。

 ### 模型评估

@@ -114,4 +124,4 @@ MobileNet-v1-SSD 300x300 预测可视化

 | 模型 | 预训练模型 | 训练数据 | 测试数据 | mAP |
 |:------------------------:|:------------------:|:----------------:|:------------:|:----:|
-|MobileNet-v1-SSD 300x300 | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | XXX% |
+|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% |
diff --git a/fluid/object_detection/eval.py b/fluid/object_detection/eval.py
index 627461c3b3846158fbd2dd815feba09ab0967425..59130d9907a1349237c08256214b24f92b8b36c5 100644
--- a/fluid/object_detection/eval.py
+++ b/fluid/object_detection/eval.py
@@ -64,6 +64,7 @@ def eval(args, data_args, test_list, batch_size, model_dir=None):
         place=place, feed_list=[image, gt_box, gt_label, difficult])

     def test():
+        # switch network to test mode (i.e. batch norm test mode)
         test_program = fluid.default_main_program().clone(for_test=True)
         with fluid.program_guard(test_program):
             map_eval = fluid.evaluator.DetectionMAP(
@@ -79,12 +80,12 @@ def eval(args, data_args, test_list, batch_size, model_dir=None):
         _, accum_map = map_eval.get_map_var()
         map_eval.reset(exe)
         for batch_id, data in enumerate(test_reader()):
-            test_map = exe.run(test_program,
-                               feed=feeder.feed(data),
-                               fetch_list=[accum_map])
+            test_map, = exe.run(test_program,
+                                feed=feeder.feed(data),
+                                fetch_list=[accum_map])
             if batch_id % 20 == 0:
-                print("Batch {0}, map {1}".format(batch_id, test_map[0]))
-        print("Test model {0}, map {1}".format(model_dir, test_map[0]))
+                print("Batch {0}, map {1}".format(batch_id, test_map))
+        print("Test model {0}, map {1}".format(model_dir, test_map))

     test()

@@ -101,9 +102,9 @@ if __name__ == '__main__':
         raise ValueError("The model path [%s] does not exist." %
                          (args.model_dir))
     if 'coco' in args.dataset:
-        data_dir = './data/coco'
+        data_dir = 'data/coco'
         if '2014' in args.dataset:
-            test_list = 'annotations/instances_minival2014.json'
+            test_list = 'annotations/instances_val2014.json'
         elif '2017' in args.dataset:
             test_list = 'annotations/instances_val2017.json'

diff --git a/fluid/object_detection/eval_coco_map.py b/fluid/object_detection/eval_coco_map.py
index b9f03a63004341e7081c424c633ac14d3127b7fb..0837f42ad89cda1e6a81825bc0545a11b48c4b3c 100644
--- a/fluid/object_detection/eval_coco_map.py
+++ b/fluid/object_detection/eval_coco_map.py
@@ -133,7 +133,7 @@ if __name__ == '__main__':

     data_dir = './data/coco'
     if '2014' in args.dataset:
-        test_list = 'annotations/instances_minival2014.json'
+        test_list = 'annotations/instances_val2014.json'
     elif '2017' in args.dataset:
         test_list = 'annotations/instances_val2017.json'

diff --git a/fluid/object_detection/images/009943.jpg b/fluid/object_detection/images/009943.jpg
index 0f5f38423e3c5de2bd0ef7bd859475da16de4f3c..d6262f97052aa7d82068e7d01f4d9982fcf0d3a9 100644
Binary files a/fluid/object_detection/images/009943.jpg and b/fluid/object_detection/images/009943.jpg differ
diff --git a/fluid/object_detection/images/009956.jpg b/fluid/object_detection/images/009956.jpg
index adb2bcaadfd1d2f758585a8b60863fa15f7d0da1..320d3e251782e946395e7fcadbef051bc2e94bee 100644
Binary files a/fluid/object_detection/images/009956.jpg and b/fluid/object_detection/images/009956.jpg differ
diff --git a/fluid/object_detection/images/009960.jpg b/fluid/object_detection/images/009960.jpg
index 67c464cef4540c4e6b2ac45a0948eeee3a9592d8..2f73d3d6f1956b1fa9ae1aba3b5d516a53f26b8f 100644
Binary files a/fluid/object_detection/images/009960.jpg and b/fluid/object_detection/images/009960.jpg differ
diff --git a/fluid/object_detection/images/009962.jpg b/fluid/object_detection/images/009962.jpg
index 90e875396c5e04862d87d5031400e8fb417eb8d8..182d6677bb80d94c5e7e4db3bf6654d3c064566c 100644
Binary files a/fluid/object_detection/images/009962.jpg and b/fluid/object_detection/images/009962.jpg differ
diff --git a/fluid/object_detection/infer.py b/fluid/object_detection/infer.py
index 8117a9727dfa38558364eff1fc404a96af99fe81..9861004127f9d7fcc5cd0881097daa189dd0783f 100644
--- a/fluid/object_detection/infer.py
+++ b/fluid/object_detection/infer.py
@@ -34,8 +34,20 @@ def infer(args, data_args, image_path, model_dir):
     image_shape = [3, data_args.resize_h, data_args.resize_w]
     if 'coco' in data_args.dataset:
         num_classes = 91
+        # cocoapi
+        from pycocotools.coco import COCO
+        from pycocotools.cocoeval import COCOeval
+        label_fpath = os.path.join(data_dir, label_file)
+        coco = COCO(label_fpath)
+        category_ids = coco.getCatIds()
+        label_list = {
+            item['id']: item['name']
+            for item in coco.loadCats(category_ids)
+        }
+        label_list[0] = ['background']
     elif 'pascalvoc' in data_args.dataset:
         num_classes = 21
+        label_list = data_args.label_list

     image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
     locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
@@ -54,13 +66,16 @@ def infer(args, data_args, image_path, model_dir):
     feeder = fluid.DataFeeder(place=place, feed_list=[image])

     data = infer_reader()
-    nmsed_out_v = exe.run(fluid.default_main_program(),
-                          feed=feeder.feed([[data]]),
-                          fetch_list=[nmsed_out],
-                          return_numpy=False)
-    nmsed_out_v = np.array(nmsed_out_v[0])
+
+    # switch network to test mode (i.e. batch norm test mode)
+    test_program = fluid.default_main_program().clone(for_test=True)
+    nmsed_out_v, = exe.run(test_program,
+                           feed=feeder.feed([[data]]),
+                           fetch_list=[nmsed_out],
+                           return_numpy=False)
+    nmsed_out_v = np.array(nmsed_out_v)
     draw_bounding_box_on_image(image_path, nmsed_out_v, args.confs_threshold,
-                               data_args.label_list)
+                               label_list)


 def draw_bounding_box_on_image(image_path, nms_out, confs_threshold,
@@ -93,10 +108,20 @@ if __name__ == '__main__':
     args = parser.parse_args()
     print_arguments(args)

+    data_dir = 'data/pascalvoc'
+    label_file = 'label_list'
+
+    if not os.path.exists(args.model_dir):
+        raise ValueError("The model path [%s] does not exist." %
+                         (args.model_dir))
+    if 'coco' in args.dataset:
+        data_dir = 'data/coco'
+        label_file = 'annotations/instances_val2014.json'
+
     data_args = reader.Settings(
         dataset=args.dataset,
-        data_dir='data/pascalvoc',
-        label_file='label_list',
+        data_dir=data_dir,
+        label_file=label_file,
         resize_h=args.resize_h,
         resize_w=args.resize_w,
         mean_value=[args.mean_value_B, args.mean_value_G, args.mean_value_R],
diff --git a/fluid/object_detection/train.py b/fluid/object_detection/train.py
index bcc0b7365a1f12ec3b033eeab0eb5daa41f4bb1b..c29bd070eda4cf82f5ac36a3eb5699ae13ae86d2 100644
--- a/fluid/object_detection/train.py
+++ b/fluid/object_detection/train.py
@@ -19,7 +19,6 @@ add_arg('batch_size', int, 64, "Minibatch size.")
 add_arg('num_passes', int, 120, "Epoch number.")
 add_arg('use_gpu', bool, True, "Whether use GPU.")
 add_arg('parallel', bool, True, "Parallel.")
-add_arg('use_nccl', bool, True, "NCCL.")
 add_arg('dataset', str, 'pascalvoc', "coco2014, coco2017, and pascalvoc.")
 add_arg('model_save_dir', str, 'model', "The path to save model.")
 add_arg('pretrained_model', str, 'pretrained/ssd_mobilenet_v1_coco/', "The init model path.")
@@ -35,141 +34,16 @@ add_arg('mean_value_R', float, 127.5, "Mean value for R channel which will
 add_arg('is_toy', int, 0, "Toy for quick debug, 0 means using all data, while n means using only n sample.")
 #yapf: enable

-def parallel_do(args,
-                train_file_list,
-                val_file_list,
-                data_args,
-                learning_rate,
-                batch_size,
-                num_passes,
-                model_save_dir,
-                pretrained_model=None):
-    image_shape = [3, data_args.resize_h, data_args.resize_w]
-    if data_args.dataset == 'coco':
-        num_classes = 81
-    elif data_args.dataset == 'pascalvoc':
-        num_classes = 21
-
-    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
-    gt_box = fluid.layers.data(
-        name='gt_box', shape=[4], dtype='float32', lod_level=1)
-    gt_label = fluid.layers.data(
-        name='gt_label', shape=[1], dtype='int32', lod_level=1)
-    difficult = fluid.layers.data(
-        name='gt_difficult', shape=[1], dtype='int32', lod_level=1)
-
-    if args.parallel:
-        places = fluid.layers.get_places()
-        pd = fluid.layers.ParallelDo(places, use_nccl=args.use_nccl)
-        with pd.do():
-            image_ = pd.read_input(image)
-            gt_box_ = pd.read_input(gt_box)
-            gt_label_ = pd.read_input(gt_label)
-            difficult_ = pd.read_input(difficult)
-            locs, confs, box, box_var = mobile_net(num_classes, image_,
-                                                   image_shape)
-            loss = fluid.layers.ssd_loss(locs, confs, gt_box_, gt_label_, box,
-                                         box_var)
-            nmsed_out = fluid.layers.detection_output(
-                locs, confs, box, box_var, nms_threshold=0.45)
-            loss = fluid.layers.reduce_sum(loss)
-            pd.write_output(loss)
-            pd.write_output(nmsed_out)
-
-        loss, nmsed_out = pd()
-        loss = fluid.layers.mean(loss)
-    else:
-        locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
-        nmsed_out = fluid.layers.detection_output(
-            locs, confs, box, box_var, nms_threshold=0.45)
-        loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
-                                     box_var)
-        loss = fluid.layers.reduce_sum(loss)
-
-    test_program = fluid.default_main_program().clone(for_test=True)
-    with fluid.program_guard(test_program):
-        map_eval = fluid.evaluator.DetectionMAP(
-            nmsed_out,
-            gt_label,
-            gt_box,
-            difficult,
-            num_classes,
-            overlap_threshold=0.5,
-            evaluate_difficult=False,
-            ap_version=args.ap_version)
-
-    if data_args.dataset == 'coco':
-        # learning rate decay in 12, 19 pass, respectively
-        if '2014' in train_file_list:
-            boundaries = [82783 / batch_size * 12, 82783 / batch_size * 19]
-        elif '2017' in train_file_list:
-            boundaries = [118287 / batch_size * 12, 118287 / batch_size * 19]
-    elif data_args.dataset == 'pascalvoc':
-        boundaries = [40000, 60000]
-    values = [learning_rate, learning_rate * 0.5, learning_rate * 0.25]
-    optimizer = fluid.optimizer.RMSProp(
-        learning_rate=fluid.layers.piecewise_decay(boundaries, values),
-        regularization=fluid.regularizer.L2Decay(0.00005), )
-
-    optimizer.minimize(loss)
-
-    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-
-    if pretrained_model:
-        def if_exist(var):
-            return os.path.exists(os.path.join(pretrained_model, var.name))
-        fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)
-
-    train_reader = paddle.batch(
-        reader.train(data_args, train_file_list), batch_size=batch_size)
-    test_reader = paddle.batch(
-        reader.test(data_args, val_file_list), batch_size=batch_size)
-    feeder = fluid.DataFeeder(
-        place=place, feed_list=[image, gt_box, gt_label, difficult])
-
-    def test(pass_id):
-        _, accum_map = map_eval.get_map_var()
-        map_eval.reset(exe)
-        test_map = None
-        for data in test_reader():
-            test_map = exe.run(test_program,
-                               feed=feeder.feed(data),
-                               fetch_list=[accum_map])
-        print("Pass {0}, test map {1}".format(pass_id, test_map[0]))
-
-    for pass_id in range(num_passes):
-        start_time = time.time()
-        prev_start_time = start_time
-        end_time = 0
-        for batch_id, data in enumerate(train_reader()):
-            prev_start_time = start_time
-            start_time = time.time()
-            loss_v = exe.run(fluid.default_main_program(),
-                             feed=feeder.feed(data),
-                             fetch_list=[loss])
-            end_time = time.time()
-            if batch_id % 20 == 0:
-                print("Pass {0}, batch {1}, loss {2}, time {3}".format(
-                    pass_id, batch_id, loss_v[0], start_time - prev_start_time))
-        test(pass_id)
-
-        if pass_id % 10 == 0 or pass_id == num_passes - 1:
-            model_path = os.path.join(model_save_dir, str(pass_id))
-            print 'save models to %s' % (model_path)
-            fluid.io.save_persistables(exe, model_path)

-def parallel_exe(args,
-                 train_file_list,
-                 val_file_list,
-                 data_args,
-                 learning_rate,
-                 batch_size,
-                 num_passes,
-                 model_save_dir,
-                 pretrained_model=None):
+def train(args,
+          train_file_list,
+          val_file_list,
+          data_args,
+          learning_rate,
+          batch_size,
+          num_passes,
+          model_save_dir,
+          pretrained_model=None):
     image_shape = [3, data_args.resize_h, data_args.resize_w]
     if 'coco' in data_args.dataset:
         num_classes = 91
@@ -186,10 +60,6 @@ def parallel_exe(args,
         name='gt_label', shape=[1], dtype='int32', lod_level=1)
     difficult = fluid.layers.data(
         name='gt_difficult', shape=[1], dtype='int32', lod_level=1)
-    gt_iscrowd = fluid.layers.data(
-        name='gt_iscrowd', shape=[1], dtype='int32', lod_level=1)
-    gt_image_info = fluid.layers.data(
-        name='gt_image_id', shape=[3], dtype='int32', lod_level=1)

     locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
     nmsed_out = fluid.layers.detection_output(
@@ -267,15 +137,15 @@ def parallel_exe(args,
         _, accum_map = map_eval.get_map_var()
         map_eval.reset(exe)
         for batch_id, data in enumerate(test_reader()):
-            test_map = exe.run(test_program,
+            test_map, = exe.run(test_program,
                                feed=feeder.feed(data),
                                fetch_list=[accum_map])
             if batch_id % 20 == 0:
-                print("Batch {0}, map {1}".format(batch_id, test_map[0]))
+                print("Batch {0}, map {1}".format(batch_id, test_map))
         if test_map[0] > best_map:
             best_map = test_map[0]
             save_model('best_model')
-        print("Pass {0}, test map {1}".format(pass_id, test_map[0]))
+        print("Pass {0}, test map {1}".format(pass_id, test_map))
         return best_map

     for pass_id in range(num_passes):
@@ -285,7 +155,9 @@ def parallel_exe(args,
         for batch_id, data in enumerate(train_reader()):
             prev_start_time = start_time
             start_time = time.time()
-            if len(data) < devices_num: continue
+            if len(data) < (devices_num * 2):
+                print("There are too few data to train on all devices.")
+                continue
             if args.parallel:
                 loss_v, = train_exe.run(fetch_list=[loss.name],
                                         feed=feeder.feed(data))
@@ -314,10 +186,10 @@ if __name__ == '__main__':
     label_file = 'label_list'
     model_save_dir = args.model_save_dir
     if 'coco' in args.dataset:
-        data_dir = './data/coco'
+        data_dir = 'data/coco'
         if '2014' in args.dataset:
             train_file_list = 'annotations/instances_train2014.json'
-            val_file_list = 'annotations/instances_minival2014.json'
+            val_file_list = 'annotations/instances_val2014.json'
         elif '2017' in args.dataset:
             train_file_list = 'annotations/instances_train2017.json'
             val_file_list = 'annotations/instances_val2017.json'
@@ -333,8 +205,7 @@ if __name__ == '__main__':
         apply_expand=args.apply_expand,
         ap_version = args.ap_version,
         toy=args.is_toy)
-    method = parallel_exe
-    method(
+    train(
         args,
         train_file_list=train_file_list,
         val_file_list=val_file_list,
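
Review note on the `test_map, = exe.run(...)` changes in `eval.py`, `infer.py`, and `train.py`: the trailing comma unpacks a one-element result, which is why the later `test_map[0]` lookups in the `print` calls become plain `test_map`. The sketch below illustrates the idiom in isolation. It assumes that `exe.run` returns a list with one entry per item in `fetch_list`; `fake_run` is a hypothetical stand-in written for this note, not part of PaddlePaddle.

```python
# Hypothetical stand-in for `exe.run(..., fetch_list=[...])`.
# Assumption: the real call returns one entry per fetched variable.
def fake_run(fetch_list):
    # Each entry mimics a shape-[1] array holding the accumulated mAP.
    return [[0.7332] for _ in fetch_list]


# Without unpacking, the name is bound to the whole result list,
# so callers have to index it (the old `test_map[0]` pattern).
result = fake_run(fetch_list=["accum_map"])

# With a trailing comma, the single fetched entry is unpacked directly.
test_map, = fake_run(fetch_list=["accum_map"])

# Unpacking also fails loudly if the number of fetched values changes.
try:
    only_one, = fake_run(fetch_list=["accum_map", "loss"])
except ValueError as err:
    unpack_error = str(err)
```

Under the same assumption, the unchanged context lines `if test_map[0] > best_map:` in `train.py` now index into the fetched value itself rather than into the result list, which still yields a scalar when that value is a one-element array.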