The code samples in this directory require the latest develop branch of PaddlePaddle. If you are on an earlier version, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).

---

## SSD Object Detection

## Table of Contents
- [Introduction](#introduction)
- [Data Preparation](#data-preparation)
- [Train](#train)
- [Evaluate](#evaluate)
- [Infer and Visualize](#infer-and-visualize)
- [Released Model](#released-model)

### Introduction

The [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) framework for object detection is a single-stage detector. A single-stage detector treats object detection as a regression problem and directly predicts bounding boxes and class probabilities without a region proposal step. SSD improves on this by producing predictions at different scales from different layers, as shown below. Predictions are made at six levels, on six feature maps of different scales. Two 3x3 convolutional layers are applied to each feature map: one predicts the category scores and the other predicts the shape offsets relative to the prior boxes (also called anchors). In total, this gives 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class.
<p align="center">
<img src="images/SSD_paper_figure.jpg" height=300 width=900 hspace='10'/> <br />
The Single Shot MultiBox Detector (SSD)
</p>
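
The detection count quoted in the introduction can be checked with a few lines of Python. This is only a worked version of the arithmetic; the names `LEVELS` and `count_priors` are illustrative and do not come from this repository.

```python
# (feature map side length, prior boxes per location) for the six prediction levels
LEVELS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_priors(levels):
    """Sum side * side * boxes_per_location over all prediction levels."""
    return sum(side * side * boxes for side, boxes in levels)


total = count_priors(LEVELS)
print(total)
```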

SSD can be plugged into a wide variety of standard convolutional networks, such as VGG, ResNet, or MobileNet. This network is called the base network or backbone. This tutorial uses [MobileNet](https://arxiv.org/abs/1704.04861).


### Data Preparation

You can use the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or the [MS-COCO dataset](http://cocodataset.org/#download).

To train a model on the PASCAL VOC dataset, download the dataset first. Skip this step if you already have it.

```bash
cd data/pascalvoc
./download.sh
```

The `download.sh` script also creates the training and testing file lists.

To train a model on the MS-COCO dataset, download the dataset first. Skip this step if you already have it.

```bash
cd data/coco
./download.sh
```

### Train

#### Download the Pre-trained Models

We provide two pre-trained models. The first is MobileNet-v1 SSD trained on the COCO dataset, with the COCO-specific convolutional predictors removed. This model can be used to initialize training on other datasets, such as PASCAL VOC. The second is MobileNet-v1 trained on the ImageNet 2012 dataset, with the weights and bias of the last fully connected layer removed.

Note: the MobileNet-v1 SSD model was converted from a [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet-v1 model was converted from [Caffe](https://github.com/shicai/MobileNet-Caffe).
We will release our own pre-trained models soon.

  - Download MobileNet-v1 SSD:
    ```bash
    ./pretrained/download_coco.sh
    ```
  - Download MobileNet-v1:
    ```bash
    ./pretrained/download_imagenet.sh
    ```

#### Train on PASCAL VOC

`train.py` is the entry point for training. An example of usage is shown below.
  ```bash
  python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
  ```
   - Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify which GPUs to use.
   - Set ```--dataset='coco2014'``` or ```--dataset='coco2017'``` to train the model on the MS COCO dataset.
   - For more help on the arguments, run:

  ```bash
  python train.py --help
  ```

The data reader is defined in `reader.py`. All images are resized to 300x300. During training, images are randomly distorted, expanded, cropped, and flipped:
   - distort: distort brightness, contrast, saturation, and hue.
   - expand: place the original image into a larger canvas that is initialized with the image mean.
   - crop: crop the image with respect to different scales, aspect ratios, and overlaps.
   - flip: flip horizontally.
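
Geometric augmentations such as expand and flip also have to move the ground-truth boxes along with the pixels. The sketch below shows one way the box coordinates could be remapped, assuming boxes are stored as normalized `(xmin, ymin, xmax, ymax)` tuples. The function names and the box layout are assumptions made for illustration; this is not the code in `reader.py`.

```python
def flip_box_horizontally(box):
    """Mirror a normalized (xmin, ymin, xmax, ymax) box around the vertical axis."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa.
    return (1.0 - xmax, ymin, 1.0 - xmin, ymax)


def expand_box(box, ratio, offset_x, offset_y):
    """Remap a normalized box after the image is pasted into a larger canvas.

    ratio: canvas side length divided by original side length (>= 1).
    offset_x, offset_y: top-left corner of the original image inside the
    canvas, expressed as a fraction of the canvas size.
    """
    xmin, ymin, xmax, ymax = box
    return (
        offset_x + xmin / ratio,
        offset_y + ymin / ratio,
        offset_x + xmax / ratio,
        offset_y + ymax / ratio,
    )


box = (0.1, 0.2, 0.4, 0.6)
print(flip_box_horizontally(box))
print(expand_box(box, ratio=2.0, offset_x=0.25, offset_y=0.25))
```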

We used the RMSProp optimizer with a mini-batch size of 64 to train MobileNet-SSD. The initial learning rate is 0.001, and it is decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01, respectively. Weight decay is 0.00005. After 120 epochs, the model achieves 73.32% mAP under the 11point metric.
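
The learning rate schedule described above can be written as a piecewise-constant function of the epoch. The sketch below assumes that each multiplier applies to the initial learning rate (not to the previous value), that epochs are counted from zero, and that a decay takes effect at its boundary epoch. These are assumptions, and the names are illustrative; check `train.py` for the actual behavior.

```python
import bisect

BASE_LR = 0.001
BOUNDARIES = [40, 60, 80, 100]            # epochs at which the rate changes
MULTIPLIERS = [1.0, 0.5, 0.25, 0.1, 0.01]  # one more entry than BOUNDARIES


def learning_rate(epoch):
    """Return the learning rate for a zero-based epoch index."""
    # bisect_right counts how many boundaries are <= epoch.
    stage = bisect.bisect_right(BOUNDARIES, epoch)
    return BASE_LR * MULTIPLIERS[stage]


for epoch in (0, 39, 40, 60, 80, 100, 119):
    print(epoch, learning_rate(epoch))
```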

### Evaluate

You can evaluate your trained model with different metrics, such as 11point and integral, on both the PASCAL VOC and COCO datasets. Note that the default test list is the dataset's test/val list; you can use your own test list by setting the ```--test_list``` argument.

`eval.py` is the entry point for evaluation. An example of usage is shown below.
```bash
python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45
```
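
For reference, the 11point metric is commonly defined as the PASCAL VOC 11-point interpolated average precision: the mean, over recall thresholds 0.0, 0.1, ..., 1.0, of the highest precision reached at any recall at or above the threshold. The sketch below illustrates that definition for a single class. The function name and input layout are assumptions, and this is not necessarily how `eval.py` computes the value.

```python
def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP from paired recall/precision samples."""
    total = 0.0
    for i in range(11):
        threshold = i / 10.0
        # Precisions of all operating points whose recall reaches the threshold.
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        # If no point reaches this recall, the interpolated precision is 0.
        total += max(candidates) if candidates else 0.0
    return total / 11.0


# Toy precision-recall curve with two operating points.
print(eleven_point_ap([0.5, 1.0], [1.0, 0.5]))
```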

You can set ```--dataset``` to ```coco2014``` or ```coco2017``` to evaluate on the COCO dataset. We also provide `eval_coco_map.py`, which uses the COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval). Using `eval_coco_map.py` requires [cocoapi](https://github.com/cocodataset/cocoapi).
Install the cocoapi:
```bash
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
```

### Infer and Visualize
`infer.py` is the entry point for inference. An example of usage is shown below.
```bash
python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg'
```
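
The `--nms_threshold` argument sets the overlap threshold for non-maximum suppression (NMS), which discards lower-scoring boxes that overlap a higher-scoring box too much. The sketch below is a plain-Python version of greedy NMS for one class, assuming boxes are `(xmin, ymin, xmax, ymax)` tuples. The function names are illustrative, and the NMS operator used by `infer.py` may differ in details.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.45):
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```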
Below are examples of visualized inference results.
<p align="center">
<img src="images/009943.jpg" height=300 width=400 hspace='10'/>
<img src="images/009956.jpg" height=300 width=400 hspace='10'/>
<img src="images/009960.jpg" height=300 width=400 hspace='10'/>
<img src="images/009962.jpg" height=300 width=400 hspace='10'/> <br />
MobileNet-v1-SSD 300x300 Visualization Examples
</p>


### Released Model


| Model                    | Pre-trained Model  | Training data    | Test data    | mAP |
|:------------------------:|:------------------:|:----------------:|:------------:|:----:|
|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | 73.32%  |