index.html 13.6 KB
Newer Older
Y
yangyaming 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42

<html>
<head>
  <script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
    jax: ["input/TeX", "output/HTML-CSS"],
    tex2jax: {
      inlineMath: [ ['$','$'] ],
      displayMath: [ ['$$','$$'] ],
      processEscapes: true
    },
    "HTML-CSS": { availableFonts: ["TeX"] }
  });
  </script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
  <script type="text/javascript" src="../.tools/theme/marked.js">
  </script>
  <link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
  <script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
  <link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
  <link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
  <link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
    box-sizing: border-box;
    min-width: 200px;
    max-width: 980px;
    margin: 0 auto;
    padding: 45px;
}
</style>


<body>

<div id="context" class="container-fluid markdown-body">
</div>

<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
43
# Single Shot MultiBox Detector (SSD) Object Detection
Y
yangyaming 已提交
44

45 46
## Introduction
Single Shot MultiBox Detector (SSD) is one of the new and enhanced detection algorithms detecting objects in images [ 1 ]. SSD algorithm is characterized by rapid detection and high detection accuracy. PaddlePaddle has an integrated SSD algorithm! This example demonstrates how to use the SSD model in PaddlePaddle for object detection. We first provide a brief introduction to the SSD principle. Then we describe how to train, evaluate and test on the PASCAL VOC data set, and finally on how to use SSD on custom data set.
Y
yangyaming 已提交
47

48 49 50 51 52 53 54 55
## SSD Architecture
SSD uses a convolutional neural network to achieve end-to-end detection. The term "End-to-end" is used because it uses the input as the original image and the output for the test results, without the use of external tools or processes for feature extraction. One popular model of SSD is VGG16 [ 2 ]. SSD differs from VGG16 network model as in following.

1. The final fc6, fc7 full connection layer into a convolution layer, convolution layer parameters through the original fc6, fc7 parameters obtained.
2. Change the parameters of the pool5 layer from 2x2-s2 (kernel size 2x2, stride size to 2) to 3x3-s1-p1 (kernel size is 3x3, stride size is 1, padding size is 1).
3. The initial layers are composed of conv4\_3、conv7、conv8\_2、conv9\_2、conv10\_2, and pool11 layers. The main purpose of the priorbox layer is to generate a series of rectangular candidates based on the input feature map. A more detailed introduction to SSD can be found in the paper\[[1](#References)\]。

Below is the overall structure of the model (300x300)
Y
yangyaming 已提交
56 57

<p align="center">
58
<img src="images/ssd_network.png" width="900" height="250" hspace='10'/> <br/>
Y
yangyaming 已提交
59 60 61
图1. SSD网络结构
</p>

62
Each box in the figure represents a convolution layer, and the last two rectangles represent the summary of each convolution layer output and the post-processing phase. Specifically, the network will output a set of candidate rectangles in the prediction phase. Each rectangle contains two types of information: the position and the category score. The network produces thousands of predictions at various scales and aspect ratios before performing non-maximum suppression, resulting in a handful of final tags.
Y
yangyaming 已提交
63

64 65
## Example Overview
This example contains the following files:
Y
yangyaming 已提交
66

Y
yangyaming 已提交
67
<table>
68 69 70 71 72 73 74 75 76 77 78
<caption>Table 1. Directory structure</caption>
<tr><th>File</th><th>Description</th></tr>
<tr><td>train.py</td><td>Training script</td></tr>
<tr><td>eval.py</td><td>Evaluation</td></tr>
<tr><td>infer.py</td><td>Prediction using the trained model</tr>
<tr><td>visual.py</td><td>Visualization of the test results</td></tr>
<tr><td>image_util.py</td><td>Image preprocessing required common function</td></tr>
<tr><td>data_provider.py</td><td>Data processing scripts, generate training, evaluate or detect the required data</td></tr>
<tr><td>config/pascal_voc_conf.py</td><td>  Neural network hyperparameter configuration file</td></tr>
<tr><td>data/label_list</td><td>Label list</td></tr>
<tr><td>data/prepare_voc_data.py</td><td>Prepare training PASCAL VOC data list</td></tr>
Y
yangyaming 已提交
79
</table>
80

81
The training phase requires pre-processing of the data, including clipping, sampling, etc. This is done in ```image_util.py``` and ```data_provider.py```.```config/vgg_config.py```. ```data/prepare_voc_data.py``` is used to generate a list of files, including the training set and test set, the need to use the user to download and extract data, the default use of VOC2007 and VOC2012.
Y
yangyaming 已提交
82

83 84 85 86
## PASCAL VOC Data set

### Data Preparation
First download the data set. VOC2007\[[3](#References)\] contains both training and test data set, and VOC2012\[[4](#References)\] contains only training set. Downloaded data are stored in ```data/VOCdevkit/VOC2007``` and ```data/VOCdevkit/VOC2012```. Next, run ```data/prepare_voc_data.py``` to generate ```trainval.txt``` and ```test.txt```. The relevant function is as following:
87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105

```python
def prepare_filelist(devkit_dir, years, output_dir):
    trainval_list = []
    test_list = []
    for year in years:
        trainval, test = walk_dir(devkit_dir, year)
        trainval_list.extend(trainval)
        test_list.extend(test)
    random.shuffle(trainval_list)
    with open(osp.join(output_dir, 'trainval.txt'), 'w') as ftrainval:
        for item in trainval_list:
            ftrainval.write(item[0] + ' ' + item[1] + '\n')

    with open(osp.join(output_dir, 'test.txt'), 'w') as ftest:
        for item in test_list:
            ftest.write(item[0] + ' ' + item[1] + '\n')
```

106
The data in ```trainval.txt``` will look like:
107 108 109 110 111 112 113

```
VOCdevkit/VOC2007/JPEGImages/000005.jpg VOCdevkit/VOC2007/Annotations/000005.xml
VOCdevkit/VOC2007/JPEGImages/000007.jpg VOCdevkit/VOC2007/Annotations/000007.xml
VOCdevkit/VOC2007/JPEGImages/000009.jpg VOCdevkit/VOC2007/Annotations/000009.xml
```

114 115
The first field is the relative path of the image file, and the second field is the relative path of the corresponding label file.

116

117 118 119 120 121
### To Use Pre-trained Model
We also provide a pre-trained model using VGG-16 with good performance. To use the model, download the file http://paddlepaddle.bj.bcebos.com/model_zoo/detection/ssd_model/vgg_model.tar.gz, and place it as ```vgg/vgg_model.tar.gz```。

### Training
Next, run ```python train.py``` to train the model. Note that this example only supports the CUDA GPU environment, and can not be trained on only CPU. This is mainly because the training is very slow using CPU only.
Y
yangyaming 已提交
122 123 124 125 126 127 128 129 130 131 132 133

```python
paddle.init(use_gpu=True, trainer_count=4)
data_args = data_provider.Settings(
                data_dir='./data',
                label_file='label_list',
                resize_h=cfg.IMG_HEIGHT,
                resize_w=cfg.IMG_WIDTH,
                mean_value=[104,117,124])
train(train_file_list='./data/trainval.txt',
      dev_file_list='./data/test.txt',
      data_args=data_args,
Y
yangyaming 已提交
134
      init_model_path='./vgg/vgg_model.tar.gz')
Y
yangyaming 已提交
135 136
```

137
Below is a description about this script:
Y
yangyaming 已提交
138

139 140 141 142
1. Call ```paddle.init``` with 4 GPUs.
2. ```data_provider.Settings()``` is to pass configuration parameters. For ```config/vgg_config.py``` setting,300x300 is a typical configuration for both the accuracy and efficiency. It can be extended to 512x512 by modifying the configuration file.
3. In ```train()```执 function, ```train_file_list``` specifies the training data list, and ```dev_file_list``` specifies the evaluation data list, and ```init_model_path``` specifies the pre-training model location.
4. During the training process will print some log information, each training a batch will output the current number of rounds, the current batch cost and mAP (mean Average Precision. Each training pass will be saved a model to the default saved directory ```heckpoints``` (Need to be created in advance).
Y
yangyaming 已提交
143

144
The following shows the SDD300x300 in the VOC data set.
145 146 147 148 149 150 151

<p align="center">
<img src="images/SSD300x300_map.png" hspace='10'/> <br/>
图2. SSD300x300 mAP收敛曲线
</p>


152 153
### Model Assessment
Next, run ```python eval.py``` to evaluate the model.
Y
yangyaming 已提交
154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171

```python
paddle.init(use_gpu=True, trainer_count=4)  # use 4 gpus

data_args = data_provider.Settings(
    data_dir='./data',
    label_file='label_list',
    resize_h=cfg.IMG_HEIGHT,
    resize_w=cfg.IMG_WIDTH,
    mean_value=[104, 117, 124])

eval(
    eval_file_list='./data/test.txt',
    batch_size=4,
    data_args=data_args,
    model_path='models/pass-00000.tar.gz')
```

172 173
### Obejct Detection
Run ```python infer.py``` to perform the object detection using the trained model.
Y
yangyaming 已提交
174 175 176 177 178 179 180 181 182 183 184

```python
infer(
    eval_file_list='./data/infer.txt',
    save_path='infer.res',
    data_args=data_args,
    batch_size=4,
    model_path='models/pass-00000.tar.gz',
    threshold=0.3)
```

185 186 187

Here ```eval_file_list``` specified image path list, ```save_path``` specifies directory to save the prediction result.

188 189 190 191 192 193 194 195 196 197

```
VOCdevkit/VOC2007/JPEGImages/006936.jpg 12 0.997844 131.255611777 162.271582842 396.475315094 334.0
VOCdevkit/VOC2007/JPEGImages/006936.jpg 14 0.998557 229.160234332 49.5991278887 314.098775387 312.913876176
VOCdevkit/VOC2007/JPEGImages/006936.jpg 14 0.372522 187.543615699 133.727034628 345.647156239 327.448492289
...
```

一共包含4个字段,以tab分割,第一个字段是检测图像路径,第二字段为检测矩形框内类别,第三个字段是置信度,第四个字段是4个坐标值(以空格分割)。

198
Below is the example after running ```python visual.py``` to visualize the model result. The default visualization of the image saved in the ```./visual_res```.
199 200 201 202 203 204

<p align="center">
<img src="images/vis_1.jpg" height=150 width=200 hspace='10'/>
<img src="images/vis_2.jpg" height=150 width=200 hspace='10'/>
<img src="images/vis_3.jpg" height=150 width=100 hspace='10'/>
<img src="images/vis_4.jpg" height=150 width=200 hspace='10'/> <br />
205
Figure 3. SSD300x300 Visualization Example
206 207
</p>

Y
yangyaming 已提交
208

209 210
## To Use Custo Data set
In PaddlePaddle, using the custom data set to train SSD model is also easy! Just input the format that ```train.txt``` can understand. Below is a recommended structure to input for ```train.txt```.
Y
yangyaming 已提交
211 212 213 214 215 216 217 218

```
image00000_file_path    image00000_annotation_file_path
image00001_file_path    image00001_annotation_file_path
image00002_file_path    image00002_annotation_file_path
...
```

219
The first column is for the image file path, and the second column for the corresponding marked data file path. In the case of using xml file format, ```data_provider.py``` can be used to process the data as follows.
Y
yangyaming 已提交
220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238

```python
bbox_labels = []
root = xml.etree.ElementTree.parse(label_path).getroot()
for object in root.findall('object'):
    bbox_sample = []
    # start from 1
    bbox_sample.append(float(settings.label_list.index(
         object.find('name').text)))
    bbox = object.find('bndbox')
    difficult = float(object.find('difficult').text)
    bbox_sample.append(float(bbox.find('xmin').text)/img_width)
    bbox_sample.append(float(bbox.find('ymin').text)/img_height)
    bbox_sample.append(float(bbox.find('xmax').text)/img_width)
    bbox_sample.append(float(bbox.find('ymax').text)/img_height)
    bbox_sample.append(difficult)
    bbox_labels.append(bbox_sample)
```

239
Now the marked data(e.g. image00000\_annotation\_file\_path)is as follows:
Y
yangyaming 已提交
240 241 242 243 244 245 246

```
label1 xmin1 ymin1 xmax1 ymax1
label2 xmin2 ymin2 xmax2 ymax2
...
```

247
Here each row corresponds to an object for 5 fields. The first is for the label (note the background 0, need to be numbered from 1), and the remaining four are for the coordinates.
Y
yangyaming 已提交
248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264

```
bbox_labels = []
with open(label_path) as flabel:
    for line in flabel:
        bbox_sample = []
        bbox = [float(i) for i in line.strip().split()]
        label = bbox[0]
        bbox_sample.append(label)
        bbox_sample.append(bbox[1]/float(img_width))
        bbox_sample.append(bbox[2]/float(img_height))
        bbox_sample.append(bbox[3]/float(img_width))
        bbox_sample.append(bbox[4]/float(img_height))
        bbox_sample.append(0.0)
        bbox_labels.append(bbox_sample)
```

265
Another important thing is to change the size of the image and the size of the object to change the configuration of the network structure. Use ```config/vgg_config.py``` to create the custom configuration file. For more details, please refer to \[[1](#References)\]。
Y
yangyaming 已提交
266

267
## References
Y
yangyaming 已提交
268 269
1. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. [SSD: Single shot multibox detector](https://arxiv.org/abs/1512.02325). European conference on computer vision. Springer, Cham, 2016.
2. Simonyan, Karen, and Andrew Zisserman. [Very deep convolutional networks for large-scale image recognition](https://arxiv.org/abs/1409.1556). arXiv preprint arXiv:1409.1556 (2014).
Y
yangyaming 已提交
270 271
3. [The PASCAL Visual Object Classes Challenge 2007](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html)
4. [Visual Object Classes Challenge 2012 (VOC2012)](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html)
Y
yangyaming 已提交
272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293

</div>
<!-- You can change the lines below now. -->

<script type="text/javascript">
marked.setOptions({
  renderer: new marked.Renderer(),
  gfm: true,
  breaks: false,
  smartypants: true,
  highlight: function(code, lang) {
    code = code.replace(/&amp;/g, "&")
    code = code.replace(/&gt;/g, ">")
    code = code.replace(/&lt;/g, "<")
    code = code.replace(/&nbsp;/g, " ")
    return hljs.highlightAuto(code, [lang]).value;
  }
});
document.getElementById("context").innerHTML = marked(
        document.getElementById("markdown").innerHTML)
</script>
</body>