# RCNN Object Detection

---
## Table of Contents

- [Installation](#installation)
- [Introduction](#introduction)
- [Data preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference and Visualization](#inference-and-visualization)
- [Appendix](#appendix)

## Installation

Running the sample code in this directory requires PaddlePaddle Fluid v1.0.0 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/0.15.0/beginners_guide/install/install_doc.html#paddlepaddle) to update it.

## Introduction

Region Convolutional Neural Network (RCNN) models are two-stage detectors: the first stage generates proposals, and the second stage extracts features for each proposal to predict its class and a more precise box.
This sample currently contains two typical RCNN models: Faster RCNN and Mask RCNN.

The overall framework of [Faster RCNN](https://arxiv.org/abs/1506.01497) can be divided into four parts:

1. Base conv layer. As a CNN-based object detector, Faster RCNN extracts feature maps using a base convolutional network. The feature maps are then shared by the RPN and the fc layers. This sample uses [ResNet-50](https://arxiv.org/abs/1512.03385) as the base conv layer.
2. Region Proposal Network (RPN). The RPN generates proposals for detection. It generates anchors from a set of sizes and aspect ratios, classifies them as foreground or background with softmax, and then refines them with box regression to obtain more precise proposals.
3. RoI Align. This layer takes feature maps and proposals as input. The proposals are mapped onto the feature maps and pooled to the same size. The output is sent to the fc layers for classification and regression. Either RoIPool or RoIAlign can be used in this layer; the choice is set by roi\_func in config.py.
4. Detection layer. Two fc layers use the output of RoI pooling to compute the class and location of each proposal.
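
The anchor generation mentioned in step 2 can be sketched as follows. This is a minimal illustration, not the code used in this sample: the function name `generate_anchors`, the base cell size of 16, and the example sizes and ratios are all assumptions, and real implementations often differ in rounding and pixel conventions.

```python
import math


def generate_anchors(base_size, sizes, ratios):
    """Return (x1, y1, x2, y2) anchors centred on one base_size x base_size cell.

    Hypothetical helper for illustration. Each anchor has area size * size
    and aspect ratio height / width == ratio.
    """
    center = base_size / 2.0
    anchors = []
    for size in sizes:
        area = float(size * size)
        for ratio in ratios:
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((center - width / 2.0, center - height / 2.0,
                            center + width / 2.0, center + height / 2.0))
    return anchors


# Assumed example values: two sizes and three aspect ratios give six anchors.
for box in generate_anchors(16, [32, 64], [0.5, 1.0, 2.0]):
    print(box)
```

In a full RPN, the same set of anchors is shifted to every feature-map location before classification and box regression.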

[Mask RCNN](https://arxiv.org/abs/1703.06870) is a classical instance segmentation model and an extension of Faster RCNN.

Mask RCNN is a two-stage model as well. In the first stage, it generates proposals from the input images. In the second stage, it predicts the class, the bounding box, and the mask, where the mask comes from a segmentation branch added to the original Faster RCNN model. This decouples mask prediction from classification.

## Data preparation

The model is trained on the [MS-COCO dataset](http://cocodataset.org/#download). Download the dataset as follows:

    cd dataset/coco
    ./download.sh


## Training

**download the pre-trained model:** This sample provides a ResNet-50 pre-trained model converted from Caffe. The model fuses the parameters of the batch normalization layers. Download the pre-trained model with:

    sh ./pretrained/download.sh

Set `pretrained_model` to load the pre-trained model. This parameter is also used to load a trained model for fine-tuning.
Please make sure that the pre-trained model is downloaded and loaded correctly; otherwise, the loss may become NaN during training.

**Install the [cocoapi](https://github.com/cocodataset/cocoapi):**

Training requires [cocoapi](https://github.com/cocodataset/cocoapi). Install it as follows:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # if cython is not installed
    pip install Cython
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python2 setup.py install --user

After data preparation, start training with:

    python train.py \
       --model_save_dir=output/ \
       --pretrained_model=${path_to_pretrain_model} \
       --data_dir=${path_to_data} \
       --MASK_ON=False

- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs for training.
- Set ```MASK_ON``` to choose between the Faster RCNN and Mask RCNN models.
- For more help on arguments, run `python train.py --help`.

**data reader introduction:**

* The data reader is defined in `reader.py`.
* The short side of every image is scaled to `scales`. If the long side would then exceed `max_size`, the image is scaled so that the long side equals `max_size` instead.
* During training, images are horizontally flipped.
* Images in the same batch can be padded to the same size.
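
The resizing and padding rules above can be sketched as follows. This is an illustrative sketch rather than the code in `reader.py`: the helper names `compute_im_scale` and `padded_shape` are hypothetical, and the example values 800 and 1333 are assumed, not taken from this sample's configuration.

```python
def compute_im_scale(height, width, target_size, max_size):
    """Scale factor that brings the short side to target_size,
    capped so that the long side does not exceed max_size.

    Hypothetical helper for illustration.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    if scale * long_side > max_size:
        scale = float(max_size) / long_side
    return scale


def padded_shape(shapes):
    """Common (height, width) a batch is padded to: the per-axis maximum."""
    return (max(h for h, _ in shapes), max(w for _, w in shapes))


# Assumed example values: target short side 800, maximum long side 1333.
print(compute_im_scale(480, 640, 800, 1333))   # short side reaches 800
print(compute_im_scale(400, 1200, 800, 1333))  # long side is capped at 1333
print(padded_shape([(800, 1067), (1333, 800)]))
```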

**model configuration:**

* Models are trained with RoIAlign and with RoIPool separately.
* NMS threshold=0.7. During training, pre\_nms=12000, post\_nms=2000; during testing, pre\_nms=6000, post\_nms=1000.
* When generating proposal labels, fg\_fraction=0.25, fg\_thresh=0.5, bg\_thresh\_hi=0.5, bg\_thresh\_lo=0.0.
* In RPN target assignment, rpn\_fg\_fraction=0.5, rpn\_positive\_overlap=0.7, rpn\_negative\_overlap=0.3.
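
To illustrate how the NMS threshold and the pre\_nms and post\_nms limits interact, here is a minimal pure-Python sketch of proposal filtering. It assumes the usual greedy NMS procedure; the function names `iou` and `filter_proposals` are hypothetical, and the actual operator used by this sample may differ in details such as box clipping and minimum-size filtering.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_proposals(boxes, scores, pre_nms, post_nms, nms_thresh=0.7):
    """Keep the pre_nms highest-scoring boxes, apply greedy NMS,
    and return the indices of at most post_nms surviving boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms]
    keep = []
    for i in order:
        # A box survives only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
            if len(keep) == post_nms:
                break
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# The second box overlaps the first with IoU 0.9 and is suppressed.
print(filter_proposals(boxes, scores, pre_nms=12000, post_nms=2000))
```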

**training strategy:**

*  Use momentum optimizer with momentum=0.9.
*  Weight decay is 0.0001.
*  In the first 500 iterations, the learning rate increases linearly from 0.00333 to 0.01. It is then reduced to 0.1 and 0.01 times the base value at iterations 120000 and 160000, respectively. The maximum number of iterations is 180000. A 2x model is also released, which trains for 360000 iterations with the learning rate decayed at iterations 240000 and 320000. These settings are controlled by `max_iter` and `lr_steps` in config.py.
*  In layers outside the base convolutional network, the learning rate of bias parameters is twice the global learning rate.
*  In the base convolutional layers, the parameters of the affine layers and of the res body are not updated.
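
The learning rate schedule described above can be written as a small function. This is a sketch of the schedule as stated in the text, not the optimizer code of this sample: the function name `learning_rate` and its parameter names are hypothetical, and the exact handling of the boundary iterations is an assumption.

```python
def learning_rate(iteration, base_lr=0.01, warmup_iters=500,
                  warmup_start=0.00333, lr_steps=(120000, 160000), gamma=0.1):
    """Linear warm-up followed by step decay (illustrative sketch)."""
    if iteration < warmup_iters:
        # Linear interpolation from warmup_start to base_lr.
        alpha = iteration / float(warmup_iters)
        return warmup_start + (base_lr - warmup_start) * alpha
    # Multiply by gamma once for every decay step already reached.
    decays = sum(1 for step in lr_steps if iteration >= step)
    return base_lr * gamma ** decays


for it in (0, 250, 500, 119999, 120000, 160000):
    print(it, learning_rate(it))
```

For the 2x schedule, the same function would be called with `lr_steps=(240000, 320000)`.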

## Evaluation

Evaluation measures the performance of a trained model. This sample provides `eval_coco_map.py`, which uses the COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval).

`eval_coco_map.py` is the main executor for evaluation. Start evaluation with:

    python eval_coco_map.py \
        --dataset=coco2017 \
        --pretrained_model=${path_to_pretrain_model}

- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for evaluation.

Evaluation results are shown below:

Faster RCNN:

| Model              | RoI function    | Batch size     | Max iteration    | mAP  |
| :--------------- | :--------: | :------------:    | :------------------:    |------: |
| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8   |    180000        | 0.316 |
| [Fluid RoIPool no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_no_padding.tar.gz)  | RoIPool | 8   |    180000        | 0.318 |
| [Fluid RoIAlign no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding.tar.gz)  | RoIAlign | 8   |    180000        | 0.348 |
| [Fluid RoIAlign no padding 2x](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding_2x.tar.gz)  | RoIAlign | 8   |    360000        | 0.367 |

* Fluid RoIPool minibatch padding: Use RoIPool. Images in one batch are padded to the same size. This method is the same as in Detectron.
* Fluid RoIPool no padding: Use RoIPool. Images are not padded.
* Fluid RoIAlign no padding: Use RoIAlign. Images are not padded.
* Fluid RoIAlign no padding 2x: Use RoIAlign. Images are not padded. Trained for 360000 iterations, with the learning rate decayed at iterations 240000 and 320000.

Mask RCNN:

| Model              | Batch size     | Max iteration | box mAP | mask mAP |
| :--------------- | :--------: | :------------:    | :--------:    |------: |
| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 |

* Fluid mask no padding: Use RoIAlign. Images are not padded.

## Inference and Visualization

Inference produces prediction scores or image features from a trained model. `infer.py` is the main executor for inference. Start inference with:

    python infer.py \
        --dataset=coco2017 \
        --pretrained_model=${path_to_pretrain_model} \
        --image_path=dataset/coco/val2017/ \
        --image_name=000000000139.jpg \
        --draw_threshold=0.6

Inference results are visualized below:
<p align="center">
<img src="image/000000000139.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000127517.jpg" height=300 width=400 hspace='10'/> <br />
Faster RCNN Visualization Examples
</p>

<p align="center">
<img src="image/000000000139_mask.jpg" height=300 width=400 hspace='10'/>
<img src="image/000000127517_mask.jpg" height=300 width=400 hspace='10'/> <br />
Mask RCNN Visualization Examples
</p>