# Getting Started

For setting up the running environment, please refer to the [installation instructions](INSTALL.md).


## Training

#### Single-GPU Training
```bash
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### Multi-GPU Training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### CPU Training

```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
```

##### Optional arguments

- `-r` or `--resume_checkpoint`: Checkpoint path for resuming training, e.g. `-r output/faster_rcnn_r50_1x/10000`
- `--eval`: Whether to perform evaluation during training; default is `False`
- `--output_eval`: Evaluation output directory when evaluating during training; defaults to the current directory.
- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the config file, e.g. `-d dataset/coco`
- `-o`: Override configuration options from the config file, e.g. `-o weights=output/faster_rcnn_r50_1x/model_final`
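For instance, training can be resumed from a previously saved checkpoint with `-r` (a sketch, assuming a checkpoint was written at `output/faster_rcnn_r50_1x/10000`):

```shell
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
# resume from the checkpoint saved at iteration 10000
python tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                      -r output/faster_rcnn_r50_1x/10000
```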


##### Examples

- Perform evaluation in training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```

Alternating between training and evaluation is possible: simply pass `--eval`, and evaluation
will run every `snapshot_iter` iterations. This interval can be changed via `snapshot_iter` in the configuration file. If the evaluation dataset is large
and slows down training, we suggest evaluating less frequently or evaluating only after training completes.
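The evaluation interval can also be overridden from the command line via `-o` (a sketch, assuming `snapshot_iter` is a top-level key in the config file, as in `faster_rcnn_r50_1x.yml`):

```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
# evaluate every 20000 iterations instead of the config default
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval \
                         -o snapshot_iter=20000
```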


- Override configuration options and specify the dataset path
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -o weights=output/faster_rcnn_r50_1x/model_final \
                         -d dataset/coco
```


##### NOTES

- `CUDA_VISIBLE_DEVICES` selects which GPUs are used, e.g. `export CUDA_VISIBLE_DEVICES=0,1,2,3`. For the corresponding learning-rate adjustment rules, see the [FAQ](#faq)
- Dataset is stored in `dataset/coco` by default (configurable).
- Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not found locally.
- Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- To check the hyperparameters used, please refer to the config file.
- Training R-CNN models on CPU is not supported on PaddlePaddle <= 1.5.1; this will be fixed in a later version.



## Evaluation


```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```

#### Optional arguments

- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the config file, e.g. `-d dataset/coco`
- `--output_eval`: Evaluation directory, default is current directory.
- `-o`: Override configuration options from the config file, e.g. `-o weights=output/faster_rcnn_r50_1x/model_final`
- `--json_eval`: Whether to evaluate from an existing bbox.json or mask.json; default is `False`. The JSON file directory is given by the `-f` argument.

#### Examples

- Override configuration options and specify the dataset path
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
                        -o weights=output/faster_rcnn_r50_1x/model_final \
                        -d dataset/coco
```

- Evaluation with json
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
             --json_eval \
             -f evaluation/
```

The JSON file must be named `bbox.json` or `mask.json` and placed in the `evaluation/` directory. If the `-f` argument is omitted, the current directory is used by default.

#### NOTES

- Checkpoint is loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature


## Inference


- Run inference on a single image:

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```

- Multi-image inference:

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```

#### Optional arguments

- `--output_dir`: Directory for storing the output visualization files.
- `--draw_threshold`: Score threshold for keeping detections in the visualization. Default is 0.5.
- `--save_inference_model`: If set, save the inference model to `output_dir`.

#### Examples

- Specify the output directory and set the visualization threshold
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --output_dir=infer_output/ \
                      --draw_threshold=0.5
```
The visualization files are saved in `output` by default; to specify a different
path, simply add a `--output_dir=` flag.
`--draw_threshold` is an optional argument (default 0.5). Different thresholds will produce different results depending on the computation of [NMS](https://ieeexplore.ieee.org/document/1699659).

- Save inference model

```bash
# run on GPU with:
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --save_inference_model
```

Setting `--save_inference_model` saves an inference model that can be loaded by the PaddlePaddle prediction library.


## FAQ

**Q:**  Why do I get `NaN` loss values during single-GPU training? </br>
**A:**  The default learning rate is tuned for multi-GPU training (8 GPUs); it must
be adapted for single-GPU training accordingly (e.g., divided by 8).
The adjustment rules are as follows, and the resulting settings are equivalent: </br>


| GPU number  | Learning rate  | Max_iters | Milestones       |
| :---------: | :------------: | :-------: | :--------------: |
| 2           | 0.0025         | 720000    | [480000, 640000] |
| 4           | 0.005          | 360000    | [240000, 320000] |
| 8           | 0.01           | 180000    | [120000, 160000] |
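
The table follows a linear-scaling rule: the 8-GPU defaults (`0.01`, `180000`, `[120000, 160000]`) are scaled by the GPU count. A minimal sketch of the arithmetic (the variable names here are illustrative, not part of the toolkit):

```shell
#!/bin/sh
# Scale the 8-GPU defaults to a target GPU count (here: 2).
gpus=2
lr=$(awk -v n="$gpus" 'BEGIN { printf "%.4f", 0.01 * n / 8 }')
max_iters=$((180000 * 8 / gpus))
m1=$((120000 * 8 / gpus))
m2=$((160000 * 8 / gpus))
echo "lr=$lr max_iters=$max_iters milestones=[$m1, $m2]"
```

For 2 GPUs this reproduces the first table row: `lr=0.0025 max_iters=720000 milestones=[480000, 640000]`.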

**Q:**  How to reduce GPU memory usage? </br>
**A:**  Setting the environment variable `FLAGS_conv_workspace_size_limit` to a smaller
number can reduce the GPU memory footprint without affecting training speed.
Taking Mask-RCNN (R50) as an example, setting `export FLAGS_conv_workspace_size_limit=512`
allows the batch size to reach 4 per GPU (Tesla V100, 16GB).
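
For example, training can be launched with the limit in place (a sketch; the right value depends on the model and GPU, and the Mask-RCNN config path is assumed to be `configs/mask_rcnn_r50_1x.yml`):

```shell
# limit the cuDNN convolution workspace size before launching training
export FLAGS_conv_workspace_size_limit=512
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/mask_rcnn_r50_1x.yml
```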