# Getting Started

For setting up the running environment, please refer to [installation instructions](INSTALL.md).


## Training

#### Single-GPU Training


```bash
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### Multi-GPU Training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### CPU Training

```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
```

##### Optional arguments

- `-r` or `--resume_checkpoint`: Checkpoint path for resuming training, for example `-r output/faster_rcnn_r50_1x/10000`.
- `--eval`: Whether to run evaluation during training. Default is `False`.
- `--output_eval`: Directory for evaluation output when evaluating during training. Default is the current directory.
- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the config file, for example `-d dataset/coco`.
- `-c`: Config file to use. All config files are stored in `configs/`.
- `-o`: Override options from the config file, for example `-o max_iters=180000`. Values set with `-o` take priority over those in the file given by `-c`.
- `--use_tb`: Whether to record data with [tb-paddle](https://github.com/linshuliang/tb-paddle) for display in Tensorboard. Default is `False`.
- `--tb_log_dir`: tb-paddle logging directory for scalars. Default is `tb_log_dir/scalar`.
- `--fp16`: Whether to enable mixed precision training (requires GPU). Default is `False`.
- `--loss_scale`: Loss scaling factor for mixed precision training. Default is `8.0`.


##### Examples

- Perform evaluation in training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```

To alternate between training and evaluation, pass `--eval`. Evaluation then runs at every snapshot, at an interval set by `snapshot_iter` in the configuration file.
If the evaluation dataset is large and evaluation slows training down noticeably, we suggest evaluating less often or evaluating only after training.
When evaluating during training, the model with the highest mAP so far is saved at each `snapshot_iter` as `best_model`, in the same directory as `model_final`.


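As a rough illustration of the trade-off between evaluation frequency and training time, the number of evaluation runs is `max_iters / snapshot_iter`. The sketch below assumes `max_iters=180000` and a few illustrative `snapshot_iter` values; the `eval_runs` helper is hypothetical and not part of the toolkit. Since `snapshot_iter` lives in the configuration file, a larger interval can presumably be passed with `-o snapshot_iter=...`.

```bash
# Number of evaluation runs when evaluating at every snapshot.
# Both arguments are illustrative assumptions, not toolkit defaults.
eval_runs() {
  local max_iters=$1 snapshot_iter=$2
  echo $(( max_iters / snapshot_iter ))
}

for snapshot_iter in 5000 10000 20000; do
  echo "snapshot_iter=${snapshot_iter}: $(eval_runs 180000 "${snapshot_iter}") evaluation runs"
done
```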
- Configure dataset path
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -d dataset/coco
```

- Fine-tune on another task

When fine-tuning a pre-trained model on another task, the pre-trained parameters to exclude from loading can be set with `finetune_exclude_pretrained_params` in the YAML config or with `-o finetune_exclude_pretrained_params` on the command line.

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
                            finetune_exclude_pretrained_params="['cls_score','bbox_pred']"
```
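To picture what the exclusion does, the sketch below filters a list of parameter names against the excluded fields. It assumes a field matches any parameter whose name contains it, which is only one plausible reading of wildcard matching; the parameter names and the `filter_params` helper are hypothetical and not part of the toolkit.

```bash
# Hypothetical parameter names; real names depend on the model.
params="conv1_weights res2a_branch2a_weights cls_score_w cls_score_b bbox_pred_w bbox_pred_b"
# Fields as passed to finetune_exclude_pretrained_params.
exclude="cls_score bbox_pred"

# Print every parameter name that matches none of the excluded fields.
# Assumption: a field matches when it occurs anywhere in the name.
filter_params() {
  local name field skip
  for name in $1; do
    skip=0
    for field in $2; do
      case "$name" in
        *"$field"*) skip=1 ;;
      esac
    done
    if [ "$skip" -eq 0 ]; then
      echo "$name"
    fi
  done
}

filter_params "$params" "$exclude"
```

Only `conv1_weights` and `res2a_branch2a_weights` are printed; the `cls_score` and `bbox_pred` entries are the ones that would be skipped when loading.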

##### NOTES

- `CUDA_VISIBLE_DEVICES` selects which GPUs are used, for example `export CUDA_VISIBLE_DEVICES=0,1,2,3`. For how to adapt the learning rate to the number of GPUs, see the [FAQ](#faq).
- The dataset is stored in `dataset/coco` by default (configurable).
- If the dataset is not found locally, it is downloaded automatically and cached in `~/.cache/paddle/dataset`.
- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- When fine-tuning, users can set `pretrain_weights` to the models published by PaddlePaddle. Parameters matched by fields in `finetune_exclude_pretrained_params` are skipped when loading, and the fields support wildcard matching. For details, please refer to [Transfer Learning](TRANSFER_LEARNING.md).
- For the hyperparameters used, please refer to the [configs](../configs).
- Training RCNN models on CPU is not supported on PaddlePaddle<=1.5.1; this will be fixed in a later version.



## Evaluation

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```

#### Optional arguments

- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the config file, for example `-d dataset/coco`.
- `--output_eval`: Directory for evaluation output. Default is the current directory.
- `-o`: Override options from the config file, for example `-o weights=output/faster_rcnn_r50_1x/model_final`.
- `--json_eval`: Whether to evaluate with an existing `bbox.json` or `mask.json`. Default is `False`. The directory containing the JSON file is given by the `-f` argument.

#### Examples

- Evaluate with a specified weights path and dataset path
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
                        -o weights=output/faster_rcnn_r50_1x/model_final \
                        -d dataset/coco
```

- Evaluate with json
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
             --json_eval \
             -f evaluation/
```

The JSON file must be named `bbox.json` or `mask.json` and placed in the `evaluation/` directory. If the `-f` parameter is omitted, the current directory is used.
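
Because the file name and location are fixed, a quick pre-flight check can catch a misplaced results file before evaluation starts. This is a minimal sketch; the `find_eval_json` helper is hypothetical and not part of the toolkit.

```bash
# List bbox.json / mask.json found in a directory (default: current directory).
# Returns non-zero when neither file is present.
find_eval_json() {
  local dir=${1:-.} status=1 name
  for name in bbox.json mask.json; do
    if [ -f "${dir%/}/${name}" ]; then
      echo "${dir%/}/${name}"
      status=0
    fi
  done
  return "$status"
}

find_eval_json evaluation/ || echo "no bbox.json or mask.json found in evaluation/"
```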

#### NOTES

- Checkpoint is loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature


## Inference


- Run inference on a single image:

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```

- Multi-image inference:

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```

#### Optional arguments

- `--output_dir`: Directory for storing the output visualization files.
- `--draw_threshold`: Threshold for keeping results in the visualization. Default is 0.5.
- `--save_inference_model`: Save the inference model in `output_dir` if `True`.
- `--use_tb`: Whether to record data with [tb-paddle](https://github.com/linshuliang/tb-paddle) for display in Tensorboard. Default is `False`.
- `--tb_log_dir`: tb-paddle logging directory for images. Default is `tb_log_dir/image`.

#### Examples

- Specify the output directory and set the draw threshold

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --output_dir=infer_output/ \
                      --draw_threshold=0.5 \
                      -o weights=output/faster_rcnn_r50_1x/model_final \
                      --use_tb=True
```

The visualization files are saved in `output` by default; to use a different path, add the `--output_dir=` flag.
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
To run inference with a custom model, set `-o weights` to its path.
`--use_tb` is an optional argument. If `--use_tb` is `True`, tb-paddle records the data in the logging directory,
so users can view the results in Tensorboard.

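The effect of `--draw_threshold` can be pictured as a score filter over the detections. The sketch below uses made-up labels and scores and assumes that a detection is kept when its score is at least the threshold; the `filter_detections` helper is hypothetical and not part of the toolkit.

```bash
# Keep only detections whose score reaches the threshold.
# Input: one "label score" pair per line on stdin; $1 is the threshold.
filter_detections() {
  awk -v t="$1" '$2 >= t { print $1 }'
}

# Made-up detections for illustration.
detections="person 0.92
dog 0.47
bicycle 0.61
car 0.12"

# With the default threshold of 0.5, only person and bicycle are drawn.
printf '%s\n' "$detections" | filter_detections 0.5
```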
- Save inference model

```bash
# run on GPU with:
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --save_inference_model
```

Setting `--save_inference_model` saves an inference model, which can be loaded by the PaddlePaddle prediction library.


## FAQ

**Q:**  Why do I get `NaN` loss values during single GPU training? </br>
**A:**  The default learning rate is tuned for multi-GPU training (8 GPUs), so it must
be adapted for single-GPU training accordingly (e.g., divided by 8).
The rules are as follows; the configurations in each row are equivalent: </br>

| GPU number  | Learning rate  | Max_iters | Milestones       |
| :---------: | :------------: | :-------: | :--------------: |
| 2           | 0.0025         | 720000    | [480000, 640000] |
| 4           | 0.005          | 360000    | [240000, 320000] |
| 8           | 0.01           | 180000    | [120000, 160000] |

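The rows of the table follow a linear scaling rule: halving the number of GPUs halves the learning rate and doubles `max_iters` and the milestones. A small sketch of that arithmetic, taking the 8-GPU row as the reference; the `scale_schedule` helper is hypothetical, and the single-GPU result is an extrapolation of the same rule rather than a row from the table.

```bash
# Scale the 8-GPU reference schedule to another GPU count (linear scaling).
# Reference values are taken from the 8-GPU row of the table.
scale_schedule() {
  local gpus=$1
  awk -v n="$gpus" 'BEGIN {
    base_gpus = 8; base_lr = 0.01; base_iters = 180000
    m1 = 120000; m2 = 160000
    f = base_gpus / n   # fewer GPUs: smaller learning rate, longer schedule
    printf "learning_rate=%g max_iters=%d milestones=[%d,%d]\n", base_lr / f, base_iters * f, m1 * f, m2 * f
  }'
}

for gpus in 1 2 4 8; do
  echo "GPUs=${gpus}: $(scale_schedule "${gpus}")"
done
```

For 2, 4 and 8 GPUs the output reproduces the table rows.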

**Q:**  How to reduce GPU memory usage? </br>
**A:**  Setting the environment variable `FLAGS_conv_workspace_size_limit` to a smaller
number can reduce the GPU memory footprint without affecting training speed.
Taking Mask-RCNN (R50) as an example, with `export FLAGS_conv_workspace_size_limit=512`
the batch size can reach 4 per GPU (Tesla V100 16GB).


**Q:**  How to change data preprocessing? </br>
**A:**  Set `sample_transform` in the configuration. Note that **the complete list of transforms** must be given in the configuration,
for example `DecodeImage`, `NormalizeImage` and `Permute` in RCNN models. For a detailed description, please refer
to [config_example](config_example).