# Getting Started

For setting up the running environment, please refer to [installation instructions](INSTALL.md).


## Training

#### Single-GPU Training


```bash
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### Multi-GPU Training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### CPU Training

```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
```

##### Optional arguments

- `-r` or `--resume_checkpoint`: Checkpoint path for resuming training, e.g., `-r output/faster_rcnn_r50_1x/10000` (see the resume example below)
- `--eval`: Whether to perform evaluation during training; default is `False`
- `--output_eval`: Directory for evaluation results when evaluating during training; default is the current directory.
- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the configs, e.g., `-d dataset/coco`
- `-o`: Set configuration options in the config file, e.g., `-o weights=output/faster_rcnn_r50_1x/model_final`


##### Examples

- Perform evaluation during training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```

Alternating between training and evaluation is possible: simply pass `--eval` and evaluation will run at every snapshot, i.e., every `snapshot_iter` iterations as set in the configuration file. If the evaluation dataset is large and slows down training, consider evaluating less frequently or only after training finishes.
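
For example, the snapshot/evaluation frequency can also be changed from the command line via `-o`, assuming `snapshot_iter` is a top-level key of the config file (as in `faster_rcnn_r50_1x.yml`); the value below is illustrative:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         --eval \
                         -o snapshot_iter=20000
```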


- Set configuration options and assign the dataset path
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -o weights=output/faster_rcnn_r50_1x/model_final \
                         -d dataset/coco
```
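
- Resume training from a checkpoint (the checkpoint path below matches the `-r` example above and is illustrative; point it at one of your own snapshots)
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -r output/faster_rcnn_r50_1x/10000
```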


##### NOTES

- `CUDA_VISIBLE_DEVICES` can be used to select specific GPUs, e.g., `export CUDA_VISIBLE_DEVICES=0,1,2,3`. See the [FAQ](#faq) for how hyper-parameters scale with the number of GPUs.
- Dataset is stored in `dataset/coco` by default (configurable).
- Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not found locally.
- Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- For the hyper-parameters used, please refer to the config file.
- Training R-CNN models on CPU is not supported on PaddlePaddle<=1.5.1 and will be fixed in a later version.



## Evaluation


```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```

#### Optional arguments

- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the configs, e.g., `-d dataset/coco`
- `--output_eval`: Directory for evaluation result files; default is the current directory (see the example below).
- `-o`: Set configuration options in the config file, e.g., `-o weights=output/faster_rcnn_r50_1x/model_final`
- `--json_eval`: Whether to evaluate from an existing bbox.json or mask.json; default is `False`. The JSON file directory is specified with the `-f` argument.

#### Examples

- Set configuration options and assign the dataset path
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
                        -o weights=output/faster_rcnn_r50_1x/model_final \
                        -d dataset/coco
```

- Evaluation with JSON files
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
                     --json_eval \
                     -f evaluation/
```

The JSON file must be named bbox.json or mask.json and placed in the `evaluation/` directory. If the `-f` parameter is omitted, the current directory is used by default.
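
- Write evaluation result files to a custom directory with `--output_eval` (the directory name below is illustrative):
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
                        --output_eval eval_output/
```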

#### NOTES

- Checkpoint is loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature


## Inference


- Run inference on a single image:

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```

- Multi-image inference:

```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```

#### Optional arguments

- `--output_dir`: Directory for storing the output visualization files.
- `--draw_threshold`: Threshold for keeping detections in the visualization; default is 0.5.
- `--save_inference_model`: Save the inference model to `output_dir` if True.

#### Examples

- Specify the output directory and set the visualization threshold
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --output_dir=infer_output/ \
                      --draw_threshold=0.5
```
The visualization files are saved in `output` by default; to specify a different
path, simply add the `--output_dir=` flag.

`--draw_threshold` is an optional argument; the default is 0.5. Different thresholds will produce different results depending on the computation of [NMS](https://ieeexplore.ieee.org/document/1699659).

- Save inference model

```bash
# run on GPU with:
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --save_inference_model
```

Setting `--save_inference_model` saves an inference model that can be loaded by the PaddlePaddle prediction library.


## FAQ

**Q:**  Why do I get `NaN` loss values during single-GPU training? <br>
**A:**  The default learning rate is tuned for multi-GPU training (8 GPUs), so it must
be adapted for single-GPU training accordingly (e.g., divided by 8).
The scaling rules are as follows; the settings in each row are equivalent: <br>

| GPU number  | Learning rate  | Max_iters | Milestones       |
| :---------: | :------------: | :-------: | :--------------: |
| 2           | 0.0025         | 720000    | [480000, 640000] |
| 4           | 0.005          | 360000    | [240000, 320000] |
| 8           | 0.01           | 180000    | [120000, 160000] |
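
For other GPU counts the same linear scaling applies: the learning rate scales with the number of GPUs, while `max_iters` and the milestones scale inversely. A minimal sketch of the arithmetic (set the resulting values in the config file, e.g., `configs/faster_rcnn_r50_1x.yml`):

```bash
# Illustrative arithmetic only: derive a schedule from the 8-GPU defaults above.
GPUS=1
awk -v g="$GPUS" 'BEGIN {
    printf "learning_rate=%.5f  max_iters=%d  milestones=[%d, %d]\n",
           0.01 * g / 8, 180000 * 8 / g, 120000 * 8 / g, 160000 * 8 / g
}'
# GPUS=1 -> learning_rate=0.00125  max_iters=1440000  milestones=[960000, 1280000]
```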

**Q:**  How can I reduce GPU memory usage? <br>
**A:**  Setting the environment variable `FLAGS_conv_workspace_size_limit` to a smaller
number can reduce the GPU memory footprint without affecting training speed.
Taking Mask-RCNN (R50) as an example, setting `export FLAGS_conv_workspace_size_limit=512`
allows the batch size to reach 4 per GPU (Tesla V100 16GB).
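
A minimal sketch of how this might look when launching training (the Mask-RCNN config path is assumed; substitute the config of the model you are training):

```bash
# Limit the cuDNN convolution workspace before training (value from the answer above).
export FLAGS_conv_workspace_size_limit=512
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
# Config path is illustrative.
python tools/train.py -c configs/mask_rcnn_r50_1x.yml
```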