GETTING_STARTED.md 6.9 KB
Newer Older
1 2
# Getting Started

K
Kaipeng Deng 已提交
3
For setting up the running environment, please refer to [installation
4 5 6 7 8 9 10 11 12 13
instructions](INSTALL.md).


## Training

#### Single-GPU Training


```bash
export CUDA_VISIBLE_DEVICES=0
14
export PYTHONPATH=$PYTHONPATH:.
15 16 17 18 19 20 21
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### Multi-GPU Training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
22 23 24 25 26 27 28 29 30
export PYTHONPATH=$PYTHONPATH:.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```

#### CPU Training

```bash
export CPU_NUM=8
export PYTHONPATH=$PYTHONPATH:.
J
jerrywgz 已提交
31
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
32 33
```

34 35 36 37
##### Optional arguments

- `-r` or `--resume_checkpoint`: Checkpoint path for resuming training. Such as: `-r output/faster_rcnn_r50_1x/10000`
- `--eval`: Whether to perform evaluation in training, default is `False`
38
- `--output_eval`: If perform evaluation in training, this edits evaluation directory, default is current directory.
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
- `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` of configs. Such as: `-d dataset/coco`
- `-o`: Set configuration options in config file. Such as: `-o weights=output/faster_rcnn_r50_1x/model_final`


##### Examples

- Perform evaluation in training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
```

Alternating between training epoch and evaluation run is possible, simply pass
in `--eval` to do so and evaluate at each snapshot_iter. It can be modified at `snapshot_iter` of the configuration file. If evaluation dataset is large and
causes time-consuming in training, we suggest decreasing evaluation times or evaluating after training.


- configuration options and assign Dataset path
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTHONPATH=$PYTHONPATH:.
python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -o weights=output/faster_rcnn_r50_1x/model_final \
                         -d dataset/coco
```


##### NOTES

- `CUDA_VISIBLE_DEVICES` can specify different gpu numbers. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU calculation rules can refer [FAQ](#faq)
- Dataset is stored in `dataset/coco` by default (configurable).
- Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not be found locally.
72
- Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`.
73
- Model checkpoints are saved in `output` by default (configurable).
74
- To check out hyper parameters used, please refer to the config file.
75
- RCNN models training on CPU is not supported on PaddlePaddle<=1.5.1 and will be fixed on later version.
76 77 78 79 80 81 82 83



## Evaluation


```bash
export CUDA_VISIBLE_DEVICES=0
84
export PYTHONPATH=$PYTHONPATH:.
85 86 87 88 89
# or run on CPU with:
# export CPU_NUM=1
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```

90 91 92
#### Optional arguments

- `-d` or `--dataset_dir`: Dataset path, same as dataset_dir of configs. Such as: `-d dataset/coco`
93
- `--output_eval`: Evaluation directory, default is current directory.
94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124
- `-o`: Set configuration options in config file. Such as: `-o weights=output/faster_rcnn_r50_1x/model_final`
- `--json_eval`: Whether to eval with already existed bbox.json or mask.json. Default is `False`. Json file directory is assigned by `-f` argument.

#### Examples

- configuration options && assign Dataset path
```bash
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
# or run on CPU with:
# export CPU_NUM=1
python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
                        -o weights=output/faster_rcnn_r50_1x/model_final \
                        -d dataset/coco
```

- Evaluation with json
```bash
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
# or run on CPU with:
# export CPU_NUM=1
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
		     --json_eval \
		     -f evaluation/
```

The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory. Or without the `-f` parameter, default is the current directory.

#### NOTES

125 126 127 128 129 130 131 132 133 134 135 136
- Checkpoint is loaded from `output` by default (configurable)
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
moment, but it is a planned feature


## Inference


- Run inference on a single image:

```bash
export CUDA_VISIBLE_DEVICES=0
137
export PYTHONPATH=$PYTHONPATH:.
138 139 140 141 142
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```

143
- Multi-image inference:
144 145 146

```bash
export CUDA_VISIBLE_DEVICES=0
147
export PYTHONPATH=$PYTHONPATH:.
148 149 150 151 152
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```

153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
#### Optional arguments

- `--output_dir`: Directory for storing the output visualization files.
- `--draw_threshold`: Threshold to reserve the result for visualization. Default is 0.5.
- `--save_inference_model`: Save inference model in output_dir if True.

#### Examples

- Output specified directory && Set up threshold
```bash
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --output_dir=infer_output/ \
                      --draw_threshold=0.5
```
172
The visualization files are saved in `output` by default, to specify a different
173 174
path, simply add a `--output_dir=` flag.         
`--draw_threshold` is an optional argument. Default is 0.5. Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659)
175

176 177 178 179
- Save inference model

```bash
export CUDA_VISIBLE_DEVICES=0
180
export PYTHONPATH=$PYTHONPATH:.
181 182
# or run on CPU with:
# export CPU_NUM=1
183 184
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
185 186 187
                      --save_inference_model
```

K
Kaipeng Deng 已提交
188
Save inference model by set `--save_inference_model`, which can be loaded by PaddlePaddle predict library.
189

190 191 192

## FAQ

Q
qingqing01 已提交
193 194
**Q:**  Why do I get `NaN` loss values during single GPU training? </br>
**A:**  The default learning rate is tuned to multi-GPU training (8x GPUs), it must
195 196 197
be adapted for single GPU training accordingly (e.g., divide by 8).       
The calculation rules are as follows,they are equivalent: </br>     

198

199 200 201 202 203
| GPU number  | Learning rate  | Max_iters | Milestones       |       
| :---------: | :------------: | :-------: | :--------------: |     
| 2           | 0.0025         | 720000    | [480000, 640000] |
| 4           | 0.005          | 360000    | [240000, 320000] |
| 8           | 0.01           | 180000    | [120000, 160000] |
204

Q
qingqing01 已提交
205 206 207 208 209
**Q:**  How to reduce GPU memory usage? </br>
**A:**  Setting environment variable FLAGS_conv_workspace_size_limit to a smaller
number can reduce GPU memory footprint without affecting training speed.
Take Mask-RCNN (R50) as example, by setting `export FLAGS_conv_workspace_size_limit=512`,
batch size could reach 4 per GPU (Tesla V100 16GB).