English | [简体中文](GETTING_STARTED_cn.md)

# Getting Started

For setting up the running environment, please refer to the [installation instructions](INSTALL.md).


## Training/Evaluation/Inference

PaddleDetection provides scripts for training, evaluation and inference, with various features that can be enabled through different configurations.

```bash
# set PYTHONPATH
export PYTHONPATH=$PYTHONPATH:.
# training on single or multiple GPUs; select the GPUs to use with CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
# GPU evaluation
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
# Inference
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```

### Optional argument list

The options listed below can also be viewed with `--help`.

|         FLAG             |  script supported  |    description    |     default     |      remark      |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
|          -c              |      ALL       |  Select the config file  |  None  |  **For a full description of the configuration, refer to [config_example](config_example)**  |
|          -o              |      ALL       |  Override parameters in the config file  |  None  |  `-o` takes priority over the config file specified by `-c`, e.g. `-o use_gpu=False max_iter=10000`  |
|   -r/--resume_checkpoint |     train      |  Checkpoint path for resuming training  |  None  |  `-r output/faster_rcnn_r50_1x/10000`  |
|        --eval            |     train      |  Whether to perform evaluation during training  |  False  |    |
|      --output_eval       |     train/eval |  Directory for the evaluation JSON files  |  current path  |  `--output_eval ./json_result`  |
|   -d/--dataset_dir       |   train/eval   |  Dataset path, same as `dataset_dir` in configs  |  None  |  `-d dataset/coco`  |
|       --fp16             |     train      |  Whether to enable mixed precision training  |  False  |  GPU training is required  |
|       --loss_scale       |     train      |  Loss scaling factor for mixed precision training  |  8.0  |  Takes effect only when `--fp16` is enabled  |
|       --json_eval        |       eval     |  Whether to evaluate an existing bbox.json or mask.json  |  False  |  The JSON path is set by `--output_eval`  |
|       --output_dir       |      infer     |  Directory for storing the output visualization files  |  `./output`  |  `--output_dir output`  |
|    --draw_threshold      |      infer     |  Score threshold for keeping a result in the visualization  |  0.5  |  `--draw_threshold 0.7`  |
|  --save\_inference_model |      infer      |  Whether to save the inference model in `output_dir`  |  False  |  The inference model is saved in `--output_dir`  |
|      --infer_dir         |       infer     |  Directory of images to run inference on  |  None  |    |
|      --infer_img         |       infer     |  Image path  |  None  |  Takes priority over `--infer_dir`  |
|        --use_tb          |   train/infer   |  Whether to record data with [tb-paddle](https://github.com/linshuliang/tb-paddle) for display in TensorBoard  |  False  |      |
|        --tb\_log_dir     |   train/infer   |  tb-paddle logging directory  |  train: `tb_log_dir/scalar`, infer: `tb_log_dir/image`  |     |

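The precedence of `-o` over `-c` can be pictured as a dictionary merge in which command-line values win. The sketch below is a minimal illustration, not PaddleDetection's actual argument parser: it assumes the config file has already been loaded into a flat Python dict, and it interprets values with `ast.literal_eval`, whereas the real tool may parse them differently (for example as YAML).

```python
import ast


def parse_value(text):
    """Interpret an override value as a Python literal when possible."""
    try:
        # "False" -> False, "10000" -> 10000, "['a','b']" -> ['a', 'b']
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Anything else (e.g. a weights path or URL) stays a plain string.
        return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with each "key=value" override applied on top."""
    merged = dict(config)  # leave the values loaded via -c untouched
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        merged[key.strip()] = parse_value(value.strip())
    return merged


# Hypothetical values from the config file (-c) and from the command line (-o).
file_config = {"use_gpu": True, "max_iter": 180000, "snapshot_iter": 10000}
print(apply_overrides(file_config, ["use_gpu=False", "max_iter=10000"]))
# {'use_gpu': False, 'max_iter': 10000, 'snapshot_iter': 10000}
```

Keys that are not overridden keep their config-file values, which is the behavior the `-o` remark in the table describes.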

## Examples

### Training

- Perform evaluation during training

  ```bash
  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
  python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml --eval
  ```

  Training and evaluation are performed alternately, with an evaluation at every `snapshot_iter`. Meanwhile, the model with the highest mAP so far is saved as the best model at each `snapshot_iter`, in the same directory as `model_final`.

  If the evaluation dataset is large, we suggest evaluating less often (e.g., by increasing `snapshot_iter`) or evaluating after training.

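The alternation described above can be sketched as a loop that evaluates at every `snapshot_iter` (and at the last iteration) and remembers the best mAP. This is a simplified illustration, not `tools/train.py`: the `evaluate` callback and the mAP values are hypothetical, and real checkpoint saving is only indicated by a comment.

```python
def train_with_eval(max_iters, snapshot_iter, evaluate):
    """Alternate training and evaluation, tracking the best mAP.

    `evaluate(it)` is a hypothetical callback returning the mAP of the
    checkpoint taken at iteration `it`.
    """
    best_iter, best_map = None, float("-inf")
    num_evals = 0
    for it in range(1, max_iters + 1):
        # ... one training iteration would run here ...
        if it % snapshot_iter == 0 or it == max_iters:
            num_evals += 1
            current = evaluate(it)
            if current > best_map:
                # The real tool would save this checkpoint as the best model here.
                best_iter, best_map = it, current
    return best_iter, best_map, num_evals


# Hypothetical mAP values at each snapshot (snapshot_iter=10, max_iters=30).
maps = {10: 0.30, 20: 0.35, 30: 0.33}
print(train_with_eval(30, 10, maps.get))  # (20, 0.35, 3)
```

In this sketch the number of evaluations is `max_iters / snapshot_iter` rounded up, so a larger `snapshot_iter` directly reduces the evaluation cost on a large dataset.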
- Fine-tune on other tasks

  When using a pre-trained model to fine-tune on another task, the pre-trained parameters to exclude can be specified in either of two ways:

  1. Set `finetune_exclude_pretrained_params` in the YAML config file.
  2. Pass `finetune_exclude_pretrained_params` through `-o` on the command line, as in the example below.

  ```bash
  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
  # each -o override must be a single key=value argument, without spaces around '='
  python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                           -o pretrain_weights=output/faster_rcnn_r50_1x/model_final/ \
                              finetune_exclude_pretrained_params=['cls_score','bbox_pred']
  ```

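Conceptually, excluding pre-trained parameters means dropping the matching entries before the weights are loaded, so that layers such as the class-specific heads start fresh for the new task. The sketch below assumes that names are matched with `re.match` (i.e. as prefixes or regular expressions anchored at the start); the parameter names are hypothetical, and PaddleDetection's actual matching rule may differ.

```python
import re


def filter_pretrained(state, exclude_patterns):
    """Split `state` into kept parameters and names matching `exclude_patterns`.

    Patterns are applied with re.match, i.e. anchored at the start of the name.
    """
    kept, skipped = {}, []
    for name, value in state.items():
        if any(re.match(pattern, name) for pattern in exclude_patterns):
            skipped.append(name)  # left to be re-initialized for the new task
        else:
            kept[name] = value
    return kept, skipped


# Hypothetical parameter names; the numbers stand in for weight tensors.
pretrained = {
    "res2a_branch2a_weights": 0.1,
    "cls_score_w": 0.2,
    "cls_score_b": 0.3,
    "bbox_pred_w": 0.4,
}
kept, skipped = filter_pretrained(pretrained, ["cls_score", "bbox_pred"])
print(sorted(kept))  # ['res2a_branch2a_weights']
print(skipped)       # ['cls_score_w', 'cls_score_b', 'bbox_pred_w']
```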
#### NOTES

- `CUDA_VISIBLE_DEVICES` specifies which GPUs are used, e.g. `export CUDA_VISIBLE_DEVICES=0,1,2,3`. For how to adjust the learning rate and schedule to the number of GPUs, refer to the [FAQ](#faq).
- The dataset is downloaded automatically and cached in `~/.cache/paddle/dataset` if it is not found locally.
- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
- Checkpoints are saved in `output` by default; this can be changed via `save_dir` in the config file.
- Training RCNN models on CPU is not supported in PaddlePaddle<=1.5.1; this will be fixed in a later version.

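Since the learning-rate rules in the FAQ depend on how many GPUs are used, it can help to count the entries of `CUDA_VISIBLE_DEVICES`. A small hypothetical helper, assuming the variable holds a plain comma-separated list of valid device indices (CUDA's handling of invalid entries is not modeled):

```python
import os


def visible_gpu_count(env=None):
    """Count the GPUs listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, because CUDA then exposes every
    GPU in the machine and the count cannot be read from the variable alone.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([dev for dev in value.split(",") if dev.strip()])


print(visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1,2,3"}))  # 4
```

Called without arguments, it reads the current process environment via `os.environ`.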

### Mixed Precision Training

Mixed precision training can be enabled with the `--fp16` flag. Currently, Faster-FPN, Mask-FPN and YOLOv3 have been verified to work with little to no loss of precision (less than 0.2 mAP).

To speed up mixed precision training, it is recommended to train in multi-process mode, for example:

```bash
python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train.py --fp16 -c configs/faster_rcnn_r50_fpn_1x.yml
```

If the loss becomes `NaN` during training, try tweaking the `--loss_scale` value. Please refer to the NVIDIA [documentation](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#mptrain) on mixed precision training for details.

Also note that mixed precision training currently requires changing `norm_type` from `affine_channel` to `bn`.

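To see why `--loss_scale` matters: fp16 cannot represent very small magnitudes (they underflow to zero) or values above 65504 (they overflow). Multiplying the loss by a constant scales every gradient by the same factor, lifting small gradients into the representable range, while too large a scale overflows and shows up as `inf`/`NaN`. The sketch below simulates static loss scaling for a single gradient value using Python's half-precision `struct` format; it illustrates the idea only and is not Paddle's implementation.

```python
import struct


def to_fp16(x):
    """Round a float to IEEE half precision; raises OverflowError if |x| is too large."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_gradient(grad, loss_scale):
    """Simulate a gradient that passes through fp16 after loss scaling.

    The gradient is multiplied by loss_scale, stored in fp16, and divided by
    loss_scale again in full precision. Overflow is reported as inf.
    """
    try:
        return to_fp16(grad * loss_scale) / loss_scale
    except OverflowError:
        return float("inf")


print(fp16_gradient(1e-8, 1.0))      # 0.0: the tiny gradient underflows in fp16
print(fp16_gradient(1e-8, 8.0) > 0)  # True: scaling keeps it representable
print(fp16_gradient(1e4, 8.0))       # inf: 8e4 is beyond the fp16 maximum of 65504
```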

### Evaluation

- Evaluate with specified weights and dataset paths

  ```bash
  export CUDA_VISIBLE_DEVICES=0
  python -u tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
                          -o weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
                          -d dataset/coco
  ```

  The weights to evaluate can be either a local path or a link from the [MODEL_ZOO](MODEL_ZOO_cn.md).

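The two kinds of weights paths could be told apart roughly as follows. This is a hypothetical helper, not PaddleDetection's loader: it assumes a URL would be downloaded into a cache directory (for example the `~/.cache/paddle/weights` directory mentioned in the training notes) and extracted to a folder named after the archive, while any other value is treated as a local path. No download is performed.

```python
import os
from urllib.parse import urlparse


def resolve_weights(path, cache_dir):
    """Map a weights argument to the local directory it would be loaded from.

    Local paths are returned unchanged. For an http(s) URL, this sketch assumes
    the archive is cached in `cache_dir` and extracted into a directory named
    after the archive without its (single) extension.
    """
    parsed = urlparse(path)
    if parsed.scheme not in ("http", "https"):
        return path
    archive = os.path.basename(parsed.path)  # e.g. faster_rcnn_r50_1x.tar
    name = os.path.splitext(archive)[0]      # e.g. faster_rcnn_r50_1x
    return os.path.join(os.path.expanduser(cache_dir), name)


url = "https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar"
print(resolve_weights(url, "~/.cache/paddle/weights"))
print(resolve_weights("output/faster_rcnn_r50_1x/model_final", "~/.cache/paddle/weights"))
```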
- Evaluate with existing JSON results

  ```bash
  export CUDA_VISIBLE_DEVICES=0
  python tools/eval.py -c configs/faster_rcnn_r50_1x.yml \
             --json_eval \
             --output_eval evaluation/
  ```

  The JSON file must be named `bbox.json` or `mask.json` and placed in the `evaluation/` directory.

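For reference, assuming COCO-style evaluation, `bbox.json` holds detections in the COCO results format: a JSON list of objects with `image_id`, `category_id`, `bbox` given as `[x, y, width, height]`, and `score` (`mask.json` uses a `segmentation` field instead of `bbox` and is not covered here). The helper below is a hypothetical sketch, not part of PaddleDetection, that checks these keys before writing the file; the sample detection is made up.

```python
import json
import os

# Keys of one detection in the COCO results format.
REQUIRED_KEYS = {"image_id", "category_id", "bbox", "score"}


def write_bbox_json(detections, out_dir):
    """Validate detections and write them to <out_dir>/bbox.json."""
    for det in detections:
        missing = REQUIRED_KEYS - det.keys()
        if missing:
            raise ValueError(f"detection is missing keys: {sorted(missing)}")
        if len(det["bbox"]) != 4:
            raise ValueError("bbox must be [x, y, width, height]")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "bbox.json")
    with open(path, "w") as f:
        json.dump(detections, f)
    return path


# Hypothetical detection for one image; coordinates are in pixels.
example = [{"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 30.0, 40.0], "score": 0.9}]
```

Calling `write_bbox_json(example, "evaluation")` would produce `evaluation/bbox.json`, which the `--json_eval` command above can then read through `--output_eval evaluation/`.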
#### NOTES

- Multi-GPU evaluation for R-CNN and SSD models is not supported at the moment, but it is a planned feature.


### Inference

- Specify the output directory and set the drawing threshold

  ```bash
  export CUDA_VISIBLE_DEVICES=0
  python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --output_dir=infer_output/ \
                      --draw_threshold=0.5 \
                      -o weights=output/faster_rcnn_r50_1x/model_final \
                      --use_tb=True
  ```

  `--draw_threshold` is an optional argument with a default of 0.5.
  Different thresholds produce different visualizations: of the detections remaining after [NMS](https://ieeexplore.ieee.org/document/1699659), only those whose score reaches the threshold are drawn.

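A minimal sketch of how NMS and `--draw_threshold` interact: greedy NMS first removes boxes that overlap a higher-scoring box, and the drawing threshold then filters what remains by score. This uses a plain greedy IoU-based NMS on `[x1, y1, x2, y2]` boxes as an illustration; it is not necessarily the NMS variant or box format PaddleDetection uses, and the sample detections are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, and repeat."""
    kept = []
    for det in sorted(dets, key=lambda d: d["score"], reverse=True):
        if all(iou(det["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


def boxes_to_draw(dets, draw_threshold=0.5, iou_threshold=0.5):
    """Apply NMS, then keep only boxes whose score reaches draw_threshold."""
    return [d for d in greedy_nms(dets, iou_threshold) if d["score"] >= draw_threshold]


dets = [
    {"box": [0, 0, 10, 10], "score": 0.9},
    {"box": [1, 1, 10, 10], "score": 0.8},  # IoU 0.81 with the first box
    {"box": [20, 20, 30, 30], "score": 0.6},
    {"box": [40, 40, 50, 50], "score": 0.3},
]
print([d["score"] for d in boxes_to_draw(dets)])                      # [0.9, 0.6]
print([d["score"] for d in boxes_to_draw(dets, draw_threshold=0.7)])  # [0.9]
```

In this sketch, raising the drawing threshold only removes low-confidence boxes from the picture; it does not change which boxes survive NMS.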

- Save inference model

  ```bash
  export CUDA_VISIBLE_DEVICES=0
  python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=demo/000000570688.jpg \
                      --save_inference_model
  ```

  Setting `--save_inference_model` saves the inference model, which can then be loaded by the PaddlePaddle prediction library.


## FAQ

**Q:**  Why do I get `NaN` loss values during single GPU training? </br>
**A:**  The default learning rate is tuned for multi-GPU training (8x GPUs); it must be adapted accordingly for single-GPU training (e.g., divide it by 8).
The calculation rules are as follows; the settings below are equivalent: </br>

| GPU number  | Learning rate  | Max_iters | Milestones       |
| :---------: | :------------: | :-------: | :--------------: |
| 2           | 0.0025         | 720000    | [480000, 640000] |
| 4           | 0.005          | 360000    | [240000, 320000] |
| 8           | 0.01           | 180000    | [120000, 160000] |

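The rows above follow a simple rule: the learning rate scales linearly with the number of GPUs, while `max_iters` and the milestones scale inversely. Assuming a fixed batch size per GPU, this keeps the total number of images seen during training constant. A minimal sketch of the rule, taking the 8-GPU row as the baseline (the function name and defaults are illustrative):

```python
def scale_schedule(num_gpus, base_gpus=8, base_lr=0.01,
                   base_max_iters=180000, base_milestones=(120000, 160000)):
    """Scale the 8-GPU schedule to `num_gpus` GPUs.

    The learning rate grows linearly with the number of GPUs; max_iters and
    the milestones shrink by the same factor (integer division).
    """
    lr = base_lr * num_gpus / base_gpus
    max_iters = base_max_iters * base_gpus // num_gpus
    milestones = [m * base_gpus // num_gpus for m in base_milestones]
    return lr, max_iters, milestones


# Reproduces the 2-, 4- and 8-GPU rows of the table; 1 GPU gives the "divide by 8" case.
for gpus in (1, 2, 4, 8):
    print(gpus, scale_schedule(gpus))
```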

**Q:**  How can I reduce GPU memory usage? </br>
**A:**  Setting the environment variable `FLAGS_conv_workspace_size_limit` to a smaller
number can reduce the GPU memory footprint without affecting training speed.
Taking Mask-RCNN (R50) as an example, with `export FLAGS_conv_workspace_size_limit=512`
the batch size can reach 4 per GPU (Tesla V100 16GB).


**Q:**  How do I change data preprocessing? </br>
**A:**  Set `sample_transform` in the configuration. Note that **the complete list of transforms** must be specified in the configuration,
e.g. `DecodeImage`, `NormalizeImage` and `Permute` for RCNN models. For a detailed description, please refer
to [config_example](config_example).
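
Because `sample_transform` describes the whole pipeline, leaving out a step (for example `Permute`) changes what the model receives. The sketch below illustrates this with simplified, hypothetical stand-ins for `DecodeImage`, `NormalizeImage` and `Permute` operating on nested Python lists; the registry, the list-based image format and the ImageNet mean/std defaults are assumptions, not PaddleDetection's classes.

```python
# Simplified, hypothetical stand-ins for the transforms named above.
# Images are nested Python lists; the real transforms work on arrays.

def decode_image(sample):
    # Stand-in for DecodeImage: "raw" is already an H x W x C list of pixels here.
    sample["image"] = [[list(px) for px in row] for row in sample.pop("raw")]
    return sample


def normalize_image(sample, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    # Scale to [0, 1], then subtract the per-channel mean and divide by the std.
    sample["image"] = [
        [[(v / 255.0 - m) / s for v, m, s in zip(px, mean, std)] for px in row]
        for row in sample["image"]
    ]
    return sample


def permute(sample):
    # Reorder H x W x C into C x H x W.
    img = sample["image"]
    h, w, c = len(img), len(img[0]), len(img[0][0])
    sample["image"] = [[[img[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]
    return sample


TRANSFORMS = {"DecodeImage": decode_image, "NormalizeImage": normalize_image, "Permute": permute}


def build_pipeline(names):
    """Chain exactly the transforms listed, in order; nothing is added implicitly."""
    steps = [TRANSFORMS[name] for name in names]

    def run(sample):
        for step in steps:
            sample = step(sample)
        return sample

    return run


raw = [[[255, 255, 255], [0, 0, 0]]]  # one row of two RGB pixels: H=1, W=2, C=3
full = build_pipeline(["DecodeImage", "NormalizeImage", "Permute"])({"raw": raw})
partial = build_pipeline(["DecodeImage", "NormalizeImage"])({"raw": raw})
print(len(full["image"]), len(partial["image"]))  # 3 1: CHW vs. HWC layout
```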