Commit d421be71 authored by Yang Zhang, committed by GitHub

Document mixed precision training flags (#3476)

* Document mixed precision training flags

* Clarify GPU is required for mixed precision training
Parent 100387be
@@ -41,6 +41,8 @@ python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
- `-o`: Set configuration options in the config file, e.g. `-o max_iters=180000`. Options set with `-o` take precedence over the config file specified by `-c`
- `--use_tb`: Whether to record data with [tb-paddle](https://github.com/linshuliang/tb-paddle) for display in TensorBoard, default is `False`
- `--tb_log_dir`: tb-paddle logging directory for scalars, default is `tb_log_dir/scalar`
- `--fp16`: Whether to enable mixed precision training (requires GPU), default is `False`
- `--loss_scale`: Loss scaling factor for mixed precision training, default is `8.0`
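
A minimal usage sketch for the two new flags. It assumes a CUDA-capable GPU is available (mixed precision training requires one) and that `--fp16` is a switch that takes no value; confirm both with `python tools/train.py --help` before relying on it.

```bash
# Mixed precision training needs a GPU; device index 0 is an assumption.
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
# --fp16 assumed to be a value-less switch; --loss_scale shown at its documented default.
python tools/train.py -c configs/faster_rcnn_r50_1x.yml --fp16 --loss_scale=8.0
```

If your version parses `--fp16` as a typed boolean, like the `--use_tb=True` form used elsewhere in this document, pass `--fp16=True` instead.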
##### Examples
@@ -109,7 +111,7 @@ python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
#### Examples
- Evaluate with a specified weights path and dataset path
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
@@ -183,7 +185,7 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--use_tb=True
```
The visualization files are saved in `output` by default. To specify a different path, add the `--output_dir=` flag.
`--draw_threshold` is an optional argument with a default of 0.5.
Different thresholds will produce different results depending on the [NMS](https://ieeexplore.ieee.org/document/1699659) computation.
To run inference with a model stored at a custom path, set `-o weights` to that path.
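
The options above can be combined in a single inference run. In this sketch the image path, output directory, threshold value and weights path are placeholders chosen for illustration, not files or defaults from the repository:

```bash
export PYTHONPATH=$PYTHONPATH:.
# Placeholder paths: replace with a real image and your own trained weights.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=path/to/image.jpg \
                      --output_dir=infer_output \
                      --draw_threshold=0.7 \
                      -o weights=path/to/model_weights
```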
@@ -208,12 +210,12 @@ Save inference model by set `--save_inference_model`, which can be loaded by Pad
**Q:** Why do I get `NaN` loss values during single GPU training? </br>
**A:** The default learning rate is tuned for multi-GPU training (8x GPUs); it must
be adapted accordingly for single GPU training (e.g., divided by 8).
The calculation rules are as follows; these settings are equivalent: </br>
| Number of GPUs | Learning rate | Max_iters | Milestones |
| :---------: | :------------: | :-------: | :--------------: |
| 2 | 0.0025 | 720000 | [480000, 640000] |
| 4 | 0.005 | 360000 | [240000, 320000] |
| 8 | 0.01 | 180000 | [120000, 160000] |
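
The table follows a linear scaling rule relative to the 8-GPU row: the learning rate scales with the GPU count, while `Max_iters` and the milestones scale inversely. The hypothetical `scale_schedule` helper below reproduces the rows under that reading of the table; it only prints values and does not edit any config file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the FAQ schedule for N GPUs from the 8-GPU row
# (lr 0.01, max_iters 180000, milestones [120000, 160000]).
scale_schedule() {
  local gpus=$1
  awk -v n="${gpus}" 'BEGIN {
    # The learning rate grows linearly with the GPU count...
    lr = 0.01 * n / 8
    # ...while iteration counts shrink by the same factor (printf %d truncates).
    printf "lr=%g max_iters=%d milestones=[%d, %d]\n",
      lr, 180000 * 8 / n, 120000 * 8 / n, 160000 * 8 / n
  }'
}

for n in 1 2 4 8; do
  echo "gpus=${n} $(scale_schedule "${n}")"
done
```

For a single GPU this prints `lr=0.00125 max_iters=1440000 milestones=[960000, 1280000]`, consistent with the divide-by-8 advice above.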
......
@@ -42,6 +42,8 @@ python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
- `-o`: Set parameters in the config file, e.g. `-o max_iters=180000`. Options set with `-o` take precedence over the config file selected by `-c`.
- `--use_tb`: Whether to record data with [tb-paddle](https://github.com/linshuliang/tb-paddle) for display in TensorBoard, default is False.
- `--tb_log_dir`: Directory where tb-paddle stores the recorded data, default is `tb_log_dir/scalar`
- `--fp16`: Whether to use mixed precision training (requires GPU training), default is `False`
- `--loss_scale`: Scaling factor applied to the loss in mixed precision training, default is `8.0`
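
To illustrate what `--loss_scale` guards against, here is a deliberately simplified model of static loss scaling, not PaddleDetection's implementation: the loss is multiplied by the scale before backpropagation, so gradients held in fp16 are scaled by the same factor, and they are divided by the scale again in fp32 before the update. The model assumes any fp16 magnitude below 2^-24 (the smallest fp16 subnormal) is flushed to zero and ignores rounding error; `scaled_grad` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Simplified static loss scaling (illustration only, not PaddleDetection code).
# Assumption: an fp16 value with magnitude below 2^-24 (smallest fp16
# subnormal) is flushed to zero; rounding error is ignored.
scaled_grad() {
  # $1 = true gradient value, $2 = loss scale
  awk -v g="$1" -v s="$2" 'BEGIN {
    fp16_min = 2 ^ (-24)
    h = g * s                      # gradient of the scaled loss, held in fp16
    a = (h < 0) ? -h : h
    if (a < fp16_min) h = 0        # underflow in the simplified model
    printf "%g\n", h / s           # unscale in fp32 before the optimizer step
  }'
}

scaled_grad 2e-8 1   # no scaling
scaled_grad 2e-8 8   # the documented default --loss_scale
```

Without scaling the small gradient is lost (`0`); with a scale of 8 it survives and is recovered as `2e-08`. A scale that is too large can instead overflow fp16, whose largest finite value is 65504.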
##### Examples
@@ -184,7 +186,7 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
``` ```
Visualization files are saved in `output` by default; use `--output_dir=` to specify a different output path.
`--draw_threshold` is an optional argument. Depending on the [NMS](https://ieeexplore.ieee.org/document/1699659) computation,
different thresholds produce different results. To run inference with a model from a custom path, set `-o weights` to specify the model path.
`--use_tb` is an optional argument. When it is `True`, TensorBoard can be used to visualize parameter trends and images.
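
A sketch of turning on TensorBoard logging for inference and then viewing the result. The image path is a placeholder, and the `--logdir` value assumes tb-paddle writes its event files somewhere under the `tb_log_dir` directory (the training default is `tb_log_dir/scalar`); point it elsewhere if your logs land in a different place.

```bash
export PYTHONPATH=$PYTHONPATH:.
# Placeholder image path; --use_tb=True enables tb-paddle logging.
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
                      --infer_img=path/to/image.jpg \
                      --use_tb=True
# Serve the logs (assumes TensorBoard is installed and logs land under tb_log_dir).
tensorboard --logdir=tb_log_dir
```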
@@ -205,12 +207,12 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/0000005
## FAQ
**Q:** Why does the loss become `NaN` when I train on a single GPU? </br>
**A:** The default learning rate is tuned for multi-GPU training (8x GPUs). For single GPU training, the learning rate must be adjusted accordingly (e.g., divided by 8).
The calculation rules are listed in the table below; these settings are equivalent: </br>
| Number of GPUs | Learning rate | Max iterations | Milestones |
| :---------: | :------------: | :-------: | :--------------: |
| 2 | 0.0025 | 720000 | [480000, 640000] |
| 4 | 0.005 | 360000 | [240000, 320000] |
| 8 | 0.01 | 180000 | [120000, 160000] |
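
One way to see why the rows are equivalent, assuming the number of images per GPU stays fixed in the config: the GPU count times the maximum iterations (and hence the total number of images seen) is the same in every row, and so is the learning rate per GPU. The hypothetical `row_invariants` helper below checks this for the table values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the two quantities that stay constant across
# the FAQ rows, given GPU count, learning rate and max iterations.
row_invariants() {
  local gpus=$1 lr=$2 iters=$3
  # Iterations summed over all GPUs (total images seen, if images/GPU is fixed).
  local gpu_iters=$(( gpus * iters ))
  # Learning rate per GPU (constant under linear scaling).
  local lr_per_gpu
  lr_per_gpu=$(awk -v lr="${lr}" -v n="${gpus}" 'BEGIN { printf "%g", lr / n }')
  echo "gpu_iters=${gpu_iters} lr_per_gpu=${lr_per_gpu}"
}

row_invariants 2 0.0025 720000
row_invariants 4 0.005 360000
row_invariants 8 0.01 180000
```

Each row prints `gpu_iters=1440000 lr_per_gpu=0.00125`.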
......