Commit d421be71, author: Yang Zhang, committer: GitHub

Document mixed precision training flags (#3476)

* Document mixed precision training flags

* Clarify GPU is required for mixed precision training
Parent 100387be
......@@ -41,6 +41,8 @@ python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
- `-o`: Set configuration options from the config file, for example `-o max_iters=180000`. Options set with `-o` take priority over the file specified by `-c`
- `--use_tb`: Whether to record data with [tb-paddle](https://github.com/linshuliang/tb-paddle) for display in TensorBoard, default is `False`
- `--tb_log_dir`: tb-paddle logging directory for scalar, default is `tb_log_dir/scalar`
- `--fp16`: Whether to enable mixed precision training (requires GPU), default is `False`
- `--loss_scale`: Loss scaling factor for mixed precision training, default is `8.0`
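The `--loss_scale` flag listed above controls loss scaling. As a rough illustration of why mixed precision training usually needs it, the sketch below rounds a small gradient to half precision with and without scaling. It shows the general technique only and is not PaddleDetection's implementation; the gradient value `1e-8` and the helper `to_fp16` are hypothetical, chosen for the example.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


loss_scale = 8.0  # same value as the documented default of --loss_scale
grad = 1e-8       # hypothetical gradient, below the smallest positive fp16 value

# Without scaling, the gradient underflows to zero in half precision.
unscaled = to_fp16(grad)

# With scaling, the gradient survives the fp16 round trip and is then
# divided by the scale in higher precision before the weight update.
rescued = to_fp16(grad * loss_scale) / loss_scale

print(unscaled, rescued)
```

A larger scale protects smaller gradients from underflow, but a scale that is too large can push large gradients past the fp16 maximum, so the value may need tuning per model.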
##### Examples
......@@ -109,7 +111,7 @@ python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
#### Examples
- Evaluate by specified weights path and dataset path
```bash
# run on GPU with:
export PYTHONPATH=$PYTHONPATH:.
......@@ -183,7 +185,7 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
--use_tb=True
```
The visualization files are saved in `output` by default. To specify a different path, add a `--output_dir=` flag.
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
To run inference with a model at a custom path, set `-o weights` to that path.
......@@ -208,12 +210,12 @@ Save inference model by set `--save_inference_model`, which can be loaded by Pad
**Q:** Why do I get `NaN` loss values during single GPU training? </br>
**A:** The default learning rate is tuned for multi-GPU training (8 GPUs), so it must
be adapted for single-GPU training accordingly (e.g., divide by 8).
The scaling rules are as follows; the configurations in the table are equivalent: </br>
| GPU number | Learning rate | Max_iters | Milestones |
| :---------: | :------------: | :-------: | :--------------: |
| 2 | 0.0025 | 720000 | [480000, 640000] |
| 4 | 0.005 | 360000 | [240000, 320000] |
| 8 | 0.01 | 180000 | [120000, 160000] |
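The rows in the table follow a simple rule: when the GPU count shrinks by a factor of k relative to the 8-GPU baseline, the learning rate is divided by k while the iteration count and milestones are multiplied by k. A minimal sketch of that rule follows. The function name `scale_schedule` and its parameters are hypothetical and are not part of PaddleDetection; the single-GPU result is an extrapolation that assumes the same rule holds beyond the rows shown.

```python
def scale_schedule(num_gpus, base_gpus=8, base_lr=0.01,
                   base_max_iters=180000, base_milestones=(120000, 160000)):
    """Scale the 8-GPU baseline schedule to num_gpus (hypothetical helper)."""
    if num_gpus <= 0 or base_gpus % num_gpus != 0:
        raise ValueError("num_gpus must be a positive divisor of base_gpus")
    k = base_gpus // num_gpus  # how many times fewer GPUs than the baseline
    return {
        "learning_rate": base_lr / k,                       # smaller total batch, smaller step
        "max_iters": base_max_iters * k,                    # more iterations to see the same data
        "milestones": [m * k for m in base_milestones],     # decay points shift accordingly
    }


for n in (8, 4, 2, 1):
    print(n, scale_schedule(n))
```

The resulting values could then be passed on the command line with `-o`, in the same way as the `-o max_iters=180000` example.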
......
......@@ -42,6 +42,8 @@ python tools/train.py -c configs/faster_rcnn_r50_1x.yml -o use_gpu=false
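- `-o`: Set parameters from the config file, for example `-o max_iters=180000`. Settings passed with `-o` take priority over the config file selected by `-c`.
- `--use_tb`: Whether to record data with [tb-paddle](https://github.com/linshuliang/tb-paddle) for display in TensorBoard, default is False.
- `--tb_log_dir`: Directory where tb-paddle stores the recorded data, default is `tb_log_dir/scalar`
- `--fp16`: Whether to use mixed precision training mode (requires GPU training), default is `False`
- `--loss_scale`: Loss scaling factor for mixed precision training mode, default is `8.0`
##### Examples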
......@@ -184,7 +186,7 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml \
```
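Visualization files are saved in `output` by default; use `--output_dir=` to specify a different output path.
`--draw_threshold` is an optional argument. Depending on the [NMS](https://ieeexplore.ieee.org/document/1699659) calculation,
different thresholds produce different results. To run inference with a model at a custom path, set `-o weights` to specify the model path.
`--use_tb` is an optional argument. When it is `True`, TensorBoard can be used to visualize parameter trends and images.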
......@@ -205,12 +207,12 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/0000005
## FAQ
**Q:** Why do I get `NaN` loss when training on a single GPU? </br>
**A:** The default learning rate is tuned for multi-GPU training (8 GPUs). For single-GPU training, the learning rate must be adjusted accordingly (for example, divided by 8).
The scaling rules are listed in the table below; the configurations are equivalent: </br>
| GPU number | Learning rate | Max iterations | Milestones |
| :---------: | :------------: | :-------: | :--------------: |
| 2 | 0.0025 | 720000 | [480000, 640000] |
| 4 | 0.005 | 360000 | [240000, 320000] |
| 8 | 0.01 | 180000 | [120000, 160000] |
......