未验证 提交 68272833 编写于 作者: M MissPenguin 提交者: GitHub

Merge pull request #4531 from Intsigstephon/dygraph

Add multi-node-multi-card and amp train in detection.md
...@@ -96,6 +96,10 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \ ...@@ -96,6 +96,10 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
# 单机多卡训练,通过 --gpus 参数设置使用的GPU ID # 单机多卡训练,通过 --gpus 参数设置使用的GPU ID
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
# 多机多卡训练,通过 --ips 参数设置使用的机器IP地址,通过 --gpus 参数设置使用的GPU ID
python3 -m paddle.distributed.launch --ips="10.21.226.181,10.21.226.133" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
``` ```
上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。 上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。
...@@ -106,6 +110,15 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/ ...@@ -106,6 +110,15 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
``` ```
**注意:** 采用多机多卡训练时,需要替换上面命令中的ips值为您机器的地址,机器之间需要能够相互ping通。查看机器ip地址的命令为`ifconfig`
如果您想进一步加快训练速度,可以使用[自动混合精度训练](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html), 以单机单卡为例,命令如下:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
<a name="14-----"></a> <a name="14-----"></a>
## 1.4 断点训练 ## 1.4 断点训练
......
...@@ -99,6 +99,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o \ ...@@ -99,6 +99,18 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o \
# Set the GPU ID used by the '--gpus' parameter. # Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
# multi-Node, multi-GPU training
# Set the IPs of your nodes used by the '--ips' parameter. Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --ips="10.21.226.181,10.21.226.133" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
**Note:** For multi-Node multi-GPU training, you need to replace the `ips` value in the preceding command with the address of your machine, and the machines must be able to ping each other. The command for viewing the IP address of the machine is `ifconfig`.
If you want to further speed up the training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_en.html). for single card training, the command is as follows:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
``` ```
### 2.2 Load Trained Model and Continue Training ### 2.2 Load Trained Model and Continue Training
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册