diff --git a/fluid/PaddleCV/image_classification/dist_train/README.md b/fluid/PaddleCV/image_classification/dist_train/README.md
index 282a026acf1ee6d5b1c17aa05a2a8f734047c006..ecb8c84035d9df3b5b86836bfe3748d9a9c89ae3 100644
--- a/fluid/PaddleCV/image_classification/dist_train/README.md
+++ b/fluid/PaddleCV/image_classification/dist_train/README.md
@@ -9,7 +9,7 @@ Before getting started, please make sure you have go throught the imagenet [Data
 
 1. The entrypoint file is `dist_train.py`, some important flags are as follows:
 
-    - `model`, the model to run with, such as `ResNet50`, `ResNet101` and etc..
+    - `model`, the model to run with; the default is `DistResnet`, a model tuned for distributed training.
     - `batch_size`, the batch_size per device.
     - `update_method`, specify the update method, can choose from local, pserver or nccl2.
     - `device`, use CPU or GPU device.
@@ -35,14 +35,14 @@ In this example, we launched 4 parameter server instances and 4 trainer instance
 
 1. launch parameter server process
 
-    ``` python
+    ``` bash
    PADDLE_TRAINING_ROLE=PSERVER \
    PADDLE_TRAINERS=4 \
    PADDLE_PSERVER_IPS=192.168.0.100,192.168.0.101,192.168.0.102,192.168.0.103 \
    PADDLE_CURRENT_IP=192.168.0.100 \
    PADDLE_PSERVER_PORT=7164 \
    python dist_train.py \
-        --model=ResNet50 \
+        --model=DistResnet \
         --batch_size=32 \
         --update_method=pserver \
         --device=CPU \
@@ -51,34 +51,33 @@ In this example, we launched 4 parameter server instances and 4 trainer instance
 
 1. launch trainer process
 
-    ``` python
+    ``` bash
    PADDLE_TRAINING_ROLE=TRAINER \
    PADDLE_TRAINERS=4 \
    PADDLE_PSERVER_IPS=192.168.0.100,192.168.0.101,192.168.0.102,192.168.0.103 \
    PADDLE_TRAINER_ID=0 \
    PADDLE_PSERVER_PORT=7164 \
    python dist_train.py \
-        --model=ResNet50 \
+        --model=DistResnet \
         --batch_size=32 \
         --update_method=pserver \
         --device=GPU \
         --data_dir=../data/ILSVRC2012
-    ```
 
 ### NCCL2 Collective Mode
 
 1. launch trainer process
 
-    ``` python
+    ``` bash
    PADDLE_TRAINING_ROLE=TRAINER \
    PADDLE_TRAINERS=4 \
    PADDLE_TRAINER_IPS=192.168.0.100,192.168.0.101,192.168.0.102,192.168.0.103 \
    PADDLE_TRAINER_ID=0 \
    python dist_train.py \
-        --model=ResNet50 \
+        --model=DistResnet \
         --batch_size=32 \
-        --update_method=pserver \
+        --update_method=nccl2 \
         --device=GPU \
         --data_dir=../data/ILSVRC2012
    ```
@@ -101,13 +100,35 @@ Pass 0, batch 8, loss 7.264951, accucacys: [0.0, 0.00390625]
 Pass 0, batch 9, loss 7.43522, accucacys: [0.00390625, 0.00390625]
 ```
 
-The training accucacys top1 of local training, distributed training with NCCL2 and parameter server architecture on the ResNet50 model are shown in the below figure:
+The figure below shows the top-1 training accuracy for local training with 8 GPUs, distributed
+training with 32 GPUs, and distributed training with the batch merge feature turned on. Note that
+the red curve was trained with the original model configuration, which does not include warmup and some other detailed modifications.
+
+For distributed training with 32 GPUs using `--model DistResnet`, we can achieve a test accuracy of 75.5% after
+90 passes of training (the test accuracy is not shown in the figure below).
-Training acc1 curves
+Training top-1 accuracy curves
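The flag list in the first hunk also mentions a `local` update method, for which the README adds no example. A minimal single-machine sketch following the same flag conventions might look like the block below; the flag values are illustrative, mirroring the distributed examples rather than taken from the PR.

``` bash
# Hypothetical single-machine run using the `local` update method listed
# in the flags above; values mirror the distributed examples in the diff.
python dist_train.py \
    --model=DistResnet \
    --batch_size=32 \
    --update_method=local \
    --device=GPU \
    --data_dir=../data/ILSVRC2012
```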
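The NCCL2 example in the diff launches only trainer 0; each of the four hosts needs the same command with its own `PADDLE_TRAINER_ID`. Below is a hedged sketch of a launcher loop, assuming passwordless ssh and an identical checkout path on every node (both assumptions, not part of the PR).

``` bash
#!/bin/bash
# Illustrative launcher for the NCCL2 collective mode: starts one trainer
# per host, numbering PADDLE_TRAINER_ID from 0. The host list and the
# remote checkout path (~/models/...) are assumptions for this sketch.
HOSTS=(192.168.0.100 192.168.0.101 192.168.0.102 192.168.0.103)
TRAINER_IPS=$(IFS=,; echo "${HOSTS[*]}")

for id in "${!HOSTS[@]}"; do
    # Variables expand locally before ssh runs, so every node receives
    # the full peer list and its own trainer id.
    ssh "${HOSTS[$id]}" \
        "cd ~/models/fluid/PaddleCV/image_classification/dist_train && \
         PADDLE_TRAINING_ROLE=TRAINER \
         PADDLE_TRAINERS=${#HOSTS[@]} \
         PADDLE_TRAINER_IPS=${TRAINER_IPS} \
         PADDLE_TRAINER_ID=${id} \
         python dist_train.py \
             --model=DistResnet \
             --batch_size=32 \
             --update_method=nccl2 \
             --device=GPU \
             --data_dir=../data/ILSVRC2012" &
done
wait  # keep the launcher alive until all remote trainers exit
```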