Merge pull request #1862 from cuicheng01/release/2.3

[cherry-pick]update quickstart docs

ef99fcfc · cuicheng01 · GitHub · 925619fc · 0bd44ec6 · ef99fcfc
2 changed files
--- a/docs/en/quick_start/quick_start_classification_professional_en.md
+++ b/docs/en/quick_start/quick_start_classification_professional_en.md
@@ -75,6 +75,24 @@ python3 -m paddle.distributed.launch \

 The highest accuracy of the validation set is around 0.415.

+The command above trains on multiple GPUs. To train on a single GPU, select it with both the `CUDA_VISIBLE_DEVICES` environment variable and the `--gpus` option; the same applies to the commands below. For example, to train on GPU 0 only:
+
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python3 -m paddle.distributed.launch \
+    --gpus="0" \
+    tools/train.py \
+        -c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
+        -o Global.output_dir="output_CIFAR" \
+        -o Optimizer.lr.learning_rate=0.01
+```
+
+* **Notice**:
+
+* The GPUs specified in `--gpus` can be a subset of the GPUs specified in `CUDA_VISIBLE_DEVICES`.
+* The initial learning rate should scale linearly with the total batch size. Switching from 4 GPUs to 1 GPU reduces the total batch size to 1/4 of the original, so the learning rate must also be reduced to 1/4. The command above therefore overrides the default learning rate of 0.04 with 0.01.
+
+
 <a name="2.1.2"></a> 
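
The linear scaling note in the hunk above can be checked with a little shell arithmetic. This is only a sketch: `scale_lr` is a hypothetical helper, not part of PaddleClas or Paddle, and it assumes the per-GPU batch size stays fixed so that the total batch size is proportional to the number of GPUs.

```shell
# Hypothetical helper (not part of PaddleClas): scale a learning rate
# linearly with the GPU count, assuming a fixed per-GPU batch size.
# Usage: scale_lr BASE_LR BASE_GPUS NEW_GPUS
scale_lr() {
    awk -v lr="$1" -v from="$2" -v to="$3" \
        'BEGIN { printf "%g\n", lr * to / from }'
}

# Default setup in the doc: 4 GPUs at 0.04. Switching to 1 GPU:
scale_lr 0.04 4 1   # prints 0.01
```

The printed value is what the single-GPU command passes via `-o Optimizer.lr.learning_rate=0.01`.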



--- a/docs/zh_CN/quick_start/quick_start_classification_professional.md
+++ b/docs/zh_CN/quick_start/quick_start_classification_professional.md
@@ -75,6 +75,23 @@ python3 -m paddle.distributed.launch \

 The highest accuracy on the validation set is around 0.415.

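+The command above trains on multiple GPUs. To train on a single GPU, select it with both the `CUDA_VISIBLE_DEVICES` environment variable and the `--gpus` option; the same applies to the commands below. For example, to train on GPU 0 only: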
+
+```shell
+export CUDA_VISIBLE_DEVICES=0
+python3 -m paddle.distributed.launch \
+    --gpus="0" \
+    tools/train.py \
+        -c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
+        -o Global.output_dir="output_CIFAR" \
+        -o Optimizer.lr.learning_rate=0.01
+```
+
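+* **Notice**:
+
+* The GPUs specified in `--gpus` can be a subset of the GPUs specified in `CUDA_VISIBLE_DEVICES`.
+* The initial learning rate should scale linearly with the total batch size. Switching from 4 GPUs to 1 GPU reduces the total batch size to 1/4 of the original, so the learning rate must also be reduced to 1/4. The command above therefore overrides the default learning rate of 0.04 with 0.01.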
+
 <a name="2.1.2"></a>