From 16bc4abe34c95d02e0dea3fca071fe61fb13ce0c Mon Sep 17 00:00:00 2001
From: caojian05
Date: Tue, 28 Apr 2020 11:41:20 +0800
Subject: [PATCH] add distribute train README for vgg16

---
 example/vgg16_cifar10/README.md | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/example/vgg16_cifar10/README.md b/example/vgg16_cifar10/README.md
index c324673dc..d41f373a8 100644
--- a/example/vgg16_cifar10/README.md
+++ b/example/vgg16_cifar10/README.md
@@ -49,6 +49,24 @@ You will get the accuracy as following:
 result: {'acc': 0.92}
 ```
 
+### Distributed Training
+```
+sh run_distribute_train.sh rank_table.json your_data_path
+```
+The above shell script will run distributed training in the background; you can view the results in the file `train_parallel[X]/log`.
+
+You will get the loss values as follows:
+```
+# grep "loss is " train_parallel*/log
+train_parallel0/log:epoch: 1 step: 97, loss is 1.9060308
+train_parallel0/log:epoch: 2 step: 97, loss is 1.6003821
+...
+train_parallel1/log:epoch: 1 step: 97, loss is 1.7095519
+train_parallel1/log:epoch: 2 step: 97, loss is 1.7133579
+...
+...
+```
+> For details about rank_table.json, refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html).
 
 ## Usage:
 
@@ -75,4 +93,14 @@ parameters/options:
 --data_path the storage path of dataset
 --device_id the device used to evaluate the model.
 --checkpoint_path the checkpoint file path used to evaluate the model.
-```
\ No newline at end of file
+```
+
+### Distributed Training
+
+```
+Usage: sh run_distribute_train.sh [MINDSPORE_HCCL_CONFIG_PATH] [DATA_PATH]
+
+parameters/options:
+    MINDSPORE_HCCL_CONFIG_PATH    HCCL configuration file path.
+    DATA_PATH                     the storage path of the dataset.
+```
--
GitLab
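
For readers new to HCCL configuration, below is a minimal sketch of what a two-device `rank_table.json` could look like. The field names (`server_list`, `device_ip`, `rank_id`, and so on) are assumptions based on one common rank-table layout, not the authoritative schema, and all addresses are placeholders; the distributed training tutorial linked in the patch is the definitive reference.

```
# Illustrative sketch only: the field names below are assumed, not the
# official schema; see the linked tutorial for the exact format.
{
    "version": "1.0",
    "server_count": "1",
    "server_list": [
        {
            "server_id": "10.0.0.1",
            "device": [
                {"device_id": "0", "device_ip": "192.168.1.10", "rank_id": "0"},
                {"device_id": "1", "device_ip": "192.168.1.11", "rank_id": "1"}
            ]
        }
    ],
    "status": "completed"
}
```

The intent of such a table is that each training process launched by `run_distribute_train.sh` binds to one device through its own `rank_id`/`device_id` entry, which is why the script writes a separate `train_parallel[X]/log` per rank.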