Commit 6058db43 authored by mindspore-ci-bot, committed by Gitee

!25 update distributed_training doc

Merge pull request !25 from lichen/master
...@@ -37,7 +37,7 @@ In this tutorial, we will learn how to train the ResNet-50 network in `DATA_PARA
When distributed training is performed in the lab environment, you need to configure the networking information file for the current multi-card environment. If HUAWEI CLOUD is used, skip this section.
-The Ascend 910 AI processor and 1980 AIServer are used as an example. The JSON configuration file of a two-card environment is as follows. In this example, the configuration file is named rank_table.json.
+The Ascend 910 AI processor and AIServer are used as an example. The JSON configuration file of a two-card environment is as follows. In this example, the configuration file is named rank_table.json.
```json
{
...@@ -67,11 +67,12 @@ The Ascend 910 AI processor and AIServer are used as an example. The JSON c
```
The following parameters need to be modified based on the actual training environment:
-1. `server_num` indicates the number of hosts, and `server_id` indicates the IP address of the local host.
-2. `device_num`, `para_plane_nic_num`, and `instance_count` indicate the number of cards.
-3. `rank_id` indicates the logical sequence number of a card, which is always numbered from 0. `device_id` indicates the physical sequence number of a card, that is, its actual sequence number on the host where it is located.
-4. `device_ip` indicates the IP address of the NIC. You can run the `cat /etc/hccn.conf` command on the current host to obtain the IP address of the NIC.
-5. `para_plane_nic_name` indicates the name of the corresponding NIC.
+1. `board_id` indicates the environment in which the program runs.
+2. `server_num` indicates the number of hosts, and `server_id` indicates the IP address of the local host.
+3. `device_num`, `para_plane_nic_num`, and `instance_count` indicate the number of cards.
+4. `rank_id` indicates the logical sequence number of a card, which is always numbered from 0. `device_id` indicates the physical sequence number of a card, that is, its actual sequence number on the host where it is located.
+5. `device_ip` indicates the IP address of the NIC. You can run the `cat /etc/hccn.conf` command on the current host to obtain the IP address of the NIC.
+6. `para_plane_nic_name` indicates the name of the corresponding NIC.
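Taken together, the fields above suggest a minimal two-card rank_table.json along the following lines. This is a sketch only: the `instance_list`/`devices` nesting, the IP addresses, and the NIC names are illustrative assumptions, not the exact file elided from the diff above.

```json
{
    "board_id": "0x0000",
    "server_num": "1",
    "device_num": "2",
    "para_plane_nic_num": "2",
    "para_plane_nic_name": ["eth0", "eth1"],
    "instance_count": "2",
    "instance_list": [
        {
            "server_id": "10.0.0.1",
            "rank_id": "0",
            "devices": [{"device_id": "0", "device_ip": "192.168.100.101"}]
        },
        {
            "server_id": "10.0.0.1",
            "rank_id": "1",
            "devices": [{"device_id": "1", "device_ip": "192.168.100.102"}]
        }
    ]
}
```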
After the networking information file is ready, add the file path to the environment variable `MINDSPORE_HCCL_CONFIG_PATH`. In addition, the `device_id` information needs to be transferred to the script. In this example, the information is transferred by configuring the environment variable `DEVICE_ID`.
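A launch script consistent with this description might look like the sketch below; the config-file path, the script name `train.py`, and the two-card loop are placeholder assumptions.

```shell
#!/bin/bash
# Point MindSpore at the networking information file (illustrative path).
export MINDSPORE_HCCL_CONFIG_PATH=/path/to/rank_table.json

# Start one training process per card; the physical card number is handed
# to the script through the DEVICE_ID environment variable.
for i in 0 1
do
    export DEVICE_ID=$i
    python train.py > train_device_$i.log 2>&1 &   # train.py is a placeholder
done
wait
```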
......
...@@ -36,7 +36,7 @@ MindSpore supports data parallelism and auto parallelism. Auto parallelism is MindSpore's fusion of
When distributed training is performed in the lab environment, you need to configure the networking information file for the current multi-card environment. If the HUAWEI CLOUD environment is used, you can skip this section.
-Taking the Ascend 910 AI processor and 1980 AIServer as an example, a sample JSON configuration file for a two-card environment is as follows. In this example, the configuration file is named rank_table.json.
+Taking the Ascend 910 AI processor and AIServer as an example, a sample JSON configuration file for a two-card environment is as follows. In this example, the configuration file is named rank_table.json.
```json
{
...@@ -66,11 +66,13 @@ MindSpore supports data parallelism and auto parallelism. Auto parallelism is MindSpore's fusion of
```
The parameters that need to be modified based on the actual training environment are:
-1. `server_num` indicates the number of machines, and `server_id` indicates the local host's IP address.
-2. `device_num`, `para_plane_nic_num`, and `instance_count` indicate the number of cards.
-3. `rank_id` indicates the logical sequence number of a card, always numbered from 0. `device_id` indicates the physical sequence number of a card, that is, its actual sequence number on the machine where it is located.
-4. `device_ip` indicates the NIC IP address, which can be obtained by running `cat /etc/hccn.conf` on the current machine.
-5. `para_plane_nic_name` indicates the name of the corresponding NIC.
+1. `board_id` indicates the current running environment.
+2. `server_num` indicates the number of machines, and `server_id` indicates the local host's IP address.
+3. `device_num`, `para_plane_nic_num`, and `instance_count` indicate the number of cards.
+4. `rank_id` indicates the logical sequence number of a card, always numbered from 0. `device_id` indicates the physical sequence number of a card, that is, its actual sequence number on the machine where it is located.
+5. `device_ip` indicates the NIC IP address, which can be obtained by running `cat /etc/hccn.conf` on the current machine.
+6. `para_plane_nic_name` indicates the name of the corresponding NIC.
After the networking information file is ready, add the file path to the environment variable `MINDSPORE_HCCL_CONFIG_PATH`. In addition, the `device_id` information needs to be passed into the script; in this example, it is passed by setting the environment variable `DEVICE_ID`.
...@@ -221,7 +223,7 @@ opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, mome
`context.set_auto_parallel_context()` is the interface for users to set parallel parameters. The main parameters include:
- `parallel_mode`: the distributed parallel mode. Options are data parallel (`ParallelMode.DATA_PARALLEL`) and auto parallel (`ParallelMode.AUTO_PARALLEL`).
- `mirror_mean`: during backward computation, the framework collects the gradients of the data-parallel parameters scattered across multiple machines and obtains the global gradient value before passing it to the optimizer for the update. Setting it to True corresponds to the `allreduce_mean` operation, and False corresponds to the `allreduce_sum` operation.
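The difference between the two settings can be illustrated with a toy Python sketch of the aggregation semantics. This mimics the behavior described above, not MindSpore's actual implementation, and the gradient values are made up.

```python
# Toy sketch of gradient aggregation across data-parallel workers.
# mirror_mean=True  -> allreduce_mean: average the per-worker gradients.
# mirror_mean=False -> allreduce_sum:  sum the per-worker gradients.

def allreduce_sum(worker_grads):
    # Element-wise sum of each worker's gradient vector.
    return [sum(vals) for vals in zip(*worker_grads)]

def allreduce_mean(worker_grads):
    # Element-wise average: the summed gradient divided by the worker count.
    n = len(worker_grads)
    return [s / n for s in allreduce_sum(worker_grads)]

# Two workers, each holding a local gradient for the same two parameters.
grads = [[1.0, 2.0],   # worker 0
         [3.0, 6.0]]   # worker 1

print(allreduce_sum(grads))   # [4.0, 8.0]
print(allreduce_mean(grads))  # [2.0, 4.0]
```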
......