Commit 16fd69de authored by: C caifubi

Modify Asynchronous Data Dump doc

Parent eae2675d
...@@ -221,6 +221,8 @@ val:[[1 1]
When the training result deviates from the expectation on Ascend, the input and output of the operator can be dumped for debugging through Asynchronous Data Dump.
> `comm_ops` operators are not supported by Asynchronous Data Dump. `comm_ops` can be found in [Operator List](https://www.mindspore.cn/docs/en/master/operator_list.html).
1. Turn on the switch to save graph IR: `context.set_context(save_graphs=True)`.
2. Execute the training script.
3. Open `hwopt_d_end_graph_{graph id}.ir` in the directory where you executed the script and find the names of the operators you want to dump.
...@@ -244,6 +246,9 @@ When the training result deviates from the expectation on Ascend, the input and
}
```
> - `iteration` should be set to 0 in non-data-sink mode, and data of every iteration will be dumped.
> - `iteration` should increase by 1 in data sink mode. For example, the data of `GetNext` will be dumped in iteration 0, and the data of the compute graph will be dumped in iteration 1.
5. Set environment variables.
```bash
...@@ -252,9 +257,8 @@ When the training result deviates from the expectation on Ascend, the input and
export DATA_DUMP_CONFIG_PATH=data_dump.json
```
> - Set the environment variables before executing the training script. Setting environment variables during training will not take effect.
> - Dump environment variables need to be configured before calling `mindspore.communication.management.init`.
6. Execute the training script again.
7. Parse the Dump file.
......
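The environment-variable step above can be sketched as a small launch script. Only `DATA_DUMP_CONFIG_PATH` is visible in this excerpt; the `ENABLE_DATA_DUMP` and `DATA_DUMP_PATH` names and the output path below are assumptions for illustration, since the hunk header hides the other export lines.

```shell
# Launch-script sketch: export the dump variables first, then start
# training in the same shell so the training process inherits them.
# ENABLE_DATA_DUMP and DATA_DUMP_PATH are assumed names -- only
# DATA_DUMP_CONFIG_PATH appears in the excerpt above.
export ENABLE_DATA_DUMP=1
export DATA_DUMP_PATH="$PWD/async_dump"
export DATA_DUMP_CONFIG_PATH=data_dump.json

echo "dump enabled=$ENABLE_DATA_DUMP config=$DATA_DUMP_CONFIG_PATH"
# python train.py   # the training script would be launched here
```

Exporting in the same shell before the launch matters because a child process only sees variables that existed when it was started; exporting mid-training changes nothing for the already-running process.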
...@@ -221,6 +221,8 @@ val:[[1 1]
When training on Ascend and the training result deviates from the expectation, the input and output of operators can be saved for debugging through the Asynchronous Data Dump feature.
> Asynchronous Data Dump does not support operators in the `comm_ops` category; see the [Operator List](https://www.mindspore.cn/docs/zh-CN/master/operator_list.html) for operator categories.
1. Turn on the switch to save graph IR: `context.set_context(save_graphs=True)`.
2. Execute the network script.
3. Open `hwopt_d_end_graph_{graph id}.ir` in the execution directory and find the names of the operators you want to dump.
...@@ -244,6 +246,9 @@ val:[[1 1]
}
```
> - In non-data-sink mode, `iteration` should be set to 0, and data of every iteration will be dumped.
> - In data sink mode, `iteration` should increase by 1. For example, iteration 0 dumps the data of the `GetNext` operator, and only iteration 1 dumps the data of the actual compute graph.
5. Set the environment variables for data dump.
```bash
...@@ -252,9 +257,8 @@ val:[[1 1]
export DATA_DUMP_CONFIG_PATH=data_dump.json
```
> - Set the environment variables before executing the network script; setting them while the script is running will not take effect.
> - In distributed scenarios, the dump environment variables need to be configured before calling `mindspore.communication.management.init`.
6. Execute the script again to perform the asynchronous data dump.
7. Parse the dump files.
......
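The ordering constraint in the notes above (dump variables configured before MindSpore reads them, and before `mindspore.communication.management.init` in distributed runs) can be sketched in Python. The `data_dump.json` filename comes from the excerpt; the commented-out MindSpore calls mark where a real script would continue and are not executed here.

```python
import os

# The doc recommends exporting the dump variables before launching the
# script; the in-process equivalent is to set them at the very top of the
# process, before any MindSpore import or init() call reads them.
os.environ["DATA_DUMP_CONFIG_PATH"] = "data_dump.json"

# Only after that would a distributed training script continue with:
# from mindspore import context
# from mindspore.communication.management import init
# context.set_context(save_graphs=True)
# init()

print(os.environ["DATA_DUMP_CONFIG_PATH"])
```

Placing the assignment above the imports is the design point: if `init` (or graph compilation) snapshots the environment, any later assignment is silently ignored, which matches the "will not take effect" warning above.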