Commit 9940e529 authored by mindspore-ci-bot, committed by Gitee

!496 Modify Asynchronous Data Dump doc

Merge pull request !496 from caifubi/r0.6
When the training result deviates from the expectation on Ascend, the input and output of the operator can be dumped for debugging through Asynchronous Data Dump.
> `comm_ops` operators are not supported by Asynchronous Data Dump. `comm_ops` can be found in [Operator List](https://www.mindspore.cn/docs/en/master/operator_list.html).
1. Turn on the switch to save graph IR: `context.set_context(save_graphs=True)`.
2. Execute the training script.
3. Open `hwopt_d_end_graph_{graph id}.ir` in the directory you execute the script and find the name of the operators you want to Dump.
......@@ -248,6 +250,9 @@ When the training result deviates from the expectation on Ascend, the input and
}
```
> - In non data sink mode, `iteration` should be set to 0, and the data of every iteration will be dumped.
> - In data sink mode, `iteration` should be increased by 1. For example, the data of `GetNext` is dumped in iteration 0, while the data of the real compute graph is dumped in iteration 1.
5. Set environment variables.
```bash
......@@ -256,9 +261,8 @@ When the training result deviates from the expectation on Ascend, the input and
export DATA_DUMP_CONFIG_PATH=data_dump.json
```
> - Set the environment variables before executing the training script; setting them during training will not take effect.
> - In distributed scenarios, Dump environment variables need to be configured before calling `mindspore.communication.management.init`.
6. Execute the training script again.
7. Parse the Dump file.
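As a minimal sketch of the ordering constraint in step 5 (Dump environment variables must exist before training starts, and before `mindspore.communication.management.init` in distributed runs), the variables can be set at the very top of the training script instead of in the shell. Only `DATA_DUMP_CONFIG_PATH` is visible in this excerpt; the remaining dump variables are elided by the collapsed diff and are not reproduced here.

```python
import os

# Dump environment variables must be in place before any MindSpore
# initialization, so set them first thing in the training script.
# (The other dump variables elided in the diff above would go here too.)
os.environ["DATA_DUMP_CONFIG_PATH"] = "data_dump.json"

# Only after that, import MindSpore and configure the context
# (requires a MindSpore install; shown here as a commented sketch):
# from mindspore import context
# context.set_context(save_graphs=True)  # step 1: save graph IR
```

Exporting the variables in the shell before launching the script, as step 5 shows, is equivalent; the key point is that they are set before training begins.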
......
When training on Ascend and the training result deviates from the expectation, the input and output of operators can be dumped for debugging through Asynchronous Data Dump.
> Asynchronous Data Dump does not support `comm_ops` operators. See the [Operator List](https://www.mindspore.cn/docs/zh-CN/master/operator_list.html) for operator categories.
1. Turn on the switch to save graph IR: `context.set_context(save_graphs=True)`.
2. Execute the network script.
3. Open `hwopt_d_end_graph_{graph id}.ir` in the execution directory and find the names of the operators you want to Dump.
}
```
> - In non data sink mode, `iteration` should be set to 0, and the data of every iteration will be dumped.
> - In data sink mode, `iteration` should be increased by 1. For example, iteration 0 dumps the data of the `GetNext` operator, while iteration 1 dumps the data of the real compute graph.
5. Set the environment variables for data Dump.
```bash
export DATA_DUMP_CONFIG_PATH=data_dump.json
```
> - Set the environment variables before executing the network script; setting them during execution will not take effect.
> - In distributed scenarios, Dump environment variables need to be configured before calling `mindspore.communication.management.init`.
6. Execute the script again to perform Asynchronous Data Dump.
7. Parse the Dump file.
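The Dump configuration file written in step 4 is collapsed in this diff, with only its closing brace visible. Purely as an illustration of the shape such a file might take, a sketch follows; every field name and value here is an assumption, not taken from this excerpt, so consult the full document for the actual schema.

```json
{
    "DumpSettings": {
        "net_name": "ResNet50",
        "dump_mode": 0,
        "op_debug_mode": 0,
        "iteration": 0,
        "kernels": ["Default/Conv2D-op12"]
    }
}
```

Under this assumed layout, the `iteration` field would be the one governed by the data sink rules in step 4's notes.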