Commit fc17c5f8 authored by M mindspore-ci-bot, committed by Gitee

!610 modify checkpoint note

Merge pull request !610 from changzherui/mod_ckpt_note
@@ -76,6 +76,8 @@ MindSpore adds underscores (_) and digits at the end of the user-defined prefix
For example, `resnet50_3-2_32.ckpt` indicates the CheckPoint file generated during the 32nd step of the second epoch after the script is executed for the third time.
> - When a single saved model parameter is large (more than 64M), saving will fail because of Protobuf's own limit on data size. In this case, the restriction can be lifted by setting the environment variable `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`.
> - When performing distributed parallel training tasks, each process needs to set a different `directory` parameter so that CheckPoint files are saved to different directories, preventing file read/write conflicts (both notes are illustrated in the sketch below).
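For reference, a minimal sketch of how both notes might be applied together, assuming the `ModelCheckpoint`/`CheckpointConfig` callbacks from `mindspore.train.callback` and a launcher that exports a `RANK_ID` environment variable (the directory layout and checkpoint settings below are illustrative, not prescribed by this document):

```python
import os

# Note 1: lift Protobuf's data-size limit so a single parameter larger than
# 64M can still be saved; set this before MindSpore (and thus protobuf) is
# imported.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

from mindspore.train.callback import ModelCheckpoint, CheckpointConfig

# Note 2: in a distributed parallel job, give every process its own directory.
# RANK_ID here is an assumption about the launcher's environment variables.
rank_id = int(os.getenv("RANK_ID", "0"))
ckpt_dir = "./checkpoints/rank_{}".format(rank_id)

config = CheckpointConfig(save_checkpoint_steps=32, keep_checkpoint_max=10)
ckpt_cb = ModelCheckpoint(prefix="resnet50", directory=ckpt_dir, config=config)
# Pass ckpt_cb in the callbacks list of model.train(...). Files are then named
# like resnet50-2_32.ckpt, with a run counter inserted (resnet50_3-2_32.ckpt)
# when the script is re-executed, as described above.
```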
### CheckPoint Configuration Policies
......
@@ -76,7 +76,8 @@ To help users distinguish the files generated by each run, MindSpore appends to the user-defined pre
For example, `resnet50_3-2_32.ckpt` indicates the CheckPoint file generated during the 32nd step of the second epoch after the script is executed for the third time.
> When a single saved model parameter is large (more than 64M), saving will fail because of Protobuf's own limit on data size. In this case, the restriction can be lifted by setting the environment variable `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`.
> - When a single saved model parameter is large (more than 64M), saving will fail because of Protobuf's own limit on data size. In this case, the restriction can be lifted by setting the environment variable `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`.
> - When performing distributed parallel training tasks, each process needs to set a different `directory` parameter so that CheckPoint files are saved to different directories, preventing file read/write conflicts.
### CheckPoint Configuration Policies
......