Commit ff2dc0b6 authored by J JunYuLiu

fix links

Parent f2ade57a
@@ -313,7 +313,7 @@ User process:
3. Execute stage 2 training: There are two devices in the stage 2 training environment. The weight shape of the MatMul operator on each device is \[4, 8]. Load the initialized model parameter data from the integrated checkpoint file and then perform training.
- > For details about the distributed environment configuration and training code, see [Distributed Training](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html).
+ > For details about the distributed environment configuration and training code, see [Distributed Training](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training_ascend.html).
>
> This document provides the example code for integrating checkpoint files and loading checkpoint files before distributed training. The code is for reference only.
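
To make step 3 above concrete, here is a minimal, hedged sketch of loading an integrated checkpoint into a network and its optimizer before stage 2 training starts. The network definition `MatMulNet`, the optimizer settings, and the file name `integrated.ckpt` are placeholders for illustration, not the tutorial's actual code; only the `load_checkpoint`/`load_param_into_net` calls reflect the documented flow.

```python
import numpy as np
import mindspore.nn as nn
import mindspore.ops.operations as P
from mindspore import Tensor, Parameter
from mindspore.train.serialization import load_checkpoint, load_param_into_net


class MatMulNet(nn.Cell):
    """Toy network whose per-device weight shape matches the [4, 8] slice used in stage 2."""
    def __init__(self):
        super(MatMulNet, self).__init__()
        self.matmul = P.MatMul()
        self.weight = Parameter(Tensor(np.zeros([4, 8]).astype(np.float32)), name="weight")

    def construct(self, x):
        return self.matmul(x, self.weight)


net = MatMulNet()
opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

# Load the integrated (merged) checkpoint produced after stage 1, then push the
# parameter data into both the network and the optimizer before training resumes.
param_dict = load_checkpoint("integrated.ckpt")   # placeholder file name
load_param_into_net(net, param_dict)
load_param_into_net(opt, param_dict)
```

In a real stage 2 job, the distributed context and parallel strategy would be configured first, as described in the linked Distributed Training tutorial.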
...
@@ -14,7 +14,7 @@
## Overview
- In deep learning, one often has to deal with the huge model problem, in which the total size of the model parameters exceeds the device memory capacity. To efficiently train a huge model, one solution is to employ homogeneous accelerators (*e.g.*, the Ascend 910 AI Accelerator or GPUs) for [distributed training](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html). When a model is hundreds of GBs or even several TBs in size, the number of required accelerators becomes too large for most users to access, making this solution inapplicable. One alternative is Host+Device hybrid training. This solution, which simultaneously leverages the large memory of hosts and the fast computation of accelerators, is a promising and efficient way to address the huge model problem.
+ In deep learning, one often has to deal with the huge model problem, in which the total size of the model parameters exceeds the device memory capacity. To efficiently train a huge model, one solution is to employ homogeneous accelerators (*e.g.*, the Ascend 910 AI Accelerator or GPUs) for distributed training. When a model is hundreds of GBs or even several TBs in size, the number of required accelerators becomes too large for most users to access, making this solution inapplicable. One alternative is Host+Device hybrid training. This solution, which simultaneously leverages the large memory of hosts and the fast computation of accelerators, is a promising and efficient way to address the huge model problem.
...
@@ -10,7 +10,7 @@
<!-- /TOC -->
- <a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_en/advanced_use/mixed_precision.m" target="_blank"><img src="../_static/logo_source.png"></a>
+ <a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_en/advanced_use/mixed_precision.md" target="_blank"><img src="../_static/logo_source.png"></a>
## Overview
...
@@ -74,7 +74,7 @@ Compared with common training, the quantization aware training requires addition
Next, the LeNet network is used as an example to describe steps 3 and 6.
- > You can obtain the complete executable sample code at <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/lenet_quant>.
+ > You can obtain the complete executable sample code at <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/lenet_quant>.
### Defining a Fusion Network
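
Purely as an illustration of what such a fusion network can look like, here is a hedged sketch that assumes MindSpore's `nn.Conv2dBnAct` and `nn.DenseBnAct` cells are used to fuse convolution/dense layers with batch normalization and activation so that quantization aware operators can be inserted later. The layer sizes are the common LeNet-5 ones chosen for illustration; for the authoritative definition, refer to the lenet_quant sample linked above.

```python
import mindspore.nn as nn


class LeNet5Fusion(nn.Cell):
    """Sketch of a fusion LeNet-5: Conv2d/Dense layers are expressed as fusion
    cells (conv + BN + activation) so they can later be converted into
    quantization aware operators. Not the official lenet_quant code."""
    def __init__(self, num_classes=10):
        super(LeNet5Fusion, self).__init__()
        # Conv + BN + ReLU expressed as single fusion cells.
        self.conv1 = nn.Conv2dBnAct(1, 6, kernel_size=5, pad_mode='valid',
                                    has_bn=True, activation='relu')
        self.conv2 = nn.Conv2dBnAct(6, 16, kernel_size=5, pad_mode='valid',
                                    has_bn=True, activation='relu')
        # Dense + ReLU expressed as fusion cells; the last layer has no activation.
        self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
        self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
        self.fc3 = nn.DenseBnAct(84, num_classes)
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()

    def construct(self, x):
        x = self.max_pool2d(self.conv1(x))   # 32x32 -> 28x28 -> 14x14
        x = self.max_pool2d(self.conv2(x))   # 14x14 -> 10x10 -> 5x5
        x = self.flatten(x)                  # (batch, 16*5*5)
        x = self.fc1(x)
        x = self.fc2(x)
        return self.fc3(x)
```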
@@ -196,4 +196,4 @@ For details, see <https://www.mindspore.cn/tutorial/en/master/use/multi_platform
[2] Krishnamoorthi R. Quantizing deep convolutional networks for efficient inference: A whitepaper[J]. arXiv preprint arXiv:1806.08342, 2018.
[3] Jacob B, Kligys S, Chen B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 2704-2713.
\ No newline at end of file
@@ -316,7 +316,7 @@ load_param_into_net(opt, param_dict)
3. Execute stage 2 training: The stage 2 training environment has two devices, and the weight shape of the MatMul operator on each device is [4, 8]. Load the initialized model parameter data from the integrated checkpoint file and then perform training.
- > The distributed environment configuration and training code are not detailed here; see the [Distributed Parallel Training](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/distributed_training.html) section.
+ > The distributed environment configuration and training code are not detailed here; see the [Distributed Parallel Training](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/distributed_training_ascend.html) section.
>
> This document provides example code for integrating checkpoint files and for loading checkpoint files before distributed training. It is for reference only; adapt the implementation to your actual situation.
...
@@ -14,7 +14,7 @@
## Overview
- In deep learning, practitioners often face the problem of training huge models, where the memory occupied by the model parameters exceeds the device memory limit. One solution for efficiently training a huge model is [distributed parallel training](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/distributed_training.html), which distributes the work across multiple homogeneous accelerators (such as Ascend 910 AI processors or GPUs). However, for models of hundreds of GBs or even several TBs, this approach requires too many accelerators and is hard to apply when practitioners cannot access a large-scale cluster. Another feasible solution is the host (Host) and accelerator (Device) hybrid training mode, which combines the large memory of the host with the fast computation of the accelerator and is an effective way to train huge models.
+ In deep learning, practitioners often face the problem of training huge models, where the memory occupied by the model parameters exceeds the device memory limit. One solution for efficiently training a huge model is distributed parallel training, which distributes the work across multiple homogeneous accelerators (such as Ascend 910 AI processors or GPUs). However, for models of hundreds of GBs or even several TBs, this approach requires too many accelerators and is hard to apply when practitioners cannot access a large-scale cluster. Another feasible solution is the host (Host) and accelerator (Device) hybrid training mode, which combines the large memory of the host with the fast computation of the accelerator and is an effective way to train huge models.
In MindSpore, users can place the parameters to be trained on the host, configure the necessary operators to execute on the host, and configure the remaining operators to execute on the accelerator, thereby implementing hybrid training conveniently. This tutorial uses the recommendation model [Wide&Deep](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/recommend/wide_and_deep) as an example to describe hybrid training in MindSpore on the host and the Ascend 910 AI processor.
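
As a rough, hedged sketch of this configuration (not the actual Wide&Deep implementation), the example below assumes that `nn.EmbeddingLookup` with `target='CPU'` keeps a large embedding table and its lookup on the host, while a small dense head runs on the Ascend accelerator; the network name, vocabulary size, and embedding size are made up for illustration.

```python
import mindspore.nn as nn
from mindspore import context

# Graph mode targeting Ascend for the compute-heavy part of the network
# (assumes an Ascend environment is available).
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")


class HostDeviceNet(nn.Cell):
    """Toy hybrid network: the large embedding table stays in host memory and its
    lookup executes on the host (target='CPU'); the small dense head runs on Ascend."""
    def __init__(self, vocab_size=100000, embedding_size=16):
        super(HostDeviceNet, self).__init__()
        # target='CPU' keeps the embedding parameter and the lookup on the host side.
        self.embedding = nn.EmbeddingLookup(vocab_size, embedding_size, target='CPU')
        self.dense = nn.Dense(embedding_size, 1)   # executed on the accelerator

    def construct(self, ids):
        emb = self.embedding(ids)   # shape: (batch, embedding_size)
        return self.dense(emb)
```

The intent of this split is that only small activations cross the host-device boundary, while the memory-hungry embedding table never leaves host memory.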
...