From 3beec229b2e308b7085923bf974a92db73d2cdab Mon Sep 17 00:00:00 2001
From: Xiaoda Zhang
Date: Mon, 27 Jul 2020 19:42:06 +0800
Subject: [PATCH] fix some errors in host+device training

---
 .../advanced_use/host_device_training.md | 32 ++++++-------------
 .../advanced_use/host_device_training.md | 32 ++++++-------------
 2 files changed, 20 insertions(+), 44 deletions(-)

diff --git a/tutorials/source_en/advanced_use/host_device_training.md b/tutorials/source_en/advanced_use/host_device_training.md
index 4865b2b5..36ab97be 100644
--- a/tutorials/source_en/advanced_use/host_device_training.md
+++ b/tutorials/source_en/advanced_use/host_device_training.md
@@ -25,7 +25,7 @@ This tutorial introduces how to train [Wide&Deep](https://gitee.com/mindspore/mi
 
 1. Prepare the model. The Wide&Deep code can be found at: , in which `train_and_eval_auto_parallel.py` is the main function for training, `src/` directory contains the model definition, data processing and configuration files, `script/` directory contains the launch scripts in different modes.
 
-2. Prepare the dataset. The dataset can be found at: . Use the script `/src/preprocess_data.py` to transform dataset into MindRecord format.
+2. Prepare the dataset. The dataset can be found at: . Use the script `src/preprocess_data.py` to transform the dataset into MindRecord format.
 
 3. Configure the device information. When performing training in the bare-metal environment, the network information file needs to be configured. This example only employs one accelerator, thus `rank_table_1p_0.json` containing #0 accelerator is configured as follows (you need to check the server's IP first):
@@ -47,32 +47,20 @@ This tutorial introduces how to train [Wide&Deep](https://gitee.com/mindspore/mi
 
 ## Configuring for Hybrid Training
 
-1. Configure the place of trainable parameters. In the file `train_and_eval_auto_parallel.py`, add a configuration `dataset_sink_mode=False` in `model.train` to indicate that parameters are placed on hosts instead of accelerators. In the file `train_and_eval_auto_parallel.py`, change the configuration `context.set_auto_parallel_context(parallel_mode=ParallelMode.AUTO_PARALLEL, mirror_mean=True)` to `context.set_auto_parallel_context(parallel_mode=ParallelMode.SEMI_AUTO_PARALLEL, mirror_mean=True)`, in order to adapt for the host+device mode.
-
-2. Configure the sparsity of parameters. The actual values that involve in computation are indices, instead of entire parameters.
-
-   In the file `train_and_eval_auto_parallel.py`, add a configuration `context.set_context(enable_sparse=True)`.
-
-   In the `construct` function of `class WideDeepModel(nn.Cell)` of file `src/wide_and_deep.py`, to adapt for sparse parameters, replace the return value as:
-
-   ```
-   return out, deep_id_embs
-   ```
-
-3. Configure the place of operators and optimizers. In `class WideDeepModel(nn.Cell)` of file `src/wide_and_deep.py`, add the attribute of running on host for `EmbeddingLookup`:
-
+1. Configure the hybrid-training flag. In the function `argparse_init` of the file `src/config.py`, change the default value of `host_device_mix` to `1`; likewise, set `self.host_device_mix` to `1` in the `__init__` function of `class WideDeepConfig`:
 ```python
-    self.embeddinglookup = nn.EmbeddingLookup(target='CPU')
+    self.host_device_mix = 1
 ```
-
-   In `class TrainStepWrap(nn.Cell)` of file `src/wide_and_deep.py`, add the attribute of running on host for two optimizers:
-
+
+2. Check the placement of the necessary operators and optimizers. In `class WideDeepModel` of the file `src/wide_and_deep.py`, check that `EmbeddingLookup` is placed on the host:
 ```python
-    self.optimizer_w.sparse_opt.add_prim_attr('primitive_target', 'CPU')
+    self.deep_embeddinglookup = nn.EmbeddingLookup()
+    self.wide_embeddinglookup = nn.EmbeddingLookup()
 ```
-
+   In `class TrainStepWrap(nn.Cell)` of the file `src/wide_and_deep.py`, check that the two optimizers also run on the host:
 ```python
-    self.optimizer_d.sparse_opt.add_prim_attr('primitive_target', 'CPU')
+    self.optimizer_w.sparse_opt.add_prim_attr("primitive_target", "CPU")
+    self.optimizer_d.sparse_opt.add_prim_attr("primitive_target", "CPU")
 ```
 
 ## Training the Model
diff --git a/tutorials/source_zh_cn/advanced_use/host_device_training.md b/tutorials/source_zh_cn/advanced_use/host_device_training.md
index ea132dee..ec902d9a 100644
--- a/tutorials/source_zh_cn/advanced_use/host_device_training.md
+++ b/tutorials/source_zh_cn/advanced_use/host_device_training.md
@@ -22,7 +22,7 @@
 
 1. Prepare the model code. The Wide&Deep code can be found at: , where `train_and_eval_auto_parallel.py` contains the main training function, the `src/` directory contains the Wide&Deep model definition, data processing and configuration information, and the `script/` directory contains the training scripts for different configurations.
 
-2. Prepare the dataset. Dataset download link: . Use the script `/src/preprocess_data.py` to convert the dataset into MindRecord format.
+2. Prepare the dataset. Dataset download link: . Use the script `src/preprocess_data.py` to convert the dataset into MindRecord format.
 
 3. Configure the processor information. When performing distributed training in a bare-metal environment (that is, with Ascend 910 AI processors available locally), the accelerator information file needs to be configured. This example uses only one accelerator, so only the `rank_table_1p_0.json` file containing card #0 needs to be configured (the specific IP information differs for each machine and must be set according to the network configuration; this is an example), as shown below:
@@ -44,32 +44,20 @@
 
 ## Configuring Hybrid Execution
 
-1. Configure the storage location of the parameters to be trained. In the `model.train` call of the `train_and_eval` function in the file `train_and_eval_auto_parallel.py`, add the configuration `dataset_sink_mode=False` to indicate that parameter data is kept on the host side rather than on the accelerator side. In the file `train_and_eval_auto_parallel.py`, change the configuration `context.set_auto_parallel_context(parallel_mode=ParallelMode.AUTO_PARALLEL, mirror_mean=True)` to `context.set_auto_parallel_context(parallel_mode=ParallelMode.SEMI_AUTO_PARALLEL, mirror_mean=True)`, that is, configure semi-auto parallel, to adapt to the hybrid parallel mode.
-
-2. Configure the sparsity of the parameters to be trained. Because the scale of the parameters is large, they need to be configured as sparse, that is, what actually takes part in the computation is not the full parameters but their index values.
-
-   In the file `train_and_eval_auto_parallel.py`, add the configuration `context.set_context(enable_sparse=True)`.
-
-   In the `construct` function of `class WideDeepModel(nn.Cell)` in the file `src/wide_and_deep.py`, replace the return value with the following to adapt to the sparsity of the parameters:
-
-   ```
-   return out, deep_id_embs
-   ```
-
-3. Configure the execution location of the necessary operators and optimizers. In `class WideDeepModel(nn.Cell)` of `src/wide_and_deep.py`, set the host-side execution attribute for `EmbeddingLookup`:
-
+1. Configure the hybrid-training flag. In the file `src/config.py`, set the default value of `host_device_mix` in the `argparse_init` function to `1`, and set `self.host_device_mix` in the `__init__` function of the `WideDeepConfig` class to `1`:
 ```python
-    self.embeddinglookup = nn.EmbeddingLookup(target='CPU')
+    self.host_device_mix = 1
 ```
-
-   In `class TrainStepWrap(nn.Cell)` of the file `src/wide_and_deep.py`, add the host-side execution attribute for the two optimizers.
-
+
+2. Check the execution location of the necessary operators and optimizers. In the `WideDeepModel` class of `src/wide_and_deep.py`, check that `EmbeddingLookup` executes on the host side:
 ```python
-    self.optimizer_w.sparse_opt.add_prim_attr('primitive_target', 'CPU')
+    self.deep_embeddinglookup = nn.EmbeddingLookup()
+    self.wide_embeddinglookup = nn.EmbeddingLookup()
 ```
-
+   In `class TrainStepWrap(nn.Cell)` of the file `src/wide_and_deep.py`, check that the two optimizers also execute on the host side:
 ```python
-    self.optimizer_d.sparse_opt.add_prim_attr('primitive_target', 'CPU')
+    self.optimizer_w.sparse_opt.add_prim_attr("primitive_target", "CPU")
+    self.optimizer_d.sparse_opt.add_prim_attr("primitive_target", "CPU")
 ```
 
 ## Training the Model
--
GitLab
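
For readers applying this patch by hand, the sketch below pulls the settings it relies on into one place: the `host_device_mix` default that now lives in `src/config.py`, and the host-side placement of the sparse optimizers that the revised tutorial only asks you to verify. It is a minimal illustration under stated assumptions, not the actual Wide&Deep sources: the argparse help text and the helper function name are invented for this sketch, while the flag name, the `sparse_opt` attribute, and the `add_prim_attr("primitive_target", "CPU")` calls come from the patch itself.

```python
# Minimal sketch of the host+device hybrid-training settings referenced by this patch.
# NOTE: illustrative only -- the help text and helper name below are assumptions,
# not the real src/config.py / src/wide_and_deep.py code.
import argparse


def argparse_init():
    # host_device_mix now defaults to 1, i.e. host+device hybrid training is enabled
    # unless it is explicitly turned off on the command line.
    parser = argparse.ArgumentParser(description="WideDeep")
    parser.add_argument("--host_device_mix", type=int, default=1,
                        help="1: host+device hybrid training, 0: pure device training")
    return parser


def check_sparse_optimizers_on_host(optimizer_w, optimizer_d):
    # The wide and deep optimizers keep their sparse update primitives on the host (CPU),
    # which is what step 2 of the revised tutorial asks the reader to verify.
    optimizer_w.sparse_opt.add_prim_attr("primitive_target", "CPU")
    optimizer_d.sparse_opt.add_prim_attr("primitive_target", "CPU")


if __name__ == "__main__":
    # With the new default, parsing an empty command line already selects hybrid mode.
    args = argparse_init().parse_args([])
    print(args.host_device_mix)  # -> 1
```

Running the sketch prints `1`, matching the default-on behavior that step 1 of the revised tutorial describes for `src/config.py`.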