Commit af841e78, authored by mindspore-ci-bot, committed by Gitee

!250 Unify code formats in tutorials

Merge pull request !250 from lvmingfu/lmf-docs
@@ -95,8 +95,8 @@ load_param_into_net(net, param_dict)
In the preceding information:
- `load_checkpoint`: loads the checkpoint model parameter file and returns a parameter dictionary.
- `load_param_into_net`: loads model parameter data to the network.
- `CKP_1-4_32.ckpt`: name of the saved checkpoint model parameter file.
> If a new checkpoint file is directly saved in the training environment based on the current training data and the parameter values already exist on the network, skip this step and you do not need to import the checkpoint files.
@@ -132,7 +132,7 @@ The dividing strategy is to perform dividing in a 4-device scenario based on \[2
> To ensure that the parameter update speed remains unchanged, you need to integrate the parameters saved in the optimizer, for example, moments.model\_parallel\_weight.
2. Define, instantiate, and execute the `AllGather` Cell, and obtain data on all devices.
```
from mindspore.nn.cell import Cell
@@ -156,16 +156,16 @@ The dividing strategy is to perform dividing in a 4-device scenario based on \[2
param_data_moments = allgather_net(param_data_moments)
```
The value of `param_data` is the integration of data on each device in dimension 0. The data value is \[\[1, 2], \[3, 4], \[5, 6], \[7, 8]], and the shape is \[4, 2]. The raw data value of `param_data` is \[\[1, 2, 3, 4], \[5, 6, 7, 8]], and the shape is \[2, 4]. The data needs to be redivided and integrated.
3. Divide the data obtained from `AllGather`.
```
slice_list = np.split(param_data.asnumpy(), 4, axis=0) # 4: group_size, number of nodes in the cluster
slice_lis_moments = np.split(param_data_moments.asnumpy(), 4, axis=0) # 4: group_size, number of nodes in the cluster
```
The result of `param_data` is as follows:
slice_list[0] --- [1, 2] Slice data on device0
slice_list[1] --- [3, 4] Slice data on device1
@@ -198,7 +198,7 @@ The dividing strategy is to perform dividing in a 4-device scenario based on \[2
### Saving the Data and Generating a New Checkpoint File
1. Convert `param_dict` to `param_list`.
```
param_list = []
@@ -238,7 +238,7 @@ Call the `load_checkpoint` API to load model parameter data from the checkpoint
param_dict = load_checkpoint("./CKP-Integrated_1-4_32.ckpt")
```
- `load_checkpoint`: loads the checkpoint model parameter file and returns a parameter dictionary.
- `CKP-Integrated_1-4_32.ckpt`: name of the checkpoint model parameter file to be loaded.
### Step 2: Dividing a Model Parallel Parameter
@@ -422,7 +422,7 @@ User process:
- `mode=context.GRAPH_MODE`: sets the running mode to graph mode for distributed training. (The PyNative mode does not support parallel running.)
- `device_id`: physical sequence number of a device, that is, the actual sequence number of the device on the computer where it is located.
- `init`: completes the distributed training initialization.
The command output is as follows.
......
@@ -180,11 +180,11 @@ opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01, 0.
### Calling the High-level `Model` API To Train and Save the Model File
After data preprocessing, network definition, and loss function and optimizer definition are complete, model training can be performed. Model training involves two levels of iteration: multi-round iteration (`epoch`) over the dataset and single-step iteration based on the batch size of the dataset. A single-step iteration extracts one batch of data from the dataset, feeds it into the network to calculate the loss function, and then calculates and updates the gradients of the training parameters by using the optimizer.
To simplify the training process, MindSpore encapsulates the high-level `Model` API. You can pass in the network, loss function, and optimizer to complete the `Model` initialization, and then call the `train` API for training. The `train` API parameters include the number of iterations (`epoch`) and the dataset (`dataset`).
Model saving is a process of persisting training parameters. In the `Model` class, the model is saved through a callback function, as shown in the following code. You can set the parameters of the callback function by using `CheckpointConfig`: `save_checkpoint_steps` indicates that the model is saved once every fixed number of single-step iterations, and `keep_checkpoint_max` indicates the maximum number of saved models.
```python
'''
@@ -204,7 +204,7 @@ model.train(epoch_size, dataset, callbacks=[ckpoint_cb, loss_cb])
### Loading and Validating the Saved Model
The trained model file (such as `resnet.ckpt`) can be used to predict the class of a new image. Run the `load_checkpoint` command to load the model file. Then call the `eval` API of `Model` to predict the new image class.
```python
param_dict = load_checkpoint(args_opt.checkpoint_path)
......
@@ -18,7 +18,7 @@
## Overview
This section describes how to use the customized capabilities provided by MindSpore, such as `callback`, `metrics`, the `Print` operator, and log printing, to help you quickly debug the training network.
## Introduction to Callback
@@ -29,10 +29,10 @@ For example, you can monitor the loss, save model parameters, dynamically adjust
MindSpore provides the callback capabilities to allow users to insert customized operations in a specific phase of training or inference, including:
- Callback functions such as `ModelCheckpoint`, `LossMonitor`, and `SummaryStep` provided by the MindSpore framework
- Custom callback functions
Usage: Transfer the callback object in the `model.train` method. The callback object can be a list, for example:
```python
ckpt_cb = ModelCheckpoint()
@@ -41,14 +41,14 @@ summary_cb = SummaryStep()
model.train(epoch, dataset, callbacks=[ckpt_cb, loss_cb, summary_cb])
```
`ModelCheckpoint` can save model parameters for retraining or inference.
`LossMonitor` can output loss information in logs for users to view. In addition, `LossMonitor` monitors the loss value change during training. When the loss value is `Nan` or `Inf`, the training terminates.
`SummaryStep` can save the training information to a file for later use.
During the training process, the callback list executes the callback functions in the defined order. Therefore, the dependencies between callbacks need to be considered when they are defined.
### Custom Callback
You can customize callbacks based on the `Callback` base class as required.
The `Callback` base class is defined as follows:
@@ -127,8 +127,8 @@ The output is as follows:
epoch: 20 step: 32 loss: 2.298344373703003
```
This callback function is used to terminate the training within a specified period. You can use the `run_context.original_args` method to obtain the `cb_params` dictionary, which contains the main attribute information described above.
In addition, you can modify and add values in the dictionary. In the preceding example, an `init_time` object is defined in `begin` and transferred to the `cb_params` dictionary.
A decision is made at each `step_end`. When the training time is greater than the configured time threshold, a training termination signal will be sent to the `run_context` to terminate the training in advance and the current values of epoch, step, and loss will be printed.
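A minimal sketch of such a time-based termination callback (the class name `StopAtTime` and the `run_time` parameter are illustrative):

```python
import time
from mindspore.train.callback import Callback

class StopAtTime(Callback):
    """Stop training once the elapsed time exceeds run_time (in minutes)."""
    def __init__(self, run_time):
        super(StopAtTime, self).__init__()
        self.run_time = run_time * 60

    def begin(self, run_context):
        # Store the start time in the cb_params dictionary.
        cb_params = run_context.original_args()
        cb_params.init_time = time.time()

    def step_end(self, run_context):
        cb_params = run_context.original_args()
        if (time.time() - cb_params.init_time) > self.run_time:
            print("epoch:", cb_params.cur_epoch_num,
                  "step:", cb_params.cur_step_num,
                  "loss:", cb_params.net_outputs)
            run_context.request_stop()  # send the training termination signal
```

Such a callback is passed to training in the same way as the built-in ones, for example `model.train(epoch, dataset, callbacks=[StopAtTime(1)])`.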
## MindSpore Metrics
@@ -155,16 +155,16 @@ ds_eval = create_dataset()
output = model.eval(ds_eval)
```
The `model.eval` method returns a dictionary that contains the metrics and results transferred to the metrics.
You can also define your own metrics class by inheriting the `Metric` base class and rewriting the `clear`, `update`, and `eval` methods.
The `accuracy` operator is used as an example to describe the internal implementation principle.
The `accuracy` inherits the `EvaluationBase` base class and rewrites the preceding three methods.
The `clear` method initializes related calculation parameters in the class.
The `update` method accepts the predicted value and the label value and updates the internal variables of accuracy.
The `eval` method calculates related indicators and returns the calculation result.
By invoking the `eval` method of `accuracy`, you will obtain the calculation result.
You can understand how `accuracy` runs by using the following code:
@@ -184,8 +184,8 @@ The output is as follows:
Accuracy is 0.6667
```
## MindSpore Print Operator
The MindSpore-developed `Print` operator is used to print the tensors or character strings input by users. Multiple strings, multiple tensors, and a combination of tensors and strings are supported, separated by commas (,).
The `Print` operator is used in the same way as other operators: declare it in `__init__` and invoke it in `construct`. The following is an example.
```python
import numpy as np
from mindspore import Tensor
@@ -224,12 +224,10 @@ val:[[1 1]
## Log-related Environment Variables and Configurations
MindSpore uses glog to output logs. The following environment variables are commonly used:
1. `GLOG_v` specifies the log level. The default value is 2, indicating the WARNING level. The values are as follows: 0: DEBUG; 1: INFO; 2: WARNING; 3: ERROR.
2. When `GLOG_logtostderr` is set to 1, logs are output to the screen. If the value is set to 0, logs are output to a file. Default value: 1.
3. `GLOG_log_dir=YourPath` specifies the log output path. If `GLOG_logtostderr` is set to 0, the value of this variable must be specified. If `GLOG_log_dir` is specified and `GLOG_logtostderr` is set to 1, logs are output to the screen but not to a file. C++ and Python logs are output to different files. The file name of the C++ log complies with the GLOG log file naming rule, which here is `mindspore.MachineName.UserName.log.LogLevel.Timestamp`. The file name of the Python log is `mindspore.log`.
4. `MS_SUBMODULE_LOG_v="{SubModule1:LogLevel1,SubModule2:LogLevel2,...}"` specifies the log levels of C++ sub-modules of MindSpore. A specified sub-module log level overwrites the global log level. The meaning of a sub-module log level is the same as that of `GLOG_v`; the sub-modules of MindSpore, grouped by source directory, are listed in the table below. For example, when `GLOG_v=1 MS_SUBMODULE_LOG_v="{PARSER:2,ANALYZER:2}"` is set, the log levels of `PARSER` and `ANALYZER` are WARNING, and the log levels of other modules are INFO.
> glog does not support log file rotation. If you need to control the disk space usage of log files, use the log file management tool provided by the operating system, such as `logrotate` on Linux.
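One way to apply these settings, shown here as a sketch, is to export the variables before MindSpore is imported (or in the shell before launching the script); the values below are arbitrary examples:

```python
import os

# These variables must be set before MindSpore is imported for them to take effect.
os.environ['GLOG_v'] = '1'                                  # INFO level
os.environ['GLOG_logtostderr'] = '0'                        # write logs to files instead of the screen
os.environ['GLOG_log_dir'] = '/var/log/mindspore'           # required when GLOG_logtostderr is 0
os.environ['MS_SUBMODULE_LOG_v'] = '{PARSER:2,ANALYZER:2}'  # per-submodule overrides

import mindspore  # MindSpore now picks up the logging configuration above
```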
Sub-modules of MindSpore grouped by source directory:
......
@@ -377,4 +377,4 @@ print(loss)
2.3050091
```
In the preceding execution, an intermediate result of the network execution can be obtained at any required place in the `construct` function, and the network can be debugged by using the Python debugger (pdb).
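For example, a minimal sketch of pausing inside `construct` with pdb in PyNative mode (the network below is illustrative):

```python
import pdb
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, context

context.set_context(mode=context.PYNATIVE_MODE)

class DebugNet(nn.Cell):
    def __init__(self):
        super(DebugNet, self).__init__()
        self.relu = nn.ReLU()

    def construct(self, x):
        y = self.relu(x)
        pdb.set_trace()  # pause here to inspect the intermediate result y
        return y

net = DebugNet()
out = net(Tensor(np.ones([2, 3]).astype(np.float32)))
```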
@@ -119,7 +119,7 @@ if __name__ == "__main__":
In the preceding code:
- `mode=context.GRAPH_MODE`: sets the running mode to graph mode for distributed training. (The PyNative mode does not support parallel running.)
- `device_id`: physical sequence number of a device, that is, the actual sequence number of the device on the corresponding host.
- `init`: enables HCCL communication and completes the distributed training initialization.
## Loading the Dataset in Data Parallel Mode
@@ -233,7 +233,7 @@ The `Momentum` optimizer is used as the parameter update tool. The definition is
## Training the Network
`context.set_auto_parallel_context` is an API for users to set parallel training parameters and must be called before the initialization of `Model`. If no parameters are specified, MindSpore automatically sets the parameters to empirical values based on the parallel mode. For example, in data parallel mode, `parameter_broadcast` is enabled by default. The related parameters are as follows (a configuration sketch is shown after the list):
- `parallel_mode`: parallel distributed mode. The default value is `ParallelMode.STAND_ALONE`. The options are `ParallelMode.DATA_PARALLEL` and `ParallelMode.AUTO_PARALLEL`.
- `parameter_broadcast`: whether to broadcast initialized parameters. The default value is `True` in `DATA_PARALLEL` and `HYBRID_PARALLEL` mode.
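A minimal configuration sketch (the `ParallelMode` import path and the `device_num` value are assumptions and may differ between MindSpore versions):

```python
from mindspore import context
from mindspore.communication.management import init
from mindspore.train.model import ParallelMode

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
init()  # initialize communication for distributed training

# parameter_broadcast defaults to True in DATA_PARALLEL mode, so it is not set explicitly here.
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, device_num=8)
```

`Model` initialization and `model.train` are then called as in single-device training.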
@@ -341,7 +341,7 @@ For details about other environment variables, see configuration items in the in
The running time is about 5 minutes, which is mainly occupied by operator compilation. The actual training time is within 20 seconds. You can use `ps -ef | grep pytest` to monitor task processes.
Log files are saved in the `device` directory. The `env.log` file records environment variable information. The `train.log` file records the loss function information. The following is an example:
```
epoch: 1 step: 156, loss is 2.0084016
......
@@ -37,14 +37,14 @@ This document describes the computation process by using examples of automatic a
## Automatic Mixed Precision
To use the automatic mixed precision, you need to invoke the corresponding API, which takes the network to be trained and the optimizer as the input. This API converts the operators of the entire network into FP16 operators (except the `BatchNorm` and Loss operators).
The procedure is as follows:
1. Introduce the MindSpore mixed precision API.
2. Define the network. This step is the same as the common network definition. (You do not need to manually configure the precision of any specific operator.)
3. Use the `amp.build_train_network` API to encapsulate the network model and optimizer. In this step, MindSpore automatically converts the operators to the required format.
A code example is as follows:
@@ -98,7 +98,7 @@ MindSpore also supports manual mixed precision. It is assumed that only one dens
The following is the procedure for implementing manual mixed precision:
1. Define the network. This step is similar to step 2 in the automatic mixed precision.
2. Configure the mixed precision. Use `net.to_float(mstype.float16)` to set all operators of the cell and its sub-cells to FP16. Then, configure the dense layer to FP32.
3. Use `TrainOneStepCell` to encapsulate the network model and optimizer, as shown in the sketch below.
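A minimal sketch of steps 2 and 3 (the network, whose single dense layer is named `dense`, and the hyperparameters are illustrative):

```python
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.nn import TrainOneStepCell, WithLossCell

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.relu = nn.ReLU()
        self.dense = nn.Dense(16, 10)

    def construct(self, x):
        return self.dense(self.relu(x))

net = Net()
net.to_float(mstype.float16)        # step 2: cast the cell and its sub-cells to FP16
net.dense.to_float(mstype.float32)  # keep the dense layer in FP32

loss = nn.SoftmaxCrossEntropyWithLogits()
opt = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
train_net = TrainOneStepCell(WithLossCell(net, loss), opt)  # step 3
```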
......
@@ -30,8 +30,8 @@ At the beginning of AI algorithm design, related security threats are sometimes
This section describes how to use MindArmour in adversarial attack and defense by taking the Fast Gradient Sign Method (FGSM) attack algorithm and Natural Adversarial Defense (NAD) algorithm as examples.
> The current sample is for CPU, GPU and Ascend 910 AI processor. You can find the complete executable sample code at: <https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/model_safety>
> - `mnist_attack_fgsm.py`: contains attack code.
> - `mnist_defense_nad.py`: contains defense code.
## Creating a Target Model
@@ -69,7 +69,7 @@ TAG = 'demo'
### Loading the Dataset
Use the `MnistDataset` API provided by the MindSpore dataset to load the MNIST dataset.
```python
# generate training data
......
@@ -229,7 +229,7 @@ The ResNet-50 network migration and training on the Ascend 910 is used as an exa
Similar to the `Estimator` API of TensorFlow, the defined network prototype, loss function, and optimizer are transferred to the `Model` API of MindSpore and automatically combined into a network that can be used for training.
To use loss scale in training, define a `loss_scale_manager` and transfer it to the `Model` API.
```python
loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
@@ -241,7 +241,7 @@ The ResNet-50 network migration and training on the Ascend 910 is used as an exa
model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'})
```
Similar to `estimator.train` of TensorFlow, you can call the `model.train` API to perform training. Functions such as CheckPoint and intermediate result printing can be defined on the `model.train` API in Callback mode.
```python
time_cb = TimeMonitor(data_size=step_size)
......
@@ -86,8 +86,8 @@ Currently, MindSpore GPU supports the long short-term memory (LSTM) network for
3. After the model is obtained, use the validation dataset to check the accuracy of the model.
> The current sample is for the Ascend 910 AI processor. You can find the complete executable sample code at: <https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/lstm>
> - `main.py`: code file, including code for data preprocessing, network definition, and model training.
> - `config.py`: some configurations on the network, including the batch size and the number of training epochs.
## Implementation
......
@@ -41,7 +41,7 @@ The environment requirements are as follows:
- decorator
- scipy
> `numpy`, `decorator` and `scipy` can be installed through `pip`. The reference command is as follows.
```bash
pip3 install numpy==1.16 decorator scipy
@@ -71,7 +71,7 @@ The compilation procedure is as follows:
4. Obtain the compilation result.
Go to the `predict/output` directory of the source code to view the generated package. The package name is MSPredict-*Version number*-*Host platform*_*Device platform*.tar.gz, for example, MSPredict-0.1.0-linux_aarch64.tar.gz. The package contains the following directories:
- include: MindSpore Predict header file.
- lib: MindSpore Predict dynamic library.
@@ -90,12 +90,12 @@ To perform on-device model inference using MindSpore, perform the following step
param_dict = load_checkpoint(ckpoint_file_name=ckpt_file_path)
load_param_into_net(net, param_dict)
```
2. Call the `export` API to export the `.ms` model file on the device.
```python
export(net, input_data, file_name="./lenet.ms", file_format='LITE')
```
Take the LeNet network as an example. The generated on-device model file is `lenet.ms`. The complete sample code `lenet.py` is as follows:
```python
import os
import numpy as np
@@ -155,12 +155,12 @@ if __name__ == '__main__':
### Implementing On-Device Inference
Use the `.ms` model file and image data as input to create a session and implement inference on the device.
![](./images/side_infer_process.png)
Figure 1 On-device inference sequence diagram
1. Load the `.ms` model file to the memory buffer. The `ReadFile` function needs to be implemented by users, according to the [C++ tutorial](http://www.cplusplus.com/doc/tutorial/files/).
```cpp
// read model file
std::string modelPath = "./models/lenet/lenet.ms";
@@ -178,7 +178,7 @@ Figure 1 On-device inference sequence diagram
free(graphBuf);
```
3. Read the input data for inference from the memory buffer and call the `SetData` API to set the input data to the input tensor.
```cpp
// load input buffer
size_t inputSize = 0;
@@ -191,19 +191,19 @@ Figure 1 On-device inference sequence diagram
inputs[0]->SetData(inputBuf);
```
4. Call the `Run` API in the session to perform inference.
```cpp
// session run
int ret = session->Run(inputs);
```
5. Call the `GetAllOutput` API to obtain the output.
```cpp
// get output
std::map<std::string, std::vector<Tensor *>> outputs = session->GetAllOutput();
```
6. Call the `Getdata` API to get the output data.
```cpp
// get output data
float *data = nullptr;
@@ -230,10 +230,10 @@ Figure 1 On-device inference sequence diagram
outputs.clear();
```
Select the LeNet network and set the inference input to `lenet.bin`. The complete sample code `lenet.cpp` is as follows:
> MindSpore Predict uses `FlatBuffers` to define models. The `FlatBuffers` header file is required for parsing models. Therefore, you need to configure the `FlatBuffers` header file.
>
> Method: Copy the `flatbuffers` folder in `third_party/flatbuffers/include` of the MindSpore root directory to the directory at the same level as `session.h`.
```cpp
#include <string>
......
@@ -179,8 +179,8 @@ Use the `save_graphs` option of `context` to record the computational graph afte
### Collect Performance Profile Data
To enable performance profiling of neural networks, MindInsight `Profiler` APIs should be added to the script. First, the MindInsight `Profiler` object needs to be initialized after `context` is set and before the network initialization. Then, at the end of the training, `Profiler.analyse` should be called to finish profiling and generate the performance analysis results.
The sample code is as follows:
@@ -265,13 +265,13 @@ MindInsight provides user with web services. Run the following command to view t
ps -ef | grep mindinsight
```
Run the following command to access the working directory `WORKSPACE` corresponding to the service process based on the service process ID:
```bash
lsof -p <PID> | grep access
```
The output, which contains the working directory `WORKSPACE`, is as follows:
```bash
gunicorn <PID> <USER> <FD> <TYPE> <DEVICE> <SIZE/OFF> <NODE> <WORKSPACE>/log/gunicorn/access.log
......
@@ -89,7 +89,7 @@ For details about MindSpore modules, search on the [MindSpore API Page](https://
Before compiling code, you need to learn basic information about the hardware and backend required for MindSpore running.
You can use `context.set_context` to configure the information required for running, such as the running mode, backend information, and hardware information.
Import the `context` module and configure the required information.
@@ -107,7 +107,7 @@ if __name__ == "__main__":
...
```
This example runs in graph mode. You can configure hardware information based on site requirements. For example, if the code runs on the Ascend AI processor, set `--device_target` to `Ascend`. This rule also applies to the code running on the CPU and GPU. For details about parameters, see the API description for `context.set_context`.
## Processing Data
@@ -115,12 +115,12 @@ Datasets are important for training. A good dataset can effectively improve trai
### Defining the Dataset and Data Operations
Define the `create_dataset` function to create a dataset. In this function, define the data augmentation and processing operations to be performed.
1. Define the dataset.
2. Define parameters required for data augmentation and processing.
3. Generate corresponding data augmentation operations according to the parameters.
4. Use the `map` mapping function to apply data operations to the dataset.
5. Process the generated dataset.
```python
@@ -226,7 +226,7 @@ def fc_with_initialize(input_channels, out_channels):
To use MindSpore for neural network definition, inherit `mindspore.nn.cell.Cell`. `Cell` is the base class of all neural networks (such as `Conv2d`).
Define each layer of a neural network in the `__init__` method in advance, and then define the `construct` method to complete the forward construction of the neural network. According to the structure of the LeNet network, define the network layers as follows:
```python
import mindspore.ops.operations as P
@@ -399,7 +399,7 @@ checkpoint_lenet-1_1875.ckpt
```
In the preceding information:
`checkpoint_lenet-1_1875.ckpt`: saved model parameter file. Files saved later follow the same convention. The file name format is checkpoint_*network name*-*epoch No.*_*step No.*.ckpt.
## Validating the Model
@@ -427,7 +427,7 @@ if __name__ == "__main__":
```
In the preceding information:
`load_checkpoint`: This API is used to load the CheckPoint model parameter file and return a parameter dictionary.
`checkpoint_lenet-3_1404.ckpt`: name of the saved CheckPoint model file.
`load_param_into_net`: This API is used to load parameters to the network.
......
@@ -34,11 +34,11 @@ This section takes a Square operator as an example to describe how to customize
The primitive of an operator is a subclass inherited from `PrimitiveWithInfer`. The type name of the subclass is the operator name.
The definition of the custom operator primitive is the same as that of the built-in operator primitive.
- The attribute is defined by the input parameter of the constructor function `__init__`. The operator in this test case has no attribute. Therefore, `__init__` has only one input parameter. For details about test cases in which operators have attributes, see [custom add3](https://gitee.com/mindspore/mindspore/tree/master/tests/st/ops/custom_ops_tbe/cus_add3.py) in the MindSpore source code.
- The input and output names are defined by the `init_prim_io_names` function.
- The shape inference method of the output tensor is defined in the `infer_shape` function, and the dtype inference method of the output tensor is defined in the `infer_dtype` function.
The only difference between a custom operator and a built-in operator is that the operator implementation function (`from square_impl import CusSquareImpl`) needs to be imported to the `__init__` function to register the operator implementation with the backend for the custom operator. In this test case, the operator implementation and information are defined in `square_impl.py`, and the definition will be described in the following parts.
The following code takes the Square operator primitive `cus_square.py` as an example:
@@ -74,8 +74,8 @@ The entry function of an operator describes the internal process of compiling th
1. Prepare placeholders to be input. A placeholder will return a tensor object that represents a group of input data.
2. Call the computable function. The computable function uses the API provided by the TBE to describe the computation logic of the operator.
3. Call the scheduling module. The module tiles the operator data based on the scheduling description and specifies the data transfer process to ensure optimal hardware execution. By default, the automatic scheduling module (`auto_schedule`) can be used.
4. Call `cce_build_code` to compile and generate an operator binary file. (A sketch of these steps is shown after the following note.)
> The input parameters of the entry function require the input information of each operator, output information of each operator, operator attributes (optional), and `kernel_name` (name of the generated operator binary file). The input and output information is encapsulated in dictionaries, including the input and output shape and dtype when the operator is called on the network.
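As an illustration only, a sketch of what such an entry function could look like for the Square example, written with TBE DSL calls (the exact TBE interfaces and the `config` fields are assumptions and should be checked against the TBE documentation):

```python
import te.lang.cce
from te import tvm
from topi import generic

def cus_square_impl(input_x, output_y, kernel_name="CusSquareImpl"):
    shape = input_x.get("shape")
    dtype = input_x.get("dtype").lower()

    # 1. Prepare the input placeholder.
    data = tvm.placeholder(shape, name="data", dtype=dtype)

    # 2. Describe the computation logic: square(x) = x * x.
    res = te.lang.cce.vmul(data, data)

    # 3. Use the automatic scheduling module.
    with tvm.target.cce():
        sch = generic.auto_schedule(res)

    # 4. Compile and generate the operator binary file.
    config = {"name": kernel_name, "tensor_list": [data, res]}
    te.lang.cce.cce_build_code(sch, config)
```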
For details about TBE operator development, visit the [TBE website](https://www.huaweicloud.com/ascend/tbe). For details about how to debug and optimize the TBE operator, visit the [Mind Studio website](https://www.huaweicloud.com/intl/en-us/ascend/mindstudio).
@@ -85,7 +85,7 @@ The operator information is key for the backend to select the operator implement
> The numbers and sequences of the input and output information defined in the operator information must be the same as those in the parameters of the entry function of the operator implementation and those listed in the operator primitive.
> If an operator has attributes, use `attr` to describe the attribute information in the operator information. The attribute names must be the same as those in the operator primitive definition.
### Example
......
@@ -47,7 +47,7 @@ MindSpore provides write operation tools to write user-defined raw data in MindS
The field type can be int32, int64, float32, float64, string, or bytes.
The field shape can be a one-dimensional array represented by [-1], a two-dimensional array represented by [m, n], or a three-dimensional array represented by [x, y, z].
> 1. The type of a field with the shape attribute can only be int32, int64, float32, or float64.
> 2. If the field has the shape attribute, prepare the data of `numpy.ndarray` type and transfer the data to the `write_raw_data` API, as in the sketch below.
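A minimal sketch of writing one record whose field has a shape attribute (the file name, schema, and data are illustrative):

```python
import numpy as np
from mindspore.mindrecord import FileWriter

writer = FileWriter(file_name="test.mindrecord", shard_num=1)
schema = {"label": {"type": "int32"},
          "input_ids": {"type": "int64", "shape": [-1]}}
writer.add_schema(schema, "example_schema")

# Fields with a shape attribute must be provided as numpy.ndarray.
data = [{"label": 1, "input_ids": np.array([1, 2, 3, 4], dtype=np.int64)}]
writer.write_raw_data(data)
writer.commit()
```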
Examples:
- Image classification
......
@@ -41,7 +41,7 @@ The operations can be performed separately. In practice, they are often used tog
![avatar](../images/dataset_pipeline.png)
In the following example, the `shuffle`, `batch`, and `repeat` operations are performed when the MNIST dataset is read.
```python
import mindspore.dataset as ds
@@ -59,7 +59,7 @@ The following describes how to construct a simple dataset `ds1` and perform data
```python
import mindspore.dataset as ds
```
2. Define the `generator_func` function for generating the dataset.
```python
def generator_func():
    for i in range(5):
@@ -88,7 +88,7 @@ In limited datasets, to optimize the network, a dataset is usually trained for m
> In machine learning, an epoch refers to one cycle through the full training dataset.
During multiple epochs, `repeat` can be used to increase the data size. The definition of `repeat` is as follows:
```python
def repeat(self, count=None):
```
@@ -118,7 +118,7 @@ ds2:
[4 5 6]
```
### batch
Combine data records in datasets into batches. In practice, data can be processed in batches. Training data in batches can reduce training steps and accelerate the training process. MindSpore uses the `batch` function to implement the batch operation. The function is defined as follows:
![avatar](../images/batch.png)
@@ -167,11 +167,11 @@ You can shuffle ordered or repeated datasets.
![avatar](../images/shuffle.png)
The shuffle operation is used to shuffle data. A larger value of `buffer_size` indicates a higher shuffling degree, consuming more time and computing resources.
The definition of `shuffle` is as follows:
```python ```python
def shuffle(self, buffer_size): def shuffle(self, buffer_size):
``` ```
Call `shuffle()` to shuffle the dataset `ds1`. The sample code is as follows: Call `shuffle` to shuffle the dataset `ds1`. The sample code is as follows:
```python ```python
print("Before shuffle:") print("Before shuffle:")
...@@ -200,19 +200,19 @@ After shuffle: ...@@ -200,19 +200,19 @@ After shuffle:
``` ```
### map ### map
The map operation is used to process data. For example, convert the dataset of color images into the dataset of grayscale images. You can flexibly perform the operation as required. The map operation is used to process data. For example, convert the dataset of color images into the dataset of grayscale images. You can flexibly perform the operation as required.
MindSpore provides the `map()` function to map datasets. You can apply the provided functions or operators to the specified column data. MindSpore provides the `map` function to map datasets. You can apply the provided functions or operators to the specified column data.
You can customize the function or use `c_transforms` or `py_transforms` for data augmentation. You can customize the function or use `c_transforms` or `py_transforms` for data augmentation.
> For details about data augmentation operations, see Data Augmentation section. > For details about data augmentation operations, see Data Augmentation section.
![avatar](../images/map.png) ![avatar](../images/map.png)
The definition of `map()` is as follows: The definition of `map` is as follows:
```python ```python
def map(self, input_columns=None, operations=None, output_columns=None, columns_order=None, def map(self, input_columns=None, operations=None, output_columns=None, columns_order=None,
num_parallel_workers=None): num_parallel_workers=None):
``` ```
In the following example, the `map()` function is used to apply the defined anonymous function (lambda function) to the dataset `ds1` so that the data values in the dataset are multiplied by 2. In the following example, the `map` function is used to apply the defined anonymous function (lambda function) to the dataset `ds1` so that the data values in the dataset are multiplied by 2.
```python ```python
func = lambda x : x*2 # Define lambda function to multiply each element by 2. func = lambda x : x*2 # Define lambda function to multiply each element by 2.
ds2 = ds1.map(input_columns="data", operations=func) ds2 = ds1.map(input_columns="data", operations=func)
...@@ -228,7 +228,7 @@ The code output is as follows. Data values in each row of the dataset `ds2` is m ...@@ -228,7 +228,7 @@ The code output is as follows. Data values in each row of the dataset `ds2` is m
[8 10 12] [8 10 12]
``` ```
### zip ### zip
MindSpore provides the `zip()` function to combine multiple datasets into one dataset. MindSpore provides the `zip` function to combine multiple datasets into one dataset.
> If the column names in the two datasets are the same, the two datasets are not combined. Therefore, pay attention to column names. > If the column names in the two datasets are the same, the two datasets are not combined. Therefore, pay attention to column names.
> If the number of rows in the two datasets is different, the number of rows after combination is the same as the smaller number. > If the number of rows in the two datasets is different, the number of rows after combination is the same as the smaller number.
```python ```python
...@@ -267,7 +267,7 @@ MindSpore provides the `c_transforms` and `py_transforms` module functions for u ...@@ -267,7 +267,7 @@ MindSpore provides the `c_transforms` and `py_transforms` module functions for u
| `py_transforms` | Python-based [PIL](https://pypi.org/project/Pillow/) implementation | This module provides multiple image augmentation functions and the method for converting between PIL images and NumPy arrays. | | `py_transforms` | Python-based [PIL](https://pypi.org/project/Pillow/) implementation | This module provides multiple image augmentation functions and the method for converting between PIL images and NumPy arrays. |
For users who would like to use Python PIL in image learning tasks, the `py_transforms` module is a good tool for image augmentation. You can use Python PIL to customize extensions. For users who would like to use Python PIL in image learning tasks, the `py_transforms` module is a good tool for image augmentation. You can use Python PIL to customize extensions.
Data augmentation requires the `map()` function. For details about how to use the `map()` function, see [map](#map). Data augmentation requires the `map` function. For details about how to use the `map` function, see [map](#map).
### Using the `c_transforms` Module ### Using the `c_transforms` Module
...@@ -287,7 +287,7 @@ Data augmentation requires the `map()` function. For details about how to use th ...@@ -287,7 +287,7 @@ Data augmentation requires the `map()` function. For details about how to use th
imgplot_resized = plt.imshow(data["image"]) imgplot_resized = plt.imshow(data["image"])
plt.show() plt.show()
``` ```
The running result shows that the original image is changed from 1024 x 683 pixels to 500 x 500 pixels after data processing by using `Resize()`. The running result shows that the original image is changed from 1024 x 683 pixels to 500 x 500 pixels after data processing by using `Resize`.
![avatar](../images/image.png) ![avatar](../images/image.png)
Figure 1: Original image Figure 1: Original image
...@@ -321,7 +321,7 @@ Figure 2: Image after its size is reset ...@@ -321,7 +321,7 @@ Figure 2: Image after its size is reset
plt.show() plt.show()
``` ```
The running result shows that the original image is changed from 1024 x 683 pixels to 500 x 500 pixels after data processing by using `RandomCrop()`. The running result shows that the original image is changed from 1024 x 683 pixels to 500 x 500 pixels after data processing by using `RandomCrop`.
![avatar](../images/image.png) ![avatar](../images/image.png)
Figure 1: Original image Figure 1: Original image
......
...@@ -16,13 +16,13 @@ Models based on MindSpore training can be used for inference on different hardwa ...@@ -16,13 +16,13 @@ Models based on MindSpore training can be used for inference on different hardwa
1. Inference on the Ascend 910 AI processor 1. Inference on the Ascend 910 AI processor
MindSpore provides the `model.eval()` API for model validation. You only need to import the validation dataset. The processing method of the validation dataset is the same as that of the training dataset. For details about the complete code, see <https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/eval.py>. MindSpore provides the `model.eval` API for model validation. You only need to import the validation dataset. The processing method of the validation dataset is the same as that of the training dataset. For details about the complete code, see <https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/eval.py>.
```python ```python
res = model.eval(dataset) res = model.eval(dataset)
``` ```
In addition, the `model.predict()` interface can be used for inference. For detailed usage, please refer to the API description. In addition, the `model.predict` interface can be used for inference. For detailed usage, please refer to the API description.
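A minimal sketch of calling it might be as follows (the input shape is illustrative, and `model` is assumed to be the `Model` instance defined in the training script):
```python
import numpy as np
from mindspore import Tensor

# The input shape must match what the network expects; the values below are random placeholders.
input_data = Tensor(np.random.rand(1, 3, 224, 224).astype(np.float32))
output = model.predict(input_data)
```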
2. Inference on the Ascend 310 AI processor 2. Inference on the Ascend 310 AI processor
......
...@@ -38,7 +38,7 @@ During model training, use the callback mechanism to transfer the object of the ...@@ -38,7 +38,7 @@ During model training, use the callback mechanism to transfer the object of the
You can use the `CheckpointConfig` object to set the CheckPoint saving policies. You can use the `CheckpointConfig` object to set the CheckPoint saving policies.
The saved parameters are classified into network parameters and optimizer parameters. The saved parameters are classified into network parameters and optimizer parameters.
`ModelCheckpoint()` provides default configuration policies for users to quickly get started. `ModelCheckpoint` provides default configuration policies for users to quickly get started.
The following describes the usage: The following describes the usage:
```python ```python
from mindspore.train.callback import ModelCheckpoint from mindspore.train.callback import ModelCheckpoint
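# A sketch of a typical configuration: set the saving policy via CheckpointConfig and
# pass it to ModelCheckpoint (the parameter values below are illustrative only).
from mindspore.train.callback import CheckpointConfig
config_ck = CheckpointConfig(save_checkpoint_steps=32, keep_checkpoint_max=10)
ckpoint_cb = ModelCheckpoint(prefix='resnet50', directory='./ckpt', config=config_ck)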
......
...@@ -102,8 +102,8 @@ load_param_into_net(net, param_dict) ...@@ -102,8 +102,8 @@ load_param_into_net(net, param_dict)
其中, 其中,
- `load_checkpoint()`:通过该接口加载CheckPoint模型参数文件,返回一个参数字典。 - `load_checkpoint`:通过该接口加载CheckPoint模型参数文件,返回一个参数字典。
- `load_param_into_net()`:模型参数数据加载到网络中。 - `load_param_into_net`:模型参数数据加载到网络中。
- `CKP_1-4_32.ckpt`:之前保存的CheckPoint模型参数文件名称。 - `CKP_1-4_32.ckpt`:之前保存的CheckPoint模型参数文件名称。
> 如果直接在训练环境上,基于当前训练得到的数据直接保存新的CheckPoint文件,参数值已经存在在网络中,则可以省略该步骤,无需导入CheckPoint文件。 > 如果直接在训练环境上,基于当前训练得到的数据直接保存新的CheckPoint文件,参数值已经存在在网络中,则可以省略该步骤,无需导入CheckPoint文件。
...@@ -138,7 +138,7 @@ for _, param in net.parameters_and_names(): ...@@ -138,7 +138,7 @@ for _, param in net.parameters_and_names():
``` ```
> 如果要保证参数更新速度不变,需要对优化器中保存的参数,如“moments.model_parallel_weight”,同样做合并处理。 > 如果要保证参数更新速度不变,需要对优化器中保存的参数,如“moments.model_parallel_weight”,同样做合并处理。
2. 定义AllGather类型子图,并实例化和执行,获取所有卡上的数据。 2. 定义`AllGather`类型子图,并实例化和执行,获取所有卡上的数据。
``` ```
from mindspore.nn.cell import Cell from mindspore.nn.cell import Cell
...@@ -162,17 +162,17 @@ for _, param in net.parameters_and_names(): ...@@ -162,17 +162,17 @@ for _, param in net.parameters_and_names():
param_data_moments = allgather_net(param_data_moments) param_data_moments = allgather_net(param_data_moments)
``` ```
​得到的数据param_data为每卡上的数据在维度0上的合并,数据值为 [[1, 2], [3, 4], [5, 6], [7, 8]],shape为[4, 2]。 ​得到的数据`param_data`为每卡上的数据在维度0上的合并,数据值为 [[1, 2], [3, 4], [5, 6], [7, 8]],shape为[4, 2]。
param_data原始数据值为[[1, 2, 3, 4], [5, 6, 7, 8]],shape为[2, 4],需要对数据重新切分合并。 `param_data`原始数据值为[[1, 2, 3, 4], [5, 6, 7, 8]],shape为[2, 4],需要对数据重新切分合并。
3. 切分通过AllGather得到的数据。 3. 切分通过`AllGather`得到的数据。
``` ```
slice_list = np.split(param_data.asnumpy(), 4, axis=0) # 4:group_size, number of nodes in cluster slice_list = np.split(param_data.asnumpy(), 4, axis=0) # 4:group_size, number of nodes in cluster
slice_lis_moments = np.split(param_data_moments.asnumpy(), 4, axis=0) # 4: group_size, number of nodes in cluster slice_lis_moments = np.split(param_data_moments.asnumpy(), 4, axis=0) # 4: group_size, number of nodes in cluster
``` ```
得到结果param_data为: 得到结果`param_data`为:
slice_list[0] --- [1, 2] device0上的切片数据 slice_list[0] --- [1, 2] device0上的切片数据
slice_list[1] --- [3, 4] device1上的切片数据 slice_list[1] --- [3, 4] device1上的切片数据
...@@ -200,12 +200,12 @@ for _, param in net.parameters_and_names(): ...@@ -200,12 +200,12 @@ for _, param in net.parameters_and_names():
``` ```
> 1. 如果存在多个模型并行的参数,则需要重复步骤1到步骤5循环逐个处理。 > 1. 如果存在多个模型并行的参数,则需要重复步骤1到步骤5循环逐个处理。
> 2. 如果步骤2执行allgather子图获取的数据,已经是最终的数据,则后面的步骤可省略。 > 2. 如果步骤2执行`allgather`子图获取的数据,已经是最终的数据,则后面的步骤可省略。
> 即本身切分逻辑是仅在shape0上切分,每个卡加载不同切片数据。 > 即本身切分逻辑是仅在shape0上切分,每个卡加载不同切片数据。
### 保存数据生成新的CheckPoint文件 ### 保存数据生成新的CheckPoint文件
1. 将param_dict转换为list类型数据。 1. 将`param_dict`转换为list类型数据。
``` ```
param_list = [] param_list = []
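# 示意:将param_dict整理为包含"name"与"data"(Tensor)字段的字典列表,
# 再保存为新的CheckPoint文件(save_checkpoint的导入路径与参数请以所用MindSpore版本为准)。
for (key, value) in param_dict.items():
    param_list.append({"name": key, "data": value})

from mindspore.train.serialization import save_checkpoint
save_checkpoint(param_list, "./CKP-Integrated_1-4_32.ckpt")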
...@@ -244,7 +244,7 @@ for _, param in net.parameters_and_names(): ...@@ -244,7 +244,7 @@ for _, param in net.parameters_and_names():
param_dict = load_checkpoint("./CKP-Integrated_1-4_32.ckpt") param_dict = load_checkpoint("./CKP-Integrated_1-4_32.ckpt")
``` ```
- `load_checkpoint()`:通过该接口加载CheckPoint模型参数文件,返回一个参数字典。 - `load_checkpoint`:通过该接口加载CheckPoint模型参数文件,返回一个参数字典。
- `CKP-Integrated_1-4_32.ckpt`:需要加载的CheckPoint模型参数文件名称。 - `CKP-Integrated_1-4_32.ckpt`:需要加载的CheckPoint模型参数文件名称。
### 步骤2:对模型并行参数做切分处理 ### 步骤2:对模型并行参数做切分处理
...@@ -425,7 +425,7 @@ load_param_into_net(opt, param_dict) ...@@ -425,7 +425,7 @@ load_param_into_net(opt, param_dict)
- `mode=context.GRAPH_MODE`:使用分布式训练需要指定运行模式为图模式(PyNative模式不支持并行)。 - `mode=context.GRAPH_MODE`:使用分布式训练需要指定运行模式为图模式(PyNative模式不支持并行)。
- `device_id`:卡物理序号,即卡所在机器中的实际序号。 - `device_id`:卡物理序号,即卡所在机器中的实际序号。
- `init()`:完成分布式训练初始化操作。 - `init`:完成分布式训练初始化操作。
执行结果: 执行结果:
......
...@@ -130,7 +130,7 @@ tar -zvxf cifar-10-binary.tar.gz ...@@ -130,7 +130,7 @@ tar -zvxf cifar-10-binary.tar.gz
3. 数据混洗和批处理 3. 数据混洗和批处理
最后通过数据混洗(shuffle)随机打乱数据的顺序,并按batch读取数据,进行模型训练: 最后通过数据混洗(`shuffle`)随机打乱数据的顺序,并按`batch`读取数据,进行模型训练:
```python ```python
# apply shuffle operations # apply shuffle operations
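# 以下为示意:假设数据集对象名为cifar_ds,buffer_size与batch_size的取值仅作举例
cifar_ds = cifar_ds.shuffle(buffer_size=10)
# apply batch operations
cifar_ds = cifar_ds.batch(batch_size=32, drop_remainder=True)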
...@@ -182,11 +182,11 @@ opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01, 0. ...@@ -182,11 +182,11 @@ opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01, 0.
### 调用`Model`高阶API进行训练和保存模型文件 ### 调用`Model`高阶API进行训练和保存模型文件
完成数据预处理、网络定义、损失函数和优化器定义之后,就可以进行模型训练了。模型训练包含两层迭代,数据集的多轮迭代(epoch)和一轮数据集内按分组(batch)大小进行的单步迭代。其中,单步迭代指的是按分组从数据集中抽取数据,输入到网络中计算得到损失函数,然后通过优化器计算和更新训练参数的梯度。 完成数据预处理、网络定义、损失函数和优化器定义之后,就可以进行模型训练了。模型训练包含两层迭代,数据集的多轮迭代(`epoch`)和一轮数据集内按分组(`batch`)大小进行的单步迭代。其中,单步迭代指的是按分组从数据集中抽取数据,输入到网络中计算得到损失函数,然后通过优化器计算和更新训练参数的梯度。
为了简化训练过程,MindSpore封装了`Model`高阶接口。用户输入网络、损失函数和优化器完成`Model`的初始化,然后调用`train`接口进行训练,`train`接口参数包括迭代次数(`epoch`)和数据集(`dataset`)。 为了简化训练过程,MindSpore封装了`Model`高阶接口。用户输入网络、损失函数和优化器完成`Model`的初始化,然后调用`train`接口进行训练,`train`接口参数包括迭代次数(`epoch`)和数据集(`dataset`)。
模型保存是对训练参数进行持久化的过程。`Model`类中通过回调函数(callback)的方式进行模型保存,如下面代码所示。用户通过`CheckpointConfig`设置回调函数的参数,其中,`save_checkpoint_steps`指每经过固定的单步迭代次数保存一次模型,`keep_checkpoint_max`指最多保存的模型个数。 模型保存是对训练参数进行持久化的过程。`Model`类中通过回调函数(`callback`)的方式进行模型保存,如下面代码所示。用户通过`CheckpointConfig`设置回调函数的参数,其中,`save_checkpoint_steps`指每经过固定的单步迭代次数保存一次模型,`keep_checkpoint_max`指最多保存的模型个数。
```python ```python
''' '''
...@@ -206,7 +206,7 @@ model.train(epoch_size, dataset, callbacks=[ckpoint_cb, loss_cb]) ...@@ -206,7 +206,7 @@ model.train(epoch_size, dataset, callbacks=[ckpoint_cb, loss_cb])
### 加载保存的模型,并进行验证 ### 加载保存的模型,并进行验证
训练得到的模型文件(如resnet.ckpt)可以用来预测新图像的类别。首先通过`load_checkpoint`加载模型文件。然后调用`Model``eval`接口预测新图像类别。 训练得到的模型文件(如`resnet.ckpt`)可以用来预测新图像的类别。首先通过`load_checkpoint`加载模型文件。然后调用`Model``eval`接口预测新图像类别。
```python ```python
param_dict = load_checkpoint(args_opt.checkpoint_path) param_dict = load_checkpoint(args_opt.checkpoint_path)
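# 示意:将参数加载到网络后,调用Model的eval接口在验证数据集上评估
# (net、model、eval_dataset均为前文已定义或假设的对象;
#  load_param_into_net来自mindspore.train.serialization)
load_param_into_net(net, param_dict)
res = model.eval(eval_dataset)
print("result:", res)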
......
...@@ -17,7 +17,7 @@ ...@@ -17,7 +17,7 @@
## 概述 ## 概述
本文介绍如何使用MindSpore提供的Callback、metrics、print算子、日志打印等自定义能力,帮助用户快速调试训练网络。 本文介绍如何使用MindSpore提供的`Callback``metrics``Print`算子、日志打印等自定义能力,帮助用户快速调试训练网络。
## Callback介绍 ## Callback介绍
...@@ -28,10 +28,10 @@ Callback是回调函数的意思,但它其实不是一个函数而是一个类 ...@@ -28,10 +28,10 @@ Callback是回调函数的意思,但它其实不是一个函数而是一个类
MindSpore提供Callback能力,支持用户在训练/推理的特定阶段,插入自定义的操作。包括: MindSpore提供Callback能力,支持用户在训练/推理的特定阶段,插入自定义的操作。包括:
- MindSpore框架提供的ModelCheckpoint、LossMonitor、SummaryStep等Callback函数。 - MindSpore框架提供的`ModelCheckpoint``LossMonitor``SummaryStep`等Callback函数。
- MindSpore支持用户自定义Callback。 - MindSpore支持用户自定义Callback。
使用方法:在model.train方法中传入Callback对象,它可以是一个Callback列表,例: 使用方法:在`model.train`方法中传入Callback对象,它可以是一个Callback列表,例:
```python ```python
ckpt_cb = ModelCheckpoint() ckpt_cb = ModelCheckpoint()
...@@ -40,14 +40,14 @@ summary_cb = SummaryStep() ...@@ -40,14 +40,14 @@ summary_cb = SummaryStep()
model.train(epoch, dataset, callbacks=[ckpt_cb, loss_cb, summary_cb]) model.train(epoch, dataset, callbacks=[ckpt_cb, loss_cb, summary_cb])
``` ```
ModelCheckpoint可以保存模型参数,以便进行再训练或推理。 `ModelCheckpoint`可以保存模型参数,以便进行再训练或推理。
LossMonitor可以在日志中输出loss,方便用户查看,同时它还会监控训练过程中的loss值变化情况,当loss值为`Nan``Inf`时终止训练。 `LossMonitor`可以在日志中输出loss,方便用户查看,同时它还会监控训练过程中的loss值变化情况,当loss值为`Nan``Inf`时终止训练。
SummaryStep可以把训练过程中的信息存储到文件中,以便后续进行查看或可视化展示。 SummaryStep可以把训练过程中的信息存储到文件中,以便后续进行查看或可视化展示。
在训练过程中,Callback列表会按照定义的顺序执行Callback函数。因此在定义过程中,需考虑Callback之间的依赖关系。 在训练过程中,Callback列表会按照定义的顺序执行Callback函数。因此在定义过程中,需考虑Callback之间的依赖关系。
### 自定义Callback ### 自定义Callback
用户可以基于Callback基类,根据自身的需求,实现自定义Callback 用户可以基于`Callback`基类,根据自身的需求,实现自定义`Callback`
Callback基类定义如下所示: Callback基类定义如下所示:
...@@ -126,9 +126,9 @@ model.train(100, dataset, callbacks=stop_cb) ...@@ -126,9 +126,9 @@ model.train(100, dataset, callbacks=stop_cb)
epoch: 20 step: 32 loss: 2.298344373703003 epoch: 20 step: 32 loss: 2.298344373703003
``` ```
此Callback的功能为:在规定时间内终止训练。通过`run_context.original_args()`方法可以获取到`cb_params`字典,字典里会包含前文描述的主要属性信息。 此Callback的功能为:在规定时间内终止训练。通过`run_context.original_args`方法可以获取到`cb_params`字典,字典里会包含前文描述的主要属性信息。
同时可以对字典内的值进行修改和添加,上述用例中,在`begin()`中定义一个`init_time`对象传递给`cb_params`字典。 同时可以对字典内的值进行修改和添加,上述用例中,在`begin`中定义一个`init_time`对象传递给`cb_params`字典。
在每次`step_end`会做出判断,当训练时间大于设置的时间阈值时,会向`run_context`传递终止训练的信号,提前终止训练,并打印当前的epoch、step、loss的值。 在每次`step_end`会做出判断,当训练时间大于设置的时间阈值时,会向`run_context`传递终止训练的信号,提前终止训练,并打印当前的`epoch``step``loss`的值。
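作为参考,下面给出一个与上述描述对应的自定义Callback示意(类名、时间阈值等均为示例取值,并非固定写法):
```python
import time
from mindspore.train.callback import Callback

class StopAtTime(Callback):
    """在达到设定的运行时长后提前终止训练(示意实现)。"""
    def __init__(self, run_time):
        super(StopAtTime, self).__init__()
        self.run_time = run_time * 60  # 将分钟转换为秒

    def begin(self, run_context):
        # 在cb_params字典中记录训练开始时间
        cb_params = run_context.original_args()
        cb_params.init_time = time.time()

    def step_end(self, run_context):
        cb_params = run_context.original_args()
        if time.time() - cb_params.init_time > self.run_time:
            # 超过时间阈值时,打印当前epoch、step、loss并请求终止训练
            print("epoch:", cb_params.cur_epoch_num,
                  "step:", cb_params.cur_step_num,
                  "loss:", cb_params.net_outputs)
            run_context.request_stop()
```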
## MindSpore metrics功能介绍 ## MindSpore metrics功能介绍
...@@ -154,16 +154,16 @@ ds_eval = create_dataset() ...@@ -154,16 +154,16 @@ ds_eval = create_dataset()
output = model.eval(ds_eval) output = model.eval(ds_eval)
``` ```
`model.eval()`方法会返回一个字典,里面是传入metrics的指标和结果。 `model.eval`方法会返回一个字典,里面是传入metrics的指标和结果。
用户也可以定义自己的metrics类,通过继承`Metric`基类,并重写`clear``update``eval`三个方法即可实现。 用户也可以定义自己的`metrics`类,通过继承`Metric`基类,并重写`clear``update``eval`三个方法即可实现。
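例如,下面是一个自定义metric(平均绝对误差)的最小示意,仅用于说明三个方法各自的职责(其中`_convert_data`为`Metric`基类提供的数据转换方法;类名与计算逻辑均为示例):
```python
import numpy as np
from mindspore.nn.metrics import Metric

class MyMAE(Metric):
    def __init__(self):
        super(MyMAE, self).__init__()
        self.clear()

    def clear(self):
        """初始化内部统计量。"""
        self._abs_error_sum = 0
        self._samples_num = 0

    def update(self, *inputs):
        """接收预测值和标签值,累计绝对误差。"""
        y_pred = self._convert_data(inputs[0])
        y = self._convert_data(inputs[1])
        self._abs_error_sum += np.abs(y.reshape(y_pred.shape) - y_pred).sum()
        self._samples_num += y_pred.shape[0]

    def eval(self):
        """计算并返回指标结果。"""
        return self._abs_error_sum / self._samples_num
```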
`accuracy`算子举例说明其内部实现原理: `accuracy`算子举例说明其内部实现原理:
`accuracy`继承了`EvaluationBase`基类,重写了上述三个方法。 `accuracy`继承了`EvaluationBase`基类,重写了上述三个方法。
`clear()`方法会把类中相关计算参数初始化。 `clear`方法会把类中相关计算参数初始化。
`update()`方法接受预测值和标签值,更新accuracy内部变量。 `update`方法接受预测值和标签值,更新accuracy内部变量。
`eval()`方法会计算相关指标,返回计算结果。 `eval`方法会计算相关指标,返回计算结果。
调用`accuracy``eval`方法,即可得到计算结果。 调用`accuracy``eval`方法,即可得到计算结果。
通过如下代码可以更清楚了解到`accuracy`是如何运行的: 通过如下代码可以更清楚了解到`accuracy`是如何运行的:
...@@ -182,10 +182,10 @@ print('Accuracy is ', accuracy) ...@@ -182,10 +182,10 @@ print('Accuracy is ', accuracy)
``` ```
Accuracy is 0.6667 Accuracy is 0.6667
``` ```
## print算子功能介绍 ## Print算子功能介绍
MindSpore的自研print算子可以将用户输入的Tensor或字符串信息打印出来,支持多字符串输入,多Tensor输入和字符串与Tensor的混合输入,输入参数以逗号隔开。 MindSpore的自研`Print`算子可以将用户输入的Tensor或字符串信息打印出来,支持多字符串输入,多Tensor输入和字符串与Tensor的混合输入,输入参数以逗号隔开。
print算子使用方法与其他算子相同,在网络的`__init__()`中声明算子并在`construct()`中进行调用,具体使用实例及输出结果如下: `Print`算子使用方法与其他算子相同,在网络的`__init__`中声明算子并在`construct`中进行调用,具体使用实例及输出结果如下:
```python ```python
import numpy as np import numpy as np
from mindspore import Tensor from mindspore import Tensor
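# 以下为示意:定义一个使用Print算子的网络(网络结构与输入均为示例)
import mindspore.nn as nn
from mindspore.ops import operations as P

class PrintDemo(nn.Cell):
    def __init__(self):
        super(PrintDemo, self).__init__()
        self.print = P.Print()

    def construct(self, x, y):
        # 混合打印字符串与Tensor
        self.print('print Tensor x and Tensor y:', x, y)
        return x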
...@@ -223,10 +223,10 @@ val:[[1 1] ...@@ -223,10 +223,10 @@ val:[[1 1]
## 日志相关的环境变量和配置 ## 日志相关的环境变量和配置
MindSpore采用glog来输出日志,常用的几个环境变量如下: MindSpore采用glog来输出日志,常用的几个环境变量如下:
1. GLOG_v 控制日志的级别,默认值为2,即WARNING级别,对应关系如下:0-DEBUG、1-INFO、2-WARNING、3-ERROR。 1. `GLOG_v` 控制日志的级别,默认值为2,即WARNING级别,对应关系如下:0-DEBUG、1-INFO、2-WARNING、3-ERROR。
2. GLOG_logtostderr 值设置为1时,日志输出到屏幕;值设置为0时,日志输出到文件。默认值为1。 2. `GLOG_logtostderr` 值设置为1时,日志输出到屏幕;值设置为0时,日志输出到文件。默认值为1。
3. GLOG_log_dir=YourPath 指定日志输出的路径。若GLOG_logtostderr的值为0,则必须设置此变量。若指定了GLOG_log_dir且GLOG_logtostderr的值为1时,则日志输出到屏幕,不输出到文件。C++和Python的日志会被输出到不同的文件中,C++日志的文件名遵从GLOG日志文件的命名规则,这里是`mindspore.机器名.用户名.log.日志级别.时间戳`,Python日志的文件名为`mindspore.log` 3. GLOG_log_dir=*YourPath* 指定日志输出的路径。若`GLOG_logtostderr`的值为0,则必须设置此变量。若指定了`GLOG_log_dir``GLOG_logtostderr`的值为1时,则日志输出到屏幕,不输出到文件。C++和Python的日志会被输出到不同的文件中,C++日志的文件名遵从GLOG日志文件的命名规则,这里是`mindspore.机器名.用户名.log.日志级别.时间戳`,Python日志的文件名为`mindspore.log`
4. MS_SUBMODULE_LOG_v="{SubModule1:LogLevel1,SubModule2:LogLevel2,...}" 指定MindSpore C++各子模块的日志级别,被指定的子模块的日志级别将覆盖GLOG_v在此模块内的设置,此处子模块的日志级别LogLevel与GLOG_v的日志级别含义相同,MindSpore子模块的划分如下表。如可以通过`GLOG_v=1 MS_SUBMODULE_LOG_v="{PARSER:2,ANALYZER:2}"``PARSER``ANALYZER`模块的日志级别设为WARNING,其他模块的日志级别设为INFO。 4. `MS_SUBMODULE_LOG_v="{SubModule1:LogLevel1,SubModule2:LogLevel2,...}"` 指定MindSpore C++各子模块的日志级别,被指定的子模块的日志级别将覆盖GLOG_v在此模块内的设置,此处子模块的日志级别LogLevel与`GLOG_v`的日志级别含义相同,MindSpore子模块的划分如下表。如可以通过`GLOG_v=1 MS_SUBMODULE_LOG_v="{PARSER:2,ANALYZER:2}"``PARSER``ANALYZER`模块的日志级别设为WARNING,其他模块的日志级别设为INFO。
> glog不支持日志文件的绕接,如果需要控制日志文件对磁盘空间的占用,可选用操作系统提供的日志文件管理工具,例如:Linux的logrotate。 > glog不支持日志文件的绕接,如果需要控制日志文件对磁盘空间的占用,可选用操作系统提供的日志文件管理工具,例如:Linux的logrotate。
......
...@@ -117,7 +117,7 @@ if __name__ == "__main__": ...@@ -117,7 +117,7 @@ if __name__ == "__main__":
其中, 其中,
- `mode=context.GRAPH_MODE`:使用分布式训练需要指定运行模式为图模式(PyNative模式不支持并行)。 - `mode=context.GRAPH_MODE`:使用分布式训练需要指定运行模式为图模式(PyNative模式不支持并行)。
- `device_id`:卡的物理序号,即卡所在机器中的实际序号。 - `device_id`:卡的物理序号,即卡所在机器中的实际序号。
- `init()`:使能HCCL通信,并完成分布式训练初始化操作。 - `init`:使能HCCL通信,并完成分布式训练初始化操作。
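与上述说明对应的初始化示意如下(`device_id`通过环境变量获取仅为一种常见做法,具体以实际脚本为准):
```python
import os
from mindspore import context
from mindspore.communication.management import init

device_id = int(os.getenv('DEVICE_ID'))
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=device_id)
init()
```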
## 数据并行模式加载数据集 ## 数据并行模式加载数据集
...@@ -231,7 +231,7 @@ class SoftmaxCrossEntropyExpand(nn.Cell): ...@@ -231,7 +231,7 @@ class SoftmaxCrossEntropyExpand(nn.Cell):
## 训练网络 ## 训练网络
`context.set_auto_parallel_context()`是配置并行训练参数的接口,必须在`Model`初始化前调用。如用户未指定参数,框架会自动根据并行模式为用户设置参数的经验值。如数据并行模式下,`parameter_broadcast`默认打开。主要参数包括: `context.set_auto_parallel_context`是配置并行训练参数的接口,必须在`Model`初始化前调用。如用户未指定参数,框架会自动根据并行模式为用户设置参数的经验值。如数据并行模式下,`parameter_broadcast`默认打开。主要参数包括:
- `parallel_mode`:分布式并行模式,默认为单机模式`ParallelMode.STAND_ALONE`。可选数据并行`ParallelMode.DATA_PARALLEL`及自动并行`ParallelMode.AUTO_PARALLEL` - `parallel_mode`:分布式并行模式,默认为单机模式`ParallelMode.STAND_ALONE`。可选数据并行`ParallelMode.DATA_PARALLEL`及自动并行`ParallelMode.AUTO_PARALLEL`
- `parameter_broadcast`: 参数初始化广播开关,`DATA_PARALLEL``HYBRID_PARALLEL`模式下,默认值为`True` - `parameter_broadcast`: 参数初始化广播开关,`DATA_PARALLEL``HYBRID_PARALLEL`模式下,默认值为`True`
...@@ -239,7 +239,7 @@ class SoftmaxCrossEntropyExpand(nn.Cell): ...@@ -239,7 +239,7 @@ class SoftmaxCrossEntropyExpand(nn.Cell):
> `device_num`和`global_rank`建议采用默认值,框架内会调用HCCL接口获取。 > `device_num`和`global_rank`建议采用默认值,框架内会调用HCCL接口获取。
如脚本中存在多个网络用例,请在执行下个用例前调用`context.reset_auto_parallel_context()`将所有参数还原到默认值。 如脚本中存在多个网络用例,请在执行下个用例前调用`context.reset_auto_parallel_context`将所有参数还原到默认值。
在下面的样例中我们指定并行模式为自动并行,用户如需切换为数据并行模式,只需将`parallel_mode`改为`DATA_PARALLEL` 在下面的样例中我们指定并行模式为自动并行,用户如需切换为数据并行模式,只需将`parallel_mode`改为`DATA_PARALLEL`
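一个最小化的配置示意如下(此处只展示接口调用方式,`ParallelMode`的导入路径请以所用MindSpore版本为准):
```python
from mindspore import context
from mindspore.train.model import ParallelMode

# 切换用例前先将并行配置还原为默认值,再设置本用例的并行模式
context.reset_auto_parallel_context()
context.set_auto_parallel_context(parallel_mode=ParallelMode.AUTO_PARALLEL)
```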
...@@ -339,7 +339,7 @@ cd ../ ...@@ -339,7 +339,7 @@ cd ../
运行时间大约在5分钟内,主要时间是用于算子的编译,实际训练时间在20秒内。用户可以通过`ps -ef | grep pytest`来监控任务进程。 运行时间大约在5分钟内,主要时间是用于算子的编译,实际训练时间在20秒内。用户可以通过`ps -ef | grep pytest`来监控任务进程。
日志文件保存device目录下,env.log中记录了环境变量的相关信息,关于Loss部分结果保存在train.log中,示例如下: 日志文件保存`device`目录下,`env.log`中记录了环境变量的相关信息,关于Loss部分结果保存在`train.log`中,示例如下:
``` ```
epoch: 1 step: 156, loss is 2.0084016 epoch: 1 step: 156, loss is 2.0084016
......
...@@ -14,7 +14,7 @@ ...@@ -14,7 +14,7 @@
## 概述 ## 概述
混合精度训练方法是通过混合使用单精度和半精度数据格式来加速深度神经网络训练的过程,同时保持了单精度训练所能达到的网络精度。混合精度训练能够加速计算过程,同时减少内存使用和存取,并使得在特定的硬件上可以训练更大的模型或batch size 混合精度训练方法是通过混合使用单精度和半精度数据格式来加速深度神经网络训练的过程,同时保持了单精度训练所能达到的网络精度。混合精度训练能够加速计算过程,同时减少内存使用和存取,并使得在特定的硬件上可以训练更大的模型或`batch size`
对于FP16的算子,若给定的数据类型是FP32,MindSpore框架的后端会进行降精度处理。用户可以开启INFO日志,并通过搜索关键字“reduce precision”查看降精度处理的算子。 对于FP16的算子,若给定的数据类型是FP32,MindSpore框架的后端会进行降精度处理。用户可以开启INFO日志,并通过搜索关键字“reduce precision”查看降精度处理的算子。
...@@ -36,14 +36,14 @@ MindSpore混合精度典型的计算流程如下图所示: ...@@ -36,14 +36,14 @@ MindSpore混合精度典型的计算流程如下图所示:
## 自动混合精度 ## 自动混合精度
使用自动混合精度,需要调用相应的接口,将待训练网络和优化器作为输入传进去;该接口会将整张网络的算子转换成FP16算子(除BatchNorm算子和Loss涉及到的算子外)。 使用自动混合精度,需要调用相应的接口,将待训练网络和优化器作为输入传进去;该接口会将整张网络的算子转换成FP16算子(除`BatchNorm`算子和Loss涉及到的算子外)。
具体的实现步骤为: 具体的实现步骤为:
1. 引入MindSpore的混合精度的接口amp 1. 引入MindSpore的混合精度的接口`amp`
2. 定义网络:该步骤和普通的网络定义没有区别(无需手动配置某个算子的精度); 2. 定义网络:该步骤和普通的网络定义没有区别(无需手动配置某个算子的精度);
3. 使用amp.build_train_network()接口封装网络模型、优化器和损失函数,在该步骤中MindSpore会将有需要的算子自动进行类型转换。 3. 使用`amp.build_train_network`接口封装网络模型、优化器和损失函数,在该步骤中MindSpore会将有需要的算子自动进行类型转换。
代码样例如下: 代码样例如下:
...@@ -97,7 +97,7 @@ MindSpore还支持手动混合精度。假定在网络中只有一个Dense Layer ...@@ -97,7 +97,7 @@ MindSpore还支持手动混合精度。假定在网络中只有一个Dense Layer
以下是一个手动混合精度的实现步骤: 以下是一个手动混合精度的实现步骤:
1. 定义网络: 该步骤与自动混合精度中的步骤2类似; 1. 定义网络: 该步骤与自动混合精度中的步骤2类似;
2. 配置混合精度: 通过net.to_float(mstype.float16),把该Cell及其子Cell中所有的算子都配置成FP16;然后,将模型中的dense算子手动配置成FP32; 2. 配置混合精度: 通过`net.to_float(mstype.float16)`,把该Cell及其子Cell中所有的算子都配置成FP16;然后,将模型中的dense算子手动配置成FP32;
3. 使用TrainOneStepCell封装网络模型和优化器。 3. 使用TrainOneStepCell封装网络模型和优化器。
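对应的示意代码如下(其中`Net`为假设的网络,并假设其中包含名为`dense`的全连接层;损失函数与优化器的选择仅作举例):
```python
from mindspore import nn
from mindspore.common import dtype as mstype

net = Net()
net.to_float(mstype.float16)         # 将该Cell及其子Cell中的算子配置成FP16
net.dense.to_float(mstype.float32)   # 将dense算子单独配置成FP32

loss = nn.SoftmaxCrossEntropyWithLogits()
optimizer = nn.Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9)
train_network = nn.TrainOneStepCell(nn.WithLossCell(net, loss), optimizer)
```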
......
...@@ -29,8 +29,8 @@ AI算法设计之初普遍未考虑相关的安全威胁,使得AI算法的判 ...@@ -29,8 +29,8 @@ AI算法设计之初普遍未考虑相关的安全威胁,使得AI算法的判
这里通过图像分类任务上的对抗性攻防,以攻击算法FGSM和防御算法NAD为例,介绍MindArmour在对抗攻防上的使用方法。 这里通过图像分类任务上的对抗性攻防,以攻击算法FGSM和防御算法NAD为例,介绍MindArmour在对抗攻防上的使用方法。
> 本例面向CPU、GPU、Ascend 910 AI处理器,你可以在这里下载完整的样例代码:<https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/model_safety> > 本例面向CPU、GPU、Ascend 910 AI处理器,你可以在这里下载完整的样例代码:<https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/model_safety>
> - mnist_attack_fgsm.py:包含攻击代码。 > - `mnist_attack_fgsm.py`:包含攻击代码。
> - mnist_defense_nad.py:包含防御代码。 > - `mnist_defense_nad.py`:包含防御代码。
## 建立被攻击模型 ## 建立被攻击模型
...@@ -68,7 +68,7 @@ TAG = 'demo' ...@@ -68,7 +68,7 @@ TAG = 'demo'
### 加载数据集 ### 加载数据集
利用MindSpore的dataset提供的MnistDataset接口加载MNIST数据集。 利用MindSpore的dataset提供的`MnistDataset`接口加载MNIST数据集。
```python ```python
# generate training data # generate training data
......
...@@ -225,7 +225,7 @@ MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差 ...@@ -225,7 +225,7 @@ MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差
类似于TensorFlow的`Estimator`接口,将定义好的网络原型、损失函数、优化器传入MindSpore的`Model`接口,内部会自动将其组合成一个可用于训练的网络。 类似于TensorFlow的`Estimator`接口,将定义好的网络原型、损失函数、优化器传入MindSpore的`Model`接口,内部会自动将其组合成一个可用于训练的网络。
如果需要在训练中使用Loss Scale,则可以单独定义一个loss_scale_manager,一同传入`Model`接口。 如果需要在训练中使用Loss Scale,则可以单独定义一个`loss_scale_manager`,一同传入`Model`接口。
```python ```python
loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False) loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
...@@ -237,7 +237,7 @@ MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差 ...@@ -237,7 +237,7 @@ MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差
model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'}) model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'})
``` ```
类似于TensorFlow的`estimator.train()`,可以通过调用`model.train`接口来进行训练。CheckPoint和中间结果打印等功能,可通过Callback的方式定义到`model.train`接口上。 类似于TensorFlow的`estimator.train`,可以通过调用`model.train`接口来进行训练。CheckPoint和中间结果打印等功能,可通过`Callback`的方式定义到`model.train`接口上。
```python ```python
time_cb = TimeMonitor(data_size=step_size) time_cb = TimeMonitor(data_size=step_size)
......
...@@ -86,8 +86,8 @@ $F1分数 = (2 * Precision * Recall) / (Precision + Recall)$ ...@@ -86,8 +86,8 @@ $F1分数 = (2 * Precision * Recall) / (Precision + Recall)$
3. 得到模型之后,使用验证数据集,查看模型精度情况。 3. 得到模型之后,使用验证数据集,查看模型精度情况。
> 本例面向GPU硬件平台,你可以在这里下载完整的样例代码:<https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/lstm> > 本例面向GPU硬件平台,你可以在这里下载完整的样例代码:<https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/lstm>
> - main.py:代码文件,包括数据预处理、网络定义、模型训练等代码。 > - `main.py`:代码文件,包括数据预处理、网络定义、模型训练等代码。
> - config.py:网络中的一些配置,包括batch size、进行几次epoch训练等。 > - `config.py`:网络中的一些配置,包括`batch size`、进行几次epoch训练等。
## 实现阶段 ## 实现阶段
### 导入需要的库文件 ### 导入需要的库文件
......
...@@ -41,7 +41,7 @@ MindSpore Predict是一个轻量级的深度神经网络推理引擎,提供了 ...@@ -41,7 +41,7 @@ MindSpore Predict是一个轻量级的深度神经网络推理引擎,提供了
- decorator - decorator
- scipy - scipy
> numpy, decorator和scipy可以通过pip安装,参考命令如下。 > `numpy`、 `decorator`和`scipy`可以通过`pip`安装,参考命令如下。
```bash ```bash
pip3 install numpy==1.16 decorator scipy pip3 install numpy==1.16 decorator scipy
...@@ -70,14 +70,14 @@ MindSpore Predict是一个轻量级的深度神经网络推理引擎,提供了 ...@@ -70,14 +70,14 @@ MindSpore Predict是一个轻量级的深度神经网络推理引擎,提供了
4. 获取编译结果。 4. 获取编译结果。
进入源码的predict/output目录,即可查看生成的压缩包,包名为MSPredict-{版本号}-{HOST平台}_{DEVICE平台}.tar.gz,例如:MSPredict-0.1.0-linux_aarch64.tar.gz。 该压缩包包含以下目录: 进入源码的`predict/output`目录,即可查看生成的压缩包,包名为MSPredict-*版本号*-*HOST平台*_*DEVICE平台*.tar.gz,例如:MSPredict-0.1.0-linux_aarch64.tar.gz。 该压缩包包含以下目录:
- include:MindSpore Predict的头文件。 - include:MindSpore Predict的头文件。
- lib:MindSpore Predict的动态库。 - lib:MindSpore Predict的动态库。
## 端侧推理使用 ## 端侧推理使用
在APP的APK工程中使用MindSpore进行模型推理前,需要对输入进行必要的前处理,比如将图片转换成MindSpore推理要求的tensor格式、对图片进行resize等处理。在MindSpore完成模型推理后,对模型推理的结果进行后处理,并将处理的输出发送给APP应用。 在APP的APK工程中使用MindSpore进行模型推理前,需要对输入进行必要的前处理,比如将图片转换成MindSpore推理要求的`tensor`格式、对图片进行`resize`等处理。在MindSpore完成模型推理后,对模型推理的结果进行后处理,并将处理的输出发送给APP应用。
本章主要描述用户如何使用MindSpore进行模型推理,APK工程的搭建和模型推理的前后处理,不在此列举。 本章主要描述用户如何使用MindSpore进行模型推理,APK工程的搭建和模型推理的前后处理,不在此列举。
...@@ -89,12 +89,12 @@ MindSpore进行端侧模型推理的步骤如下。 ...@@ -89,12 +89,12 @@ MindSpore进行端侧模型推理的步骤如下。
param_dict = load_checkpoint(ckpoint_file_name=ckpt_file_path) param_dict = load_checkpoint(ckpoint_file_name=ckpt_file_path)
load_param_into_net(net, param_dict) load_param_into_net(net, param_dict)
``` ```
2. 调用`export`接口,导出端侧模型文件(.ms)。 2. 调用`export`接口,导出端侧模型文件(`.ms`)。
```python ```python
export(net, input_data, file_name="./lenet.ms", file_format='LITE') export(net, input_data, file_name="./lenet.ms", file_format='LITE')
``` ```
以LeNet网络为例,生成的端侧模型文件为`lenet.ms`,完整示例代码lenet.py如下。 以LeNet网络为例,生成的端侧模型文件为`lenet.ms`,完整示例代码`lenet.py`如下。
```python ```python
import os import os
import numpy as np import numpy as np
...@@ -154,12 +154,12 @@ if __name__ == '__main__': ...@@ -154,12 +154,12 @@ if __name__ == '__main__':
### 在端侧实现推理 ### 在端侧实现推理
以.ms模型文件和图片数据作为输入,创建session,在端侧实现推理。 以`.ms`模型文件和图片数据作为输入,创建`session`,在端侧实现推理。
![](./images/side_infer_process.png) ![](./images/side_infer_process.png)
图1:端侧推理时序图 图1:端侧推理时序图
1. 加载.ms模型文件到内存缓冲区,ReadFile函数功能需要用户参考[C++教程](http://www.cplusplus.com/doc/tutorial/files/)自行实现。 1. 加载`.ms`模型文件到内存缓冲区,ReadFile函数功能需要用户参考[C++教程](http://www.cplusplus.com/doc/tutorial/files/)自行实现。
```cpp ```cpp
// read model file // read model file
std::string modelPath = "./models/lenet/lenet.ms"; std::string modelPath = "./models/lenet/lenet.ms";
...@@ -169,7 +169,7 @@ if __name__ == '__main__': ...@@ -169,7 +169,7 @@ if __name__ == '__main__':
char *graphBuf = ReadFile(modelPath.c_str(), graphSize); char *graphBuf = ReadFile(modelPath.c_str(), graphSize);
``` ```
2. 调用CreateSession接口创建Session,创建完成后可释放内存缓冲区中的模型文件。 2. 调用`CreateSession`接口创建`Session`,创建完成后可释放内存缓冲区中的模型文件。
```cpp ```cpp
// create session // create session
Context ctx; Context ctx;
...@@ -177,7 +177,7 @@ if __name__ == '__main__': ...@@ -177,7 +177,7 @@ if __name__ == '__main__':
free(graphBuf); free(graphBuf);
``` ```
3. 从内存缓冲区中读取推理的输入数据,调用SetData()接口将输入数据设置到input tensor中。 3. 从内存缓冲区中读取推理的输入数据,调用`SetData`接口将输入数据设置到`input tensor`中。
```cpp ```cpp
// load input buffer // load input buffer
size_t inputSize = 0; size_t inputSize = 0;
...@@ -190,19 +190,19 @@ if __name__ == '__main__': ...@@ -190,19 +190,19 @@ if __name__ == '__main__':
inputs[0]->SetData(inputBuf); inputs[0]->SetData(inputBuf);
``` ```
4. 调用Session中的Run()接口执行推理。 4. 调用`Session`中的`Run`接口执行推理。
```cpp ```cpp
// session run // session run
int ret = session->Run(inputs); int ret = session->Run(inputs);
``` ```
5. 调用GetAllOutput()接口获取输出。 5. 调用`GetAllOutput`接口获取输出。
```cpp ```cpp
// get output // get output
std::map<std::string, std::vector<Tensor *>> outputs = session->GetAllOutput(); std::map<std::string, std::vector<Tensor *>> outputs = session->GetAllOutput();
``` ```
6. 调用Tensor的GetData()接口获取输出数据。 6. 调用`Tensor``GetData`接口获取输出数据。
```cpp ```cpp
// get output data // get output data
float *data = nullptr; float *data = nullptr;
...@@ -214,7 +214,7 @@ if __name__ == '__main__': ...@@ -214,7 +214,7 @@ if __name__ == '__main__':
} }
``` ```
7. 推理结束释放input tensor和output tensor 7. 推理结束释放`input tensor``output tensor`
```cpp ```cpp
// free inputs and outputs // free inputs and outputs
for (auto &input : inputs) { for (auto &input : inputs) {
...@@ -229,10 +229,10 @@ if __name__ == '__main__': ...@@ -229,10 +229,10 @@ if __name__ == '__main__':
outputs.clear(); outputs.clear();
``` ```
选取LeNet网络,推理输入为“lenet.bin”,完整示例代码lenet.cpp如下。 选取LeNet网络,推理输入为“`lenet.bin`”,完整示例代码`lenet.cpp`如下。
> MindSpore Predict使用FlatBuffers定义模型,解析模型需要使用到FlatBuffers头文件,因此用户需要自行配置FlatBuffers头文件。 > MindSpore Predict使用`FlatBuffers`定义模型,解析模型需要使用到`FlatBuffers`头文件,因此用户需要自行配置`FlatBuffers`头文件。
> >
> 具体做法:将MindSpore根目录/third_party/flatbuffers/include下的flatbuffers文件夹拷贝到session.h的同级目录。 > 具体做法:将MindSpore根目录`/third_party/flatbuffers/include`下的`flatbuffers`文件夹拷贝到`session.h`的同级目录。
```cpp ```cpp
#include <string> #include <string>
......
...@@ -68,7 +68,7 @@ ModelArts使用对象存储服务(Object Storage Service,简称OBS)进行 ...@@ -68,7 +68,7 @@ ModelArts使用对象存储服务(Object Storage Service,简称OBS)进行
### 执行脚本准备 ### 执行脚本准备
新建一个自己的OBS桶(例如:resnet50-train),在桶中创建代码目录(例如:resnet50_cifar10_train),并将以下目录中的所有脚本上传至代码目录: 新建一个自己的OBS桶(例如:`resnet50-train`),在桶中创建代码目录(例如:`resnet50_cifar10_train`),并将以下目录中的所有脚本上传至代码目录:
> <https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/sample_for_cloud/>脚本使用ResNet-50网络在CIFAR-10数据集上进行训练,并在训练结束后验证精度。脚本可以在ModelArts采用`1*Ascend`或`8*Ascend`两种不同规格进行训练任务。 > <https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/sample_for_cloud/>脚本使用ResNet-50网络在CIFAR-10数据集上进行训练,并在训练结束后验证精度。脚本可以在ModelArts采用`1*Ascend`或`8*Ascend`两种不同规格进行训练任务。
为了方便后续创建训练作业,先创建训练输出目录和日志输出目录,本示例创建的目录结构如下: 为了方便后续创建训练作业,先创建训练输出目录和日志输出目录,本示例创建的目录结构如下:
......
...@@ -185,8 +185,8 @@ def test_summary(): ...@@ -185,8 +185,8 @@ def test_summary():
### 性能数据收集 ### 性能数据收集
为了收集神经网络的性能数据,需要在训练脚本中添加MindInsight Profiler接口。首先,在set context之后和初始化网络之前,需要初始化MindInsight `Profiler`对象; 为了收集神经网络的性能数据,需要在训练脚本中添加`MindInsight Profiler`接口。首先,在set context之后和初始化网络之前,需要初始化`MindInsight Profiler`对象;
然后在训练结束后,调用`Profiler.analyse()`停止性能数据收集并生成性能分析结果。 然后在训练结束后,调用`Profiler.analyse`停止性能数据收集并生成性能分析结果。
样例代码如下: 样例代码如下:
...@@ -270,13 +270,13 @@ MindInsight向用户提供Web服务,可通过以下命令,查看当前运行 ...@@ -270,13 +270,13 @@ MindInsight向用户提供Web服务,可通过以下命令,查看当前运行
ps -ef | grep mindinsight ps -ef | grep mindinsight
``` ```
根据服务进程PID,可通过以下命令,查看当前服务进程对应的工作目录WORKSPACE 根据服务进程PID,可通过以下命令,查看当前服务进程对应的工作目录`WORKSPACE`
```bash ```bash
lsof -p <PID> | grep access lsof -p <PID> | grep access
``` ```
输出如下,可查看WORKSPACE 输出如下,可查看`WORKSPACE`
```bash ```bash
gunicorn <PID> <USER> <FD> <TYPE> <DEVICE> <SIZE/OFF> <NODE> <WORKSPACE>/log/gunicorn/access.log gunicorn <PID> <USER> <FD> <TYPE> <DEVICE> <SIZE/OFF> <NODE> <WORKSPACE>/log/gunicorn/access.log
......
...@@ -92,7 +92,7 @@ import os ...@@ -92,7 +92,7 @@ import os
在正式编写代码前,需要了解MindSpore运行所需要的硬件、后端等基本信息。 在正式编写代码前,需要了解MindSpore运行所需要的硬件、后端等基本信息。
可以通过`context.set_context()`来配置运行需要的信息,譬如运行模式、后端信息、硬件等信息。 可以通过`context.set_context`来配置运行需要的信息,譬如运行模式、后端信息、硬件等信息。
导入`context`模块,配置运行需要的信息。 导入`context`模块,配置运行需要的信息。
...@@ -110,7 +110,7 @@ if __name__ == "__main__": ...@@ -110,7 +110,7 @@ if __name__ == "__main__":
... ...
``` ```
在样例中我们配置样例运行使用图模式。根据实际情况配置硬件信息,譬如代码运行在Ascend AI处理器上,则`--device_target`选择`Ascend`,代码运行在CPU、GPU同理。详细参数说明,请参见`context.set_context()`接口说明。 在样例中我们配置样例运行使用图模式。根据实际情况配置硬件信息,譬如代码运行在Ascend AI处理器上,则`--device_target`选择`Ascend`,代码运行在CPU、GPU同理。详细参数说明,请参见`context.set_context`接口说明。
## 数据处理 ## 数据处理
...@@ -118,12 +118,12 @@ if __name__ == "__main__": ...@@ -118,12 +118,12 @@ if __name__ == "__main__":
### 定义数据集及数据操作 ### 定义数据集及数据操作
我们定义一个函数`create_dataset()`来创建数据集。在这个函数中,我们定义好需要进行的数据增强和处理操作: 我们定义一个函数`create_dataset`来创建数据集。在这个函数中,我们定义好需要进行的数据增强和处理操作:
1. 定义数据集。 1. 定义数据集。
2. 定义进行数据增强和处理所需要的一些参数。 2. 定义进行数据增强和处理所需要的一些参数。
3. 根据参数,生成对应的数据增强操作。 3. 根据参数,生成对应的数据增强操作。
4. 使用`map()`映射函数,将数据操作应用到数据集。 4. 使用`map`映射函数,将数据操作应用到数据集。
5. 对生成的数据集进行处理。 5. 对生成的数据集进行处理。
```python ```python
...@@ -229,7 +229,7 @@ def fc_with_initialize(input_channels, out_channels): ...@@ -229,7 +229,7 @@ def fc_with_initialize(input_channels, out_channels):
使用MindSpore定义神经网络需要继承`mindspore.nn.cell.Cell``Cell`是所有神经网络(`Conv2d`等)的基类。 使用MindSpore定义神经网络需要继承`mindspore.nn.cell.Cell``Cell`是所有神经网络(`Conv2d`等)的基类。
神经网络的各层需要预先在`__init__()`方法中定义,然后通过定义`construct()`方法来完成神经网络的前向构造。按照LeNet的网络结构,定义网络各层如下: 神经网络的各层需要预先在`__init__`方法中定义,然后通过定义`construct`方法来完成神经网络的前向构造。按照LeNet的网络结构,定义网络各层如下:
```python ```python
class LeNet5(nn.Cell): class LeNet5(nn.Cell):
...@@ -400,13 +400,13 @@ checkpoint_lenet-1_1875.ckpt ...@@ -400,13 +400,13 @@ checkpoint_lenet-1_1875.ckpt
``` ```
其中, 其中,
`checkpoint_lenet-1_1875.ckpt`:指保存的模型参数文件。名称具体含义为checkpoint_{网络名称}-{第几个epoch}_{第几个step}.ckpt。 `checkpoint_lenet-1_1875.ckpt`:指保存的模型参数文件。名称具体含义为checkpoint_*网络名称*-*第几个epoch*_*第几个step*.ckpt。
## 验证模型 ## 验证模型
在得到模型文件后,通过模型运行测试数据集得到的结果,验证模型的泛化能力。 在得到模型文件后,通过模型运行测试数据集得到的结果,验证模型的泛化能力。
1. 使用`model.eval()`接口读入测试数据集。 1. 使用`model.eval`接口读入测试数据集。
2. 使用保存后的模型参数进行推理。 2. 使用保存后的模型参数进行推理。
```python ```python
...@@ -431,7 +431,7 @@ if __name__ == "__main__": ...@@ -431,7 +431,7 @@ if __name__ == "__main__":
``` ```
其中, 其中,
`load_checkpoint()`:通过该接口加载CheckPoint模型参数文件,返回一个参数字典。 `load_checkpoint`:通过该接口加载CheckPoint模型参数文件,返回一个参数字典。
`checkpoint_lenet-1_1875.ckpt`:之前保存的CheckPoint模型文件名称。 `checkpoint_lenet-1_1875.ckpt`:之前保存的CheckPoint模型文件名称。
`load_param_into_net`:通过该接口把参数加载到网络中。 `load_param_into_net`:通过该接口把参数加载到网络中。
......
...@@ -34,11 +34,11 @@ ...@@ -34,11 +34,11 @@
每个算子的原语是一个继承于`PrimitiveWithInfer`的子类,其类型名称即是算子名称。 每个算子的原语是一个继承于`PrimitiveWithInfer`的子类,其类型名称即是算子名称。
自定义算子原语与内置算子原语的接口定义完全一致: 自定义算子原语与内置算子原语的接口定义完全一致:
- 属性由构造函数`__init__()`的入参定义。本用例的算子没有属性,因此`__init__()`没有额外的入参。带属性的用例可参考MindSpore源码中的[custom add3](https://gitee.com/mindspore/mindspore/tree/master/tests/st/ops/custom_ops_tbe/cus_add3.py)用例。 - 属性由构造函数`__init__`的入参定义。本用例的算子没有属性,因此`__init__`没有额外的入参。带属性的用例可参考MindSpore源码中的[custom add3](https://gitee.com/mindspore/mindspore/tree/master/tests/st/ops/custom_ops_tbe/cus_add3.py)用例。
- 输入输出的名称通过`init_prim_io_names()`函数定义。 - 输入输出的名称通过`init_prim_io_names`函数定义。
- 输出Tensor的shape推理方法在`infer_shape()`函数中定义,输出Tensor的dtype推理方法在`infer_dtype()`函数中定义。 - 输出Tensor的shape推理方法在`infer_shape`函数中定义,输出Tensor的dtype推理方法在`infer_dtype`函数中定义。
自定义算子与内置算子的唯一区别是需要通过在`__init__()`函数中导入算子实现函数(`from square_impl import CusSquareImpl`)来将算子实现注册到后端。本用例在`square_impl.py`中定义了算子实现和算子信息,将在后文中说明。 自定义算子与内置算子的唯一区别是需要通过在`__init__`函数中导入算子实现函数(`from square_impl import CusSquareImpl`)来将算子实现注册到后端。本用例在`square_impl.py`中定义了算子实现和算子信息,将在后文中说明。
以Square算子原语`cus_square.py`为例,给出如下示例代码。 以Square算子原语`cus_square.py`为例,给出如下示例代码。
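下面给出其最简形式的示意(完整实现请以MindSpore仓库中的`cus_square.py`为准):
```python
from mindspore.ops import prim_attr_register, PrimitiveWithInfer
# 导入算子实现函数,将算子实现注册到后端(假设square_impl.py位于同一目录)
from square_impl import CusSquareImpl

class CusSquare(PrimitiveWithInfer):
    """自定义Square算子原语(示意)。"""
    @prim_attr_register
    def __init__(self):
        # 本算子没有属性,因此__init__没有额外的入参
        self.init_prim_io_names(inputs=['x'], outputs=['y'])

    def infer_shape(self, data_shape):
        # 输出shape与输入相同
        return data_shape

    def infer_dtype(self, data_dtype):
        # 输出dtype与输入相同
        return data_dtype
```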
...@@ -74,8 +74,8 @@ class CusSquare(PrimitiveWithInfer): ...@@ -74,8 +74,8 @@ class CusSquare(PrimitiveWithInfer):
1. 准备输入的placeholder,placeholder是一个占位符,返回一个Tensor对象,表示一组输入数据。 1. 准备输入的placeholder,placeholder是一个占位符,返回一个Tensor对象,表示一组输入数据。
2. 调用计算函数,计算函数使用TBE提供的API接口描述了算子内部的计算逻辑。 2. 调用计算函数,计算函数使用TBE提供的API接口描述了算子内部的计算逻辑。
3. 调用Schedule调度模块,调度模块对算子中的数据按照调度模块的调度描述进行切分,同时指定好数据的搬运流程,确保在硬件上的执行达到最优。默认可以采用自动调度模块(`auto_schedule`)。 3. 调用Schedule调度模块,调度模块对算子中的数据按照调度模块的调度描述进行切分,同时指定好数据的搬运流程,确保在硬件上的执行达到最优。默认可以采用自动调度模块(`auto_schedule`)。
4. 调用`cce_build_code()`编译生成算子二进制。 4. 调用`cce_build_code`编译生成算子二进制。
> 入口函数的输入参数有特殊要求,需要依次为:算子每个输入的信息、算子每个输出的信息、算子属性(可选)和kernel_name(生成算子二进制的名称)。输入和输出的信息用字典封装传入,其中包含该算子在网络中被调用时传入的实际输入和输出的shape和dtype。 > 入口函数的输入参数有特殊要求,需要依次为:算子每个输入的信息、算子每个输出的信息、算子属性(可选)和`kernel_name`(生成算子二进制的名称)。输入和输出的信息用字典封装传入,其中包含该算子在网络中被调用时传入的实际输入和输出的shape和dtype。
更多关于使用TBE开发算子的内容请参考[TBE文档](https://www.huaweicloud.com/ascend/tbe),关于TBE算子的调试和性能优化请参考[MindStudio文档](https://www.huaweicloud.com/ascend/mindstudio) 更多关于使用TBE开发算子的内容请参考[TBE文档](https://www.huaweicloud.com/ascend/tbe),关于TBE算子的调试和性能优化请参考[MindStudio文档](https://www.huaweicloud.com/ascend/mindstudio)
...@@ -84,7 +84,7 @@ class CusSquare(PrimitiveWithInfer): ...@@ -84,7 +84,7 @@ class CusSquare(PrimitiveWithInfer):
算子信息是指导后端选择算子实现的关键信息,同时也指导后端为算子插入合适的类型和格式转换。它通过`TBERegOp`接口定义,通过`op_info_register`装饰器将算子信息与算子实现入口函数绑定。当算子实现py文件被导入时,`op_info_register`装饰器会将算子信息注册到后端的算子信息库中。更多关于算子信息的使用方法请参考`TBERegOp`的成员方法的注释说明。 算子信息是指导后端选择算子实现的关键信息,同时也指导后端为算子插入合适的类型和格式转换。它通过`TBERegOp`接口定义,通过`op_info_register`装饰器将算子信息与算子实现入口函数绑定。当算子实现py文件被导入时,`op_info_register`装饰器会将算子信息注册到后端的算子信息库中。更多关于算子信息的使用方法请参考`TBERegOp`的成员方法的注释说明。
> - 算子信息中定义输入输出信息的个数和顺序、算子实现入口函数的参数中的输入输出信息的个数和顺序、算子原语中输入输出名称列表的个数和顺序,三者要完全一致。 > - 算子信息中定义输入输出信息的个数和顺序、算子实现入口函数的参数中的输入输出信息的个数和顺序、算子原语中输入输出名称列表的个数和顺序,三者要完全一致。
> - 算子如果带属性,在算子信息中需要用`attr()`描述属性信息,属性的名称与算子原语定义中的属性名称要一致。 > - 算子如果带属性,在算子信息中需要用`attr`描述属性信息,属性的名称与算子原语定义中的属性名称要一致。
### 示例 ### 示例
......
...@@ -47,7 +47,7 @@ MindSpore提供写操作工具,可将用户定义的原始数据写为MindSpor ...@@ -47,7 +47,7 @@ MindSpore提供写操作工具,可将用户定义的原始数据写为MindSpor
字段属性type:int32、int64、float32、float64、string、bytes。 字段属性type:int32、int64、float32、float64、string、bytes。
字段属性shape:[...], ...可以是一维数组,用[-1]表示; 可以是二维数组,用[m, n]表示;可以是三维数组,用[x, y, z]表示。 字段属性shape:[...], ...可以是一维数组,用[-1]表示; 可以是二维数组,用[m, n]表示;可以是三维数组,用[x, y, z]表示。
> 1. 如果字段有属性Shape,暂时只支持type为int32、int64、float32、float64类型。 > 1. 如果字段有属性Shape,暂时只支持type为int32、int64、float32、float64类型。
> 2. 如果字段有属性Shape,则用户在准备数据并传入write_raw_data接口时必须是numpy.ndarray类型。 > 2. 如果字段有属性Shape,则用户在准备数据并传入`write_raw_data`接口时必须是`numpy.ndarray`类型。
举例: 举例:
- 图片分类 - 图片分类
......
...@@ -41,7 +41,7 @@ MindSpore支持多种处理数据操作,包括复制、分批、洗牌、映 ...@@ -41,7 +41,7 @@ MindSpore支持多种处理数据操作,包括复制、分批、洗牌、映
![avatar](../images/dataset_pipeline.png) ![avatar](../images/dataset_pipeline.png)
如下示例中,读取MNIST数据集时,对数据进行shuffle、batch、repeat操作。 如下示例中,读取MNIST数据集时,对数据进行`shuffle``batch``repeat`操作。
```python ```python
import mindspore.dataset as ds import mindspore.dataset as ds
...@@ -88,7 +88,7 @@ ds1 = ds1.repeat(10) ...@@ -88,7 +88,7 @@ ds1 = ds1.repeat(10)
> 在机器学习中,每训练完一个完整的数据集,我们称为训练完了一个epoch。 > 在机器学习中,每训练完一个完整的数据集,我们称为训练完了一个epoch。
加倍数据集,通常用在多个epoch(迭代)训练中,通过`repeat()`来加倍数据量。`repeat()`定义如下: 加倍数据集,通常用在多个epoch(迭代)训练中,通过`repeat`来加倍数据量。`repeat`定义如下:
```python ```python
def repeat(self, count=None): def repeat(self, count=None):
``` ```
...@@ -118,7 +118,7 @@ ds2: ...@@ -118,7 +118,7 @@ ds2:
[4 5 6] [4 5 6]
``` ```
### batch ### batch
将数据集进行分批。在实际训练中,可将数据分批处理,将几个数据作为1组,进行训练,减少训练轮次,达到加速训练过程的目的。MindSpore通过`batch()`函数来实现数据集分批,函数定义如下: 将数据集进行分批。在实际训练中,可将数据分批处理,将几个数据作为1组,进行训练,减少训练轮次,达到加速训练过程的目的。MindSpore通过`batch`函数来实现数据集分批,函数定义如下:
![avatar](../images/batch.png) ![avatar](../images/batch.png)
...@@ -166,11 +166,11 @@ batch size:3 drop remainder:True ...@@ -166,11 +166,11 @@ batch size:3 drop remainder:True
![avatar](../images/shuffle.png) ![avatar](../images/shuffle.png)
shuffle操作主要用来将数据混洗,设定的buffer_size越大,混洗程度越大,但时间、计算资源消耗会大。 shuffle操作主要用来将数据混洗,设定的buffer_size越大,混洗程度越大,但时间、计算资源消耗会大。
`shuffle()`定义如下: `shuffle`定义如下:
```python ```python
def shuffle(self, buffer_size): def shuffle(self, buffer_size):
``` ```
调用`shuffle()`对数据集`ds1`进行混洗,示例代码如下: 调用`shuffle`对数据集`ds1`进行混洗,示例代码如下:
```python ```python
print("Before shuffle:") print("Before shuffle:")
...@@ -199,19 +199,19 @@ After shuffle: ...@@ -199,19 +199,19 @@ After shuffle:
``` ```
### map ### map
map(映射)即对数据进行处理,譬如将彩色图片的数据集转化为灰色图片的数据集等,应用非常灵活。 map(映射)即对数据进行处理,譬如将彩色图片的数据集转化为灰色图片的数据集等,应用非常灵活。
MindSpore提供`map()`函数对数据集进行映射操作,用户可以将提供的函数或算子作用于指定的列数据。 MindSpore提供`map`函数对数据集进行映射操作,用户可以将提供的函数或算子作用于指定的列数据。
用户可以自定义函数,也可以直接使用`c_transforms``py_transforms`做数据增强。 用户可以自定义函数,也可以直接使用`c_transforms``py_transforms`做数据增强。
> 详细的数据增强操作,将在文后数据增强章节进行介绍。 > 详细的数据增强操作,将在文后数据增强章节进行介绍。
![avatar](../images/map.png) ![avatar](../images/map.png)
`map()`函数定义如下: `map`函数定义如下:
```python ```python
def map(self, input_columns=None, operations=None, output_columns=None, columns_order=None, def map(self, input_columns=None, operations=None, output_columns=None, columns_order=None,
num_parallel_workers=None): num_parallel_workers=None):
``` ```
在以下示例中,使用`map()`函数,将定义的匿名函数(lambda函数)作用于数据集`ds1`,使数据集中数据乘以2。 在以下示例中,使用`map`函数,将定义的匿名函数(lambda函数)作用于数据集`ds1`,使数据集中数据乘以2。
```python ```python
func = lambda x : x*2 # Define lambda function to multiply each element by 2. func = lambda x : x*2 # Define lambda function to multiply each element by 2.
ds2 = ds1.map(input_columns="data", operations=func) ds2 = ds1.map(input_columns="data", operations=func)
...@@ -227,7 +227,7 @@ for data in ds2.create_dict_iterator(): ...@@ -227,7 +227,7 @@ for data in ds2.create_dict_iterator():
[8 10 12] [8 10 12]
``` ```
### zip ### zip
MindSpore提供`zip()`函数,可将多个数据集合并成1个数据集。 MindSpore提供`zip`函数,可将多个数据集合并成1个数据集。
> 如果两个数据集的列名相同,则不会合并,请注意列的命名。 > 如果两个数据集的列名相同,则不会合并,请注意列的命名。
> 如果两个数据集的行数不同,合并后的行数将和较小行数保持一致。 > 如果两个数据集的行数不同,合并后的行数将和较小行数保持一致。
```python ```python
...@@ -242,7 +242,7 @@ def zip(self, datasets): ...@@ -242,7 +242,7 @@ def zip(self, datasets):
ds2 = ds.GeneratorDataset(generator_func2, ["data2"]) ds2 = ds.GeneratorDataset(generator_func2, ["data2"])
``` ```
2. 通过`zip()`将数据集`ds1``data1`列和数据集`ds2``data2`列合并成数据集`ds3` 2. 通过`zip`将数据集`ds1``data1`列和数据集`ds2``data2`列合并成数据集`ds3`
```python ```python
ds3 = ds.zip((ds1, ds2)) ds3 = ds.zip((ds1, ds2))
for data in ds3.create_dict_iterator(): for data in ds3.create_dict_iterator():
...@@ -266,7 +266,7 @@ MindSpore提供`c_transforms`模块以及`py_transforms`模块函数供用户进 ...@@ -266,7 +266,7 @@ MindSpore提供`c_transforms`模块以及`py_transforms`模块函数供用户进
| `py_transforms` | 基于Python的[PIL](https://pypi.org/project/Pillow/)实现 | 该模块提供了多种图像增强功能,并提供了PIL Image和numpy数组之间的传输方法。 | | `py_transforms` | 基于Python的[PIL](https://pypi.org/project/Pillow/)实现 | 该模块提供了多种图像增强功能,并提供了PIL Image和numpy数组之间的传输方法。 |
对于喜欢在图像学习任务中使用Python PIL的用户,`py_transforms`模块是处理图像增强的好工具。用户还可以使用Python PIL自定义自己的扩展。 对于喜欢在图像学习任务中使用Python PIL的用户,`py_transforms`模块是处理图像增强的好工具。用户还可以使用Python PIL自定义自己的扩展。
数据增强需要使用`map()`函数,详细`map()`函数的使用,可参考[map](#map)章节。 数据增强需要使用`map`函数,详细`map`函数的使用,可参考[map](#map)章节。
### 使用`c_transforms`模块进行数据增强 ### 使用`c_transforms`模块进行数据增强
...@@ -286,7 +286,7 @@ MindSpore提供`c_transforms`模块以及`py_transforms`模块函数供用户进 ...@@ -286,7 +286,7 @@ MindSpore提供`c_transforms`模块以及`py_transforms`模块函数供用户进
imgplot_resized = plt.imshow(data["image"]) imgplot_resized = plt.imshow(data["image"])
plt.show() plt.show()
``` ```
运行结果可以看到,原始图片与进行数据处理(`Resize()`)后的图片对比,可以看到图片由原来的1024\*683像素,变化为500\*500像素。 运行结果可以看到,原始图片与进行数据处理(`Resize`)后的图片对比,可以看到图片由原来的1024\*683像素,变化为500\*500像素。
![avatar](../images/image.png) ![avatar](../images/image.png)
图1:原始图片 图1:原始图片
...@@ -320,7 +320,7 @@ MindSpore提供`c_transforms`模块以及`py_transforms`模块函数供用户进 ...@@ -320,7 +320,7 @@ MindSpore提供`c_transforms`模块以及`py_transforms`模块函数供用户进
plt.show() plt.show()
``` ```
运行结果可以看到,原始图片与进行数据处理(`RandomCrop()`)后的图片对比,可以看到图片由原来的1024\*683像素,变化为500\*500像素。 运行结果可以看到,原始图片与进行数据处理(`RandomCrop`)后的图片对比,可以看到图片由原来的1024\*683像素,变化为500\*500像素。
![avatar](../images/image.png) ![avatar](../images/image.png)
图1:原始图片 图1:原始图片
......
...@@ -19,13 +19,13 @@ ...@@ -19,13 +19,13 @@
## Ascend 910 AI处理器上推理 ## Ascend 910 AI处理器上推理
MindSpore提供了`model.eval()`接口来进行模型验证,你只需传入验证数据集即可,验证数据集的处理方式与训练数据集相同。完整代码请参考<https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/eval.py> MindSpore提供了`model.eval`接口来进行模型验证,你只需传入验证数据集即可,验证数据集的处理方式与训练数据集相同。完整代码请参考<https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/eval.py>
```python ```python
res = model.eval(dataset) res = model.eval(dataset)
``` ```
此外,也可以通过`model.predict()`接口来进行推理操作,详细用法可参考API说明。 此外,也可以通过`model.predict`接口来进行推理操作,详细用法可参考API说明。
## Ascend 310 AI处理器上推理 ## Ascend 310 AI处理器上推理
......
...@@ -38,7 +38,7 @@ CheckPoint的protocol格式定义在`mindspore/ccsrc/utils/checkpoint.proto`中 ...@@ -38,7 +38,7 @@ CheckPoint的protocol格式定义在`mindspore/ccsrc/utils/checkpoint.proto`中
通过`CheckpointConfig`对象可以设置CheckPoint的保存策略。 通过`CheckpointConfig`对象可以设置CheckPoint的保存策略。
保存的参数分为网络参数和优化器参数。 保存的参数分为网络参数和优化器参数。
`ModelCheckpoint()`提供默认配置策略,方便用户快速上手。 `ModelCheckpoint`提供默认配置策略,方便用户快速上手。
具体用法如下: 具体用法如下:
```python ```python
from mindspore.train.callback import ModelCheckpoint from mindspore.train.callback import ModelCheckpoint
......