- `load_checkpoint`: loads the checkpoint model parameter file and returns a parameter dictionary.
- `load_param_into_net`: loads model parameter data to the network.
- `CKP_1-4_32.ckpt`: name of the saved checkpoint model parameter file.
> If a new checkpoint file is saved directly in the training environment based on the current training data, and the parameter values already exist on the network, skip this step; you do not need to import the checkpoint file.
> To ensure that the parameter update speed remains unchanged, you need to integrate the parameters saved in the optimizer, for example, `moments.model_parallel_weight`.
2. Define, instantiate, and execute the `AllGather` Cell, and obtain data on all devices.
```
from mindspore.nn.cell import Cell
...
```
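A minimal sketch of such a cell, assuming a 4-device group has already been initialized; the class name `AllGatherCell` and the sample tensor are illustrative, not from the original code:

```python
import numpy as np
from mindspore import Tensor
from mindspore.nn.cell import Cell
from mindspore.ops.operations.comm_ops import AllGather

class AllGatherCell(Cell):
    """Gathers the local parameter slice from every device in the group."""
    def __init__(self):
        super(AllGatherCell, self).__init__()
        self.allgather = AllGather()

    def construct(self, x):
        return self.allgather(x)

# Each device feeds its own slice; the output concatenates the slices
# of all devices along dimension 0.
allgather_net = AllGatherCell()
param_data = allgather_net(Tensor(np.array([[1, 2]], dtype=np.float32)))
```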
The value of `param_data` is the integration of data on each device in dimension 0. The data value is \[\[1, 2], \[3, 4], \[5, 6], \[7, 8]], and the shape is \[4, 2]. The raw data value of `param_data` is \[\[1, 2, 3, 4], \[5, 6, 7, 8]], and the shape is \[2, 4]. The data needs to be redivided and integrated.
3. Divide the data obtained from `AllGather`.
```
slice_list = np.split(param_data.asnumpy(), 4, axis=0)                  # 4: group_size, number of nodes in the cluster
slice_lis_moments = np.split(param_data_moments.asnumpy(), 4, axis=0)   # 4: group_size, number of nodes in the cluster
```
The result of dividing `param_data` is as follows:
slice_list[0] --- [1, 2] Slice data on device0
slice_list[1] --- [3, 4] Slice data on device1
### Saving the Data and Generating a New Checkpoint File
1. Convert `param_dict` to `param_list`.
```
param_list = []
...
```
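A sketch of the elided conversion loop, assuming `param_dict` maps parameter names to the integrated parameter data; the resulting list can then be passed to the checkpoint-saving API to generate the new checkpoint file:

```python
# Hypothetical sketch: build the {"name": ..., "data": ...} entries
# that the checkpoint-saving API expects.
param_list = []
for name, param in param_dict.items():
    each_param = {"name": name, "data": param}
    param_list.append(each_param)
```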
### Calling the High-level `Model` API To Train and Save the Model File
After data preprocessing, network definition, and loss function and optimizer definition are complete, model training can be performed. Model training involves two levels of iteration: multi-epoch iteration over the dataset and single-step iteration over batches of the dataset. A single-step iteration extracts a batch of data from the dataset, feeds it to the network to compute the loss function, and then uses the optimizer to compute and update the gradients of the training parameters.
To simplify the training process, MindSpore encapsulates the high-level `Model` API. You can enter the network, loss function, and optimizer to complete the `Model` initialization, and then call the `train` API for training. The `train` API parameters include the number of iterations (`epoch`) and dataset (`dataset`).
Model saving is a process of persisting training parameters. In the `Model` class, the model is saved using the callback function, as shown in the following code. You can set the parameters of the callback function by using `CheckpointConfig`: `save_checkpoint_steps` indicates the interval, in single-step iterations, at which the model is saved, and `keep_checkpoint_max` indicates the maximum number of saved models.
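A sketch of this flow, assuming `net`, `loss`, `opt`, and `ds_train` are already defined; the prefix and numeric values are illustrative:

```python
from mindspore.train.model import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig

# Save a checkpoint every 1875 single-step iterations, keep at most 10 files.
config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)
ckpoint_cb = ModelCheckpoint(prefix="resnet", config=config_ck)

model = Model(net, loss_fn=loss, optimizer=opt)
model.train(10, ds_train, callbacks=[ckpoint_cb])
```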
The trained model file (such as `resnet.ckpt`) can be used to predict the class of a new image. Run the `load_checkpoint` command to load the model file. Then call the `eval` API of `Model` to predict the new image class.
This section describes how to use the customized capabilities provided by MindSpore, such as callbacks, metrics, the `Print` operator, and log printing, to help you quickly debug the training network.
## Introduction to Callback
MindSpore provides the callback capabilities to allow users to insert customized operations in a specific phase of training or inference, including:
- Callback functions such as `ModelCheckpoint`, `LossMonitor`, and `SummaryStep` provided by the MindSpore framework
- Custom callback functions
Usage: pass the callback object to the `model.train` method. The callback object can be a list, for example:
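A usage sketch; the variable names are illustrative:

```python
from mindspore.train.callback import ModelCheckpoint, LossMonitor

ckpt_cb = ModelCheckpoint()
loss_cb = LossMonitor()
# Callbacks in the list are executed in the defined order.
model.train(epoch, ds_train, callbacks=[ckpt_cb, loss_cb])
```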
`ModelCheckpoint` can save model parameters for retraining or inference.
`LossMonitor` can output loss information in logs for users to view. In addition, `LossMonitor` monitors the loss value change during training. When the loss value is `Nan` or `Inf`, the training terminates.
`SummaryStep` can save the training information to a file for later use.
During the training process, the callback list will execute the callback function in the defined order. Therefore, in the definition process, the dependency between callbacks needs to be considered.
### Custom Callback
You can customize callbacks based on the `Callback` base class as required.
The `Callback` base class is defined as follows:
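The definition is omitted here; its shape is approximately as follows (a sketch reconstructed from early MindSpore releases, so the exact method set may differ across versions):

```python
class Callback:
    """Abstract base class for user-defined callbacks."""
    def begin(self, run_context):
        """Called once before network training."""
    def epoch_begin(self, run_context):
        """Called before each epoch begins."""
    def epoch_end(self, run_context):
        """Called after each epoch finishes."""
    def step_begin(self, run_context):
        """Called before each step begins."""
    def step_end(self, run_context):
        """Called after each step finishes."""
    def end(self, run_context):
        """Called once after network training."""
```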
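The custom callback example itself is elided above; the following sketch is consistent with the description below (terminate training after a given number of minutes; the `cb_params` attribute names follow early MindSpore releases):

```python
import time
from mindspore.train.callback import Callback

class StopAtTime(Callback):
    def __init__(self, run_time):
        super(StopAtTime, self).__init__()
        self.run_time = run_time * 60  # minutes to seconds

    def begin(self, run_context):
        # Record the start time in the cb_params dictionary.
        cb_params = run_context.original_args()
        cb_params.init_time = time.time()

    def step_end(self, run_context):
        cb_params = run_context.original_args()
        epoch_num = cb_params.cur_epoch_num
        step_num = cb_params.cur_step_num
        loss = cb_params.net_outputs
        if (time.time() - cb_params.init_time) > self.run_time:
            print("epoch:", epoch_num, "step:", step_num, "loss:", loss)
            run_context.request_stop()  # send the termination signal
```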
The output is as follows:
```
epoch: 20 step: 32 loss: 2.298344373703003
```
This callback function is used to terminate the training within a specified period. You can use the `run_context.original_args` method to obtain the `cb_params` dictionary, which contains the main attribute information described above.
In addition, you can modify and add values in the dictionary. In the preceding example, an `init_time` object is defined in `begin` and transferred to the `cb_params` dictionary.
A decision is made at each `step_end`. When the training time is greater than the configured time threshold, a training termination signal will be sent to the `run_context` to terminate the training in advance and the current values of epoch, step, and loss will be printed.
## MindSpore Metrics
After the metrics are configured in `Model`, the evaluation is performed as follows:
```python
ds_eval = create_dataset()
output = model.eval(ds_eval)
```
The `model.eval` method returns a dictionary that contains the metrics passed in and their computed results.
You can also define your own metrics class by inheriting the `Metric` base class and rewriting the `clear`, `update`, and `eval` methods.
The `accuracy` operator is used as an example to describe the internal implementation principle.
`accuracy` inherits the `EvaluationBase` base class and rewrites the preceding three methods.
The `clear` method initializes the related calculation parameters in the class.
The `update` method accepts predicted values and labels and updates the internal variables of `accuracy`.
The `eval` method calculates the related indicators and returns the calculation result.
By invoking the `eval` method of `accuracy`, you will obtain the calculation result.
You can understand how `accuracy` runs by using the following code:
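The code itself is elided above; the following sketch reproduces the printed result (the import path follows early MindSpore releases and the sample values are illustrative):

```python
import numpy as np
from mindspore import Tensor
from mindspore.nn.metrics import Accuracy

x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
y = Tensor(np.array([1, 0, 1]))

metric = Accuracy()
metric.clear()            # initialize internal variables
metric.update(x, y)       # accumulate predictions and labels
accuracy = metric.eval()  # 2 of 3 predictions match the labels
print('Accuracy is %.4f' % accuracy)
```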
The output is as follows:
```
Accuracy is 0.6667
```
## MindSpore Print Operator
The MindSpore-developed `Print` operator is used to print the tensors or character strings input by users. Multiple strings, multiple tensors, and a combination of tensors and strings are supported, separated by commas (,).
The MindSpore `Print` operator is used in the same way as other operators: declare it in `__init__` and invoke it in `construct`. The following is an example.
```python
import numpy as np
from mindspore import Tensor
...
```
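The elided body above follows this pattern; a runnable sketch consistent with the original output (`val:[[1 1] ...`), where the class name `PrintDemo` is illustrative:

```python
import numpy as np
import mindspore.nn as nn
import mindspore.context as context
from mindspore import Tensor
from mindspore.ops import operations as P

context.set_context(mode=context.GRAPH_MODE)  # Print runs in graph mode

class PrintDemo(nn.Cell):
    def __init__(self):
        super(PrintDemo, self).__init__()
        self.print = P.Print()

    def construct(self, x, y):
        self.print('print Tensor x and Tensor y:', x, y)
        return x

x = Tensor(np.ones([2, 1]).astype(np.int32))
y = Tensor(np.ones([2, 2]).astype(np.int32))
net = PrintDemo()
output = net(x, y)
```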
## Log-related Environment Variables and Configurations
MindSpore uses glog to output logs. The following environment variables are commonly used:
1. `GLOG_v` specifies the log level. The default value is 2, indicating the WARNING level. The values are as follows: 0: DEBUG; 1: INFO; 2: WARNING; 3: ERROR.
2. When `GLOG_logtostderr` is set to 1, logs are output to the screen. If the value is set to 0, logs are output to a file. The default value is 1.
3. `GLOG_log_dir=YourPath` specifies the log output path. If `GLOG_logtostderr` is set to 0, the value of this variable must be specified. If `GLOG_log_dir` is specified and `GLOG_logtostderr` is set to 1, logs are output to the screen but not to a file. C++ and Python logs are output to different files. The file name of a C++ log complies with the GLOG log file naming rule, here `mindspore.MachineName.UserName.log.LogLevel.Timestamp`; the file name of a Python log is `mindspore.log`.
4. `MS_SUBMODULE_LOG_v="{SubModule1:LogLevel1,SubModule2:LogLevel2,...}"` specifies the log levels of C++ sub modules of MindSpore. A specified sub module log level overwrites the global log level. The meaning of a sub module log level is the same as that of `GLOG_v`; the sub modules of MindSpore, grouped by source directory, are listed in the table below. For example, with `GLOG_v=1 MS_SUBMODULE_LOG_v="{PARSER:2,ANALYZER:2}"`, the log levels of `PARSER` and `ANALYZER` are WARNING while other modules' log levels are INFO.
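For example, the following shell configuration writes INFO-level logs to a file instead of the screen (the path is illustrative):

```bash
export GLOG_v=1                         # INFO level
export GLOG_logtostderr=0               # write to a file, not the screen
export GLOG_log_dir=/var/log/mindspore  # required when GLOG_logtostderr=0
python train.py
```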
> glog does not support log file rotation. If you need to control the disk space used by log files, use the log file management tools provided by the operating system, such as logrotate on Linux.
The sub modules of MindSpore, grouped by source directory:
In the preceding execution, an intermediate result of network execution can be obtained at any required place in the `construct` function, and the network can be debugged by using the Python Debugger (pdb).
- `mode=context.GRAPH_MODE`: sets the running mode to graph mode for distributed training. (The PyNative mode does not support parallel running.)
- `device_id`: physical sequence number of a device, that is, the actual sequence number of the device on the corresponding host.
- `init`: enables HCCL communication and completes the distributed training initialization (a combined sketch follows this list).
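A typical initialization sketch putting these together; reading `DEVICE_ID` from an environment variable set by the launch script is a common convention, not a requirement:

```python
import os
from mindspore import context
from mindspore.communication.management import init

device_id = int(os.getenv('DEVICE_ID', '0'))
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=device_id)
init()  # enable HCCL communication and initialize distributed training
```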
## Loading the Dataset in Data Parallel Mode
## Training the Network
`context.set_auto_parallel_context` is an API for users to set parallel training parameters and must be called before the initialization of `Model`. If no parameters are specified, MindSpore will automatically set them to empirical values based on the parallel mode. For example, in data parallel mode, `parameter_broadcast` is enabled by default. The related parameters are as follows, and a usage sketch follows the list:
- `parallel_mode`: parallel distributed mode. The default value is `ParallelMode.STAND_ALONE`. The options are `ParallelMode.DATA_PARALLEL` and `ParallelMode.AUTO_PARALLEL`.
- `parameter_broadcast`: whether to broadcast initialized parameters. The default value is `True` in `DATA_PARALLEL` and `HYBRID_PARALLEL` mode.
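A sketch of enabling data parallelism before `Model` initialization (the import path of `ParallelMode` may differ across versions):

```python
from mindspore import context
from mindspore.train.model import ParallelMode

# Must be called before Model initialization; parameter_broadcast is
# enabled by default in this mode.
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL)
```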
The running time is about 5 minutes, which is mainly occupied by operator compilation. The actual training time is within 20 seconds. You can use `ps -ef | grep pytest` to monitor task processes.
Log files are saved in the `device` directory. The `env.log` file records environment variable information. The `train.log` file records the loss function information. The following is an example:
## Automatic Mixed Precision
To use the automatic mixed precision, you need to invoke the corresponding API, which takes the network to be trained and the optimizer as the input. This API converts the operators of the entire network into FP16 operators (except the `BatchNorm` and Loss operators).
The procedure is as follows:
1. Introduce the MindSpore mixed precision API.
2. Define the network. This step is the same as the common network definition. (You do not need to manually configure the precision of any specific operator.)
3. Use the `amp.build_train_network` API to encapsulate the network model and optimizer. In this step, MindSpore automatically converts the operators to the required format.
A code example is as follows:
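The original example is elided here; a sketch of the pattern (the network, loss, and hyperparameters are illustrative, and the `amp` import path may differ across versions):

```python
from mindspore import amp, nn

net = Net()  # hypothetical network definition
loss = nn.SoftmaxCrossEntropyWithLogits()
optimizer = nn.Momentum(params=net.trainable_params(),
                        learning_rate=0.1, momentum=0.9)

# Operators are converted to FP16 automatically, except BatchNorm and the loss.
train_network = amp.build_train_network(net, optimizer, loss, level="O2")
```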
The following is the procedure for implementing manual mixed precision:
1. Define the network. This step is similar to step 2 in the automatic mixed precision.
2. Configure the mixed precision. Use `net.to_float(mstype.float16)` to set all operators of the cell and its sub-cells to FP16. Then, configure the dense layer to FP32.
3. Use `TrainOneStepCell` to encapsulate the network model and optimizer, as shown in the sketch after this list.
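A sketch of the three steps (the network and parameter choices are illustrative):

```python
import mindspore.common.dtype as mstype
import mindspore.nn as nn

net = Net()  # hypothetical network containing a dense layer `net.dense`
# Step 2: run the whole cell in FP16, but keep the dense layer in FP32.
net.to_float(mstype.float16)
net.dense.to_float(mstype.float32)

loss = nn.SoftmaxCrossEntropyWithLogits()
optimizer = nn.Momentum(params=net.trainable_params(),
                        learning_rate=0.1, momentum=0.9)

# Step 3: encapsulate the network model and optimizer.
net_with_loss = nn.WithLossCell(net, loss)
train_network = nn.TrainOneStepCell(net_with_loss, optimizer)
```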
This section describes how to use MindArmour in adversarial attack and defense by taking the Fast Gradient Sign Method (FGSM) attack algorithm and Natural Adversarial Defense (NAD) algorithm as examples.
> The current sample is for CPU, GPU and Ascend 910 AI processor. You can find the complete executable sample code at: <https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/model_safety>
Similar to the `Estimator` API of TensorFlow, the defined network prototype, loss function, and optimizer are transferred to the `Model` API of MindSpore and automatically combined into a network that can be used for training.
To use loss scale in training, define a `loss_scale_manager` and transfer it to the `Model` API.
Similar to `estimator.train` of TensorFlow, you can call the `model.train` API to perform training. Functions such as checkpoint saving and intermediate result printing can be attached to the `model.train` API as callbacks.
3. After the model is obtained, use the validation dataset to check the accuracy of the model.
> The current sample is for the Ascend 910 AI processor. You can find the complete executable sample code at: <https://gitee.com/mindspore/docs/tree/master/tutorials/tutorial_code/lstm>
> - `main.py`: code file, including code for data preprocessing, network definition, and model training.
> - `config.py`: some configurations of the network, including the batch size and the number of training epochs.
- decorator
- scipy
> `numpy`, `decorator` and `scipy` can be installed through `pip`. The reference command is as follows.
```bash
pip3 install numpy==1.16 decorator scipy
```
4. Obtain the compilation result.
Go to the `predict/output` directory of the source code to view the generated package. The package name is `MSPredict-{version}-{host platform}_{device platform}.tar.gz`, for example, `MSPredict-0.1.0-linux_aarch64.tar.gz`. The package contains the following directories:
- `include`: MindSpore Predict header files.
- `lib`: MindSpore Predict dynamic library.
Take the LeNet network as an example. The generated on-device model file is `lenet.ms`. The complete sample code `lenet.py` is as follows:
```python
import os
import numpy as np
...
```
### Implementing On-Device Inference
Use the `.ms` model file and image data as input to create a session and implement inference on the device.


Figure 1 On-device inference sequence diagram
1. Load the `.ms` model file to the memory buffer. The `ReadFile` function needs to be implemented by users, according to the [C++ tutorial](http://www.cplusplus.com/doc/tutorial/files/).
Select the LeNet network and set the inference input to `lenet.bin`. The complete sample code `lenet.cpp` is as follows:
> MindSpore Predict uses `FlatBuffers` to define models. The `FlatBuffers` header file is required for parsing models. Therefore, you need to configure the `FlatBuffers` header file.
>
> Method: Copy the `flatbuffers` folder in `<MindSpore root directory>/third_party/flatbuffers/include` to the directory at the same level as `session.h`.
### Collect Performance Profile Data
To enable performance profiling of neural networks, MindInsight `Profiler` APIs should be added to the script. First, the MindInsight `Profiler` object needs to be initialized after `set_context` is called and before the network initialization. Then, at the end of the training, `Profiler.analyse` should be called to finish profiling and generate the performance analysis results.
The sample code is as follows:
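The sample itself is elided above; a sketch of the described flow (the `Profiler` import path and options follow early MindInsight releases and may differ across versions):

```python
from mindinsight.profiler import Profiler
from mindspore import context

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
profiler = Profiler(output_path='./data')  # init after set_context, before the network

net = Net()         # hypothetical network definition
train_net(net)      # hypothetical training entry

profiler.analyse()  # finish profiling and generate the analysis results
```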
MindInsight provides users with web services. Run the following command to view the running service process:
```bash
ps -ef | grep mindinsight
```
Run the following command to access the working directory `WORKSPACE` corresponding to the service process based on the service process ID:
```bash
lsof -p <PID> | grep access
```
The output with the working directory `WORKSPACE` is as follows:
Before compiling code, you need to learn basic information about the hardware and backend required for MindSpore running.
You can use `context.set_context` to configure the information required for running, such as the running mode, backend information, and hardware information.
Import the `context` module and configure the required information.
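The configuration code is elided above; a typical sketch (the argument handling is illustrative):

```python
import argparse
from mindspore import context

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
    parser.add_argument('--device_target', type=str, default="CPU",
                        choices=['Ascend', 'GPU', 'CPU'])
    args = parser.parse_args()
    # Run in graph mode on the chosen hardware backend.
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
    ...
```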
This example runs in graph mode. You can configure hardware information based on site requirements. For example, if the code runs on the Ascend AI processor, set `--device_target` to `Ascend`. This rule also applies to the code running on the CPU and GPU. For details about parameters, see the API description for `context.set_context`.
## Processing Data
### Defining the Dataset and Data Operations
Define the `create_dataset` function to create a dataset. In this function, define the data augmentation and processing operations to be performed.
1. Define the dataset.
2. Define parameters required for data augmentation and processing.
3. Generate corresponding data augmentation operations according to the parameters.
4. Use the `map` mapping function to apply data operations to the dataset (a condensed sketch follows this list).
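A condensed sketch of such a function for MNIST, following the four steps above (module paths follow early MindSpore releases and have since moved; parameter values are illustrative):

```python
import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.transforms.vision.c_transforms as CV
from mindspore.common import dtype as mstype

def create_dataset(data_path, batch_size=32, repeat_size=1):
    # 1. Define the dataset.
    mnist_ds = ds.MnistDataset(data_path)
    # 2. Define parameters required for augmentation and processing.
    resize_op = CV.Resize((32, 32))
    rescale_op = CV.Rescale(1.0 / 255.0, 0.0)
    hwc2chw_op = CV.HWC2CHW()
    type_cast_op = C.TypeCast(mstype.int32)
    # 3./4. Apply the generated operations to the dataset with map.
    mnist_ds = mnist_ds.map(input_columns="label", operations=type_cast_op)
    mnist_ds = mnist_ds.map(input_columns="image",
                            operations=[resize_op, rescale_op, hwc2chw_op])
    mnist_ds = mnist_ds.shuffle(buffer_size=10000)
    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
    return mnist_ds.repeat(repeat_size)
```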
To use MindSpore for neural network definition, inherit `mindspore.nn.cell.Cell`. `Cell` is the base class of all neural networks (such as `Conv2d`).
Define each layer of a neural network in the `__init__` method in advance, and then define the `construct` method to complete the forward construction of the neural network. According to the structure of the LeNet network, define the network layers as follows:
```python
import mindspore.ops.operations as P
...
```

...

```
checkpoint_lenet-1_1875.ckpt
```
In the preceding information:
`checkpoint_lenet-1_1875.ckpt`: saved model parameter file. Files saved later follow the same naming convention. The file name format is `checkpoint_{network name}-{epoch No.}_{step No.}.ckpt`.
## Validating the Model
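The validation code is elided above; a sketch consistent with the explanation below (the dataset path and the pre-defined `network`, `model`, and `create_dataset` are assumptions):

```python
from mindspore.train.serialization import load_checkpoint, load_param_into_net

# Load the saved model parameters and push them into the network.
param_dict = load_checkpoint("checkpoint_lenet-3_1404.ckpt")
load_param_into_net(network, param_dict)

# Evaluate on the test dataset.
ds_eval = create_dataset("./MNIST_Data/test")
acc = model.eval(ds_eval)
print("Accuracy:", acc)
```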
In the preceding information:
`load_checkpoint`: This API is used to load the CheckPoint model parameter file and return a parameter dictionary.
`checkpoint_lenet-3_1404.ckpt`: name of the saved CheckPoint model file.
`load_param_into_net`: This API is used to load parameters to the network.
The primitive of an operator is a subclass inherited from `PrimitiveWithInfer`. The type name of the subclass is the operator name.
The definition of the custom operator primitive is the same as that of the built-in operator primitive.
- The attribute is defined by the input parameter of the constructor function `__init__`. The operator in this test case has no attribute. Therefore, `__init__` has only one input parameter. For details about test cases in which operators have attributes, see [custom add3](https://gitee.com/mindspore/mindspore/tree/master/tests/st/ops/custom_ops_tbe/cus_add3.py) in the MindSpore source code.
- The input and output names are defined by the `init_prim_io_names` function.
- The shape inference method of the output tensor is defined in the `infer_shape` function, and the dtype inference method of the output tensor is defined in the `infer_dtype` function.
The only difference between a custom operator and a built-in operator is that the operator implementation function (`from square_impl import CusSquareImpl`) needs to be imported to the `__init__` function to register the operator implementation with the backend for the custom operator. In this test case, the operator implementation and information are defined in `square_impl.py`, and the definition will be described in the following parts.
The following code takes the Square operator primitive `cus_square.py` as an example:
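The file itself is elided above; a sketch reconstructed from the description (registration details may vary across versions):

```python
from mindspore.ops import prim_attr_register, PrimitiveWithInfer
# Importing the operator implementation registers it with the backend.
from square_impl import CusSquareImpl

class CusSquare(PrimitiveWithInfer):
    """CusSquare definition: computes x * x element-wise."""
    @prim_attr_register
    def __init__(self):
        # No attributes, so __init__ only declares the input/output names.
        self.init_prim_io_names(inputs=['data'], outputs=['output'])

    def infer_shape(self, data_shape):
        return data_shape  # output shape equals input shape

    def infer_dtype(self, data_dtype):
        return data_dtype  # output dtype equals input dtype
```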
1. Prepare placeholders to be input. A placeholder will return a tensor object that represents a group of input data.
2. Call the computable function. The computable function uses the API provided by the TBE to describe the computation logic of the operator.
3. Call the scheduling module. The model tiles the operator data based on the scheduling description and specifies the data transfer process to ensure optimal hardware execution. By default, the automatic scheduling module (`auto_schedule`) can be used.
4. Call `cce_build_code` to compile and generate an operator binary file.
> The input parameters of the entry function require the input information of each operator, output information of each operator, operator attributes (optional), and `kernel_name` (name of the generated operator binary file). The input and output information is encapsulated in dictionaries, including the input and output shape and dtype when the operator is called on the network.
For details about TBE operator development, visit the [TBE website](https://www.huaweicloud.com/ascend/tbe). For details about how to debug and optimize the TBE operator, visit the [Mind Studio website](https://www.huaweicloud.com/intl/en-us/ascend/mindstudio).
> The numbers and sequences of the input and output information defined in the operator information must be the same as those in the parameters of the entry function of the operator implementation and those listed in the operator primitive.
> If an operator has attributes, use `attr` to describe the attribute information in the operator information. The attribute names must be the same as those in the operator primitive definition.
The field type can be int32, int64, float32, float64, string, or bytes.
The field shape can be a one-dimensional array represented by [-1], a two-dimensional array represented by [m, n], or a three-dimensional array represented by [x, y, z].
> 1. The type of a field with the shape attribute can only be int32, int64, float32, or float64.
> 2. If the field has the shape attribute, prepare the data of `numpy.ndarray` type and transfer the data to the `write_raw_data` API (see the sketch after this note).
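A sketch of writing one raw-data sample to MindRecord (the schema fields and file names are illustrative):

```python
import numpy as np
from mindspore.mindrecord import FileWriter

schema = {"file_name": {"type": "string"},
          "label": {"type": "int32"},
          "data": {"type": "float64", "shape": [-1]}}

writer = FileWriter(file_name="test.mindrecord", shard_num=1)
writer.add_schema(schema, "test_schema")

# A field with the shape attribute must be a numpy.ndarray.
sample = {"file_name": "001.jpg", "label": 43,
          "data": np.array([1.0, 2.0, 3.0], dtype=np.float64)}
writer.write_raw_data([sample])
writer.commit()
```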


In the following example, the `shuffle`, `batch`, and `repeat` operations are performed when the MNIST dataset is read.
```python
import mindspore.dataset as ds
...
```
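The elided body presumably chains the three operations, as in this sketch (the dataset path and parameter values are illustrative):

```python
import mindspore.dataset as ds

data_dir = "./MNIST"
mnist_ds = ds.MnistDataset(data_dir)
mnist_ds = mnist_ds.shuffle(buffer_size=10000)      # shuffle first
mnist_ds = mnist_ds.batch(32, drop_remainder=True)  # then batch
mnist_ds = mnist_ds.repeat(5)                       # finally repeat for multiple epochs
```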
The following describes how to construct a simple dataset `ds1` and perform data operations on it.
1. Import the `mindspore.dataset` module.
```python
import mindspore.dataset as ds
```
2. Define the `generator_func` function for generating the dataset.
```python
def generator_func():
    for i in range(5):
        ...
```
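The elided body presumably yields one row per iteration, and the function is then wrapped into a dataset, as in this sketch (the values are illustrative):

```python
import numpy as np
import mindspore.dataset as ds

def generator_func():
    for i in range(5):
        yield (np.array([i, i + 1, i + 2]),)

# Build ds1 from the generator with a single "data" column.
ds1 = ds.GeneratorDataset(generator_func, ["data"])
```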
> In machine learning, an epoch refers to one cycle through the full training dataset.
During multiple epochs, `repeat` can be used to increase the data size. The definition of `repeat` is as follows:
```python
def repeat(self, count=None):
```
```
ds2:
...
[4 5 6]
```
### batch
Combine data records in datasets into batches. In practice, data can be processed in batches. Training data in batches can reduce training steps and accelerate the training process. MindSpore uses the `batch` function to implement the batch operation. The function is defined as follows:




The shuffle operation is used to shuffle data. A larger value of `buffer_size` indicates a higher shuffling degree, consuming more time and computing resources.
The definition of `shuffle` is as follows:
```python
def shuffle(self, buffer_size):
```
Call `shuffle` to shuffle the dataset `ds1`. The sample code is as follows:
```python
print("Before shuffle:")
...
```
### map
The map operation is used to process data. For example, convert the dataset of color images into the dataset of grayscale images. You can flexibly perform the operation as required.
MindSpore provides the `map` function to map datasets. You can apply the provided functions or operators to the specified column data.
You can customize the function or use `c_transforms` or `py_transforms` for data augmentation.
> For details about data augmentation operations, see the Data Augmentation section.
In the following example, the `map` function is used to apply the defined anonymous function (lambda function) to the dataset `ds1` so that the data values in the dataset are multiplied by 2.
```python
func = lambda x: x * 2  # Define a lambda function to multiply each element by 2.
ds2 = ds1.map(input_columns="data", operations=func)
...
```
The code output is as follows. Data values in each row of the dataset `ds2` are multiplied by 2.
```
...
[8 10 12]
```
### zip
MindSpore provides the `zip` function to combine multiple datasets into one dataset.
> If the column names in the two datasets are the same, the two datasets are not combined. Therefore, pay attention to column names.
> If the number of rows in the two datasets is different, the number of rows after combination is the same as the smaller number.
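A usage sketch, assuming `ds1` from earlier and a second dataset `ds2` whose column name differs:

```python
# Combine the two datasets column-wise into ds3.
ds3 = ds.zip((ds1, ds2))
for data in ds3.create_dict_iterator():
    print(data)
```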
| Module | Implementation | Description |
| --- | --- | --- |
| `py_transforms` | Python-based [PIL](https://pypi.org/project/Pillow/) implementation | This module provides multiple image augmentation functions and the method for converting between PIL images and NumPy arrays. |
For users who would like to use Python PIL in image learning tasks, the `py_transforms` module is a good tool for image augmentation. You can use Python PIL to customize extensions.
Data augmentation requires the `map` function. For details about how to use the `map` function, see [map](#map).
### Using the `c_transforms` Module
```python
...
imgplot_resized = plt.imshow(data["image"])
plt.show()
```
The running result shows that the original image is changed from 1024 x 683 pixels to 500 x 500 pixels after data processing by using `Resize`.


Figure 1: Original image
```python
...
plt.show()
```
The running result shows that the original image is changed from 1024 x 683 pixels to 500 x 500 pixels after data processing by using `RandomCrop`.
1. Inference on the Ascend 910 AI processor
MindSpore provides the `model.eval` API for model validation. You only need to import the validation dataset. The processing method of the validation dataset is the same as that of the training dataset. For details about the complete code, see <https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/eval.py>.
```python
res = model.eval(dataset)
```
In addition, the `model.predict` interface can be used for inference. For detailed usage, see the API description.