Commit d3e26574, authored by chengxiao

Add the English document of network_migration

Parent f5b5a0fc
......@@ -9,6 +9,7 @@
- [Support](#support)
- [Model Support](#model-support)
- [Backend Support](#backend-support)
- [System Support](#system-support)
- [Programming Language](#programming-language)
- [Others](#others)
- [Features](#features)
......@@ -69,6 +70,12 @@ Q: What types of model is currently supported by MindSpore for training ?
A: MindSpore has basic support for common training scenarios, please refer to [Release note](https://gitee.com/mindspore/mindspore/blob/master/RELEASE.md) for detailed information.
<br/>
Q: What are the available recommendation or text generation networks or models provided by MindSpore?
A: Currently, recommendation models such as Wide & Deep, DeepFM, and NCF are under development. In the natural language processing (NLP) field, Bert\_NEZHA is available and models such as MASS are under development. You can rebuild the network into a text generation network based on the scenario requirements. Please stay tuned for updates on the [MindSpore Model Zoo](https://gitee.com/mindspore/mindspore/tree/master/mindspore/model_zoo).
### Backend Support
Q: Are there any hardware requirements, such as a GPU or NPU, for installing or running MindSpore?
......@@ -81,6 +88,18 @@ Q: Does MindSpore have any plan on supporting other types of heterogeneous compu
A: MindSpore provides a pluggable device management interface so that developers can easily integrate other types of heterogeneous computing hardware, such as FPGAs, into MindSpore. We welcome contributions of additional backend support from the community.
<br/>
Q: What hardware does MindSpore require?
A: Currently, you can try out MindSpore through Docker images on laptops or in environments with GPUs. Some models in MindSpore Model Zoo support GPU-based training and inference, and other models are being improved. For distributed parallel training, MindSpore supports multi-GPU training. You can obtain the latest information from [RoadMap](https://www.mindspore.cn/docs/en/master/roadmap.html) and project [Release Notes](https://gitee.com/mindspore/mindspore/blob/master/RELEASE.md).
### System Support
Q: Does MindSpore support Windows 10?
A: The MindSpore CPU version can be installed on Windows 10. For details about the installation procedure, see tutorials on the [MindSpore official website](https://www.mindspore.cn/tutorial/en/master/advanced_use/mindspore_cpu_win_install.html).
### Programming Language
Q: Recently announced programming languages such as Taichi provide Python extensions that can be used directly, for example `import taichi as ti`. Does MindSpore have similar support?
......@@ -99,14 +118,50 @@ Q: How does MindSpore implement semantic collaboration and processing? Is the po
A: The MindSpore framework does not support FCA. For semantic models, you can call third-party tools to perform FCA in the data preprocessing phase. MindSpore supports Python, so a statement such as `import FCA` is enough to bring in such a tool.
<br/>
Q: Where can I view the sample code or tutorial of MindSpore training and inference?
A: Please visit the [MindSpore official website](https://www.mindspore.cn/tutorial/en/master/index.html).
## Features
Q: Does MindSpore have any plan or consideration on the edge and device when the training and inference functions on the cloud are relatively mature?
A: MindSpore is a unified cloud-edge-device training and inference framework. Edge has been considered in its design, so MindSpore can perform inference at the edge. The open-source version will support Ascend 310-based inference. Currently, inference supports optimization operations, including quantization, operator fusion, and memory overcommitment.
<br/>
Q: How does MindSpore support automatic parallelism?
A: Automatic parallelism on CPUs and GPUs is being improved. You are advised to use the automatic parallelism feature on the Ascend 910 AI processor. Follow our open source community and apply for a MindSpore developer experience environment for trial use.
<br/>
Q: What is the relationship between MindSpore and ModelArts? Can MindSpore be used on ModelArts?
A: ModelArts is an online training and inference platform on HUAWEI CLOUD. MindSpore is a Huawei deep learning framework. You can view the tutorials on the [MindSpore official website](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/use_on_the_cloud.html) to learn how to train MindSpore models on ModelArts.
## Capabilities
Q: Does MindSpore have a module that can implement object detection algorithms as TensorFlow does?
A: TensorFlow's object detection pipeline API belongs to TensorFlow's Model module. Similar pipeline APIs will be provided once MindSpore's detection models are complete.
<br/>
Q: How do I migrate scripts or models of other frameworks to MindSpore?
A: For details about script or model migration, please visit the [MindSpore official website](https://www.mindspore.cn/tutorial/en/master/advanced_use/network_migration.html).
<br/>
Q: Does MindSpore provide open-source e-commerce datasets?
A: No. Please stay tuned for updates on the [MindSpore official website](https://www.mindspore.cn/en).
<br/>
Q: How simple can the MindSpore model training code be?
A: Apart from the network definition, which you write yourself, MindSpore provides the `Model` APIs. In most scenarios, model training can be completed in only a few lines of code.
# Network Migration
<!-- TOC -->
- [Network Migration](#network-migration)
    - [Overview](#overview)
    - [Preparations](#preparations)
        - [Operator Assessment](#operator-assessment)
        - [Software and Hardware Environments](#software-and-hardware-environments)
    - [E2E Network Migration](#e2e-network-migration)
        - [Training Phase](#training-phase)
            - [Script Migration](#script-migration)
            - [Accuracy Debugging](#accuracy-debugging)
            - [On-Cloud Integration](#on-cloud-integration)
        - [Inference Phase](#inference-phase)
    - [Examples](#examples)
<!-- /TOC -->
<a href="https://gitee.com/mindspore/docs/tree/master/tutorials/source_en/advanced_use/network_migration.md" target="_blank"><img src="../_static/logo_source.png"></a>
## Overview
You've probably written scripts for frameworks such as TensorFlow and PyTorch. This tutorial describes how to migrate existing TensorFlow and PyTorch networks to MindSpore, covering the key steps and practical recommendations that help you migrate your network quickly.
## Preparations
Before you start working on your scripts, assess the operators your network uses and prepare the software and hardware environments to make sure that MindSpore can support the network you want to migrate.
### Operator Assessment
Analyze the operators contained in the network to be migrated and check, against the [Operator List](https://www.mindspore.cn/docs/en/master/operator_list.html), which of them MindSpore supports.
Take ResNet-50 as an example. The two major operators [Conv](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.nn.html#mindspore.nn.Conv2d) and [BatchNorm](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.nn.html#mindspore.nn.BatchNorm2d) exist in the MindSpore Operator List.
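The assessment above amounts to a set lookup. The following is a minimal sketch in plain Python, not a MindSpore utility: the `SUPPORTED_OPERATORS` set, the helper name, and `MyCustomNorm` are illustrative placeholders that you would fill in by hand from the Operator List and from your own script.

```python
# Hypothetical excerpt of operators confirmed in the MindSpore Operator List.
SUPPORTED_OPERATORS = {"Conv2d", "BatchNorm2d", "ReLU", "MaxPool2d", "Dense", "TensorAdd"}


def find_missing_operators(network_operators, supported=SUPPORTED_OPERATORS):
    """Return the operators used by the network that are not in the supported set."""
    return sorted(set(network_operators) - set(supported))


# Operators collected by reading the original script (illustrative lists).
resnet50_operators = ["Conv2d", "BatchNorm2d", "ReLU", "MaxPool2d", "Dense"]
custom_operators = ["Conv2d", "MyCustomNorm"]

missing_resnet = find_missing_operators(resnet50_operators)
missing_custom = find_missing_operators(custom_operators)
print(missing_resnet)
print(missing_custom)
```

An empty result means every operator was found; any name that remains needs one of the follow-up actions described next.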
If any operator does not exist, you are advised to perform the following operations:
- Operator replacement: Analyze the operator implementation formula and check whether a combination of existing operators of MindSpore can be used to achieve the expected objective.
- Substitution solution: For example, if a loss operator is not supported, check whether it can be replaced with a loss operator of the same type supported by MindSpore; alternatively, check whether the current network structure can be replaced by another mainstream network of the same type.
If the replacement operators cannot provide the complete functionality, you are advised to perform the following operations:
- Delete unnecessary functions.
- Find a substitution solution for necessary functions.
If the preceding requirements cannot be met, you can raise requirements in the [MindSpore community](https://gitee.com/mindspore/mindspore).
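Operator replacement, the first option listed above, can be checked numerically before you touch the network. The sketch below is plain Python and uses softplus purely as a stand-in for a hypothetical unsupported operator; the function names are assumptions made for illustration. It rebuilds the formula from three primitive steps and compares the result with a reference implementation.

```python
import math


def softplus_reference(x):
    """Numerically stable reference: softplus(x) = log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_composed(x):
    """Same formula built only from the primitive steps exp, add, and log."""
    exp_x = math.exp(x)        # stands in for an Exp operator
    shifted = exp_x + 1.0      # stands in for an Add operator
    return math.log(shifted)   # stands in for a Log operator


samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
max_error = max(abs(softplus_reference(x) - softplus_composed(x)) for x in samples)
print(max_error)
```

If the maximum error stays within floating-point noise on representative inputs, the composition is a reasonable candidate to rebuild from the framework's own operators.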
### Software and Hardware Environments
Prepare the hardware environment, find a platform corresponding to your environment by referring to the [installation guide](https://www.mindspore.cn/install/en), and install MindSpore.
## E2E Network Migration
### Training Phase
#### Script Migration
MindSpore differs from TensorFlow and PyTorch in the network structure. Before migration, you need to clearly understand the original script and information of each layer, such as shape.
The following uses ResNet-50 migration and training on the Ascend 910 as an example.
1. Import MindSpore modules.
Import the corresponding MindSpore modules based on the required APIs. For details about the module list, see <https://www.mindspore.cn/api/en/master/index.html>.
2. Load and preprocess a dataset.
Use MindSpore to build the required dataset. Currently, MindSpore supports common datasets. You can load data in its original format, or in the `MindRecord` or `TFRecord` format, through the corresponding APIs. In addition, MindSpore supports data processing and data augmentation. For details, see [Data Preparation](https://www.mindspore.cn/tutorial/en/master/use/data_preparation/data_preparation.html).
In this example, the CIFAR-10 dataset is loaded. The code supports both single-device and multi-device scenarios.
```python
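import mindspore.dataset as de  # dataset_path, device_num, and rank_id are set by the caller

if device_num == 1:
    # Single device: read the whole dataset.
    ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=4, shuffle=True)
else:
    # Multiple devices: each device reads only its own shard.
    ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=4, shuffle=True,
                           num_shards=device_num, shard_id=rank_id)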
```
Then, perform data augmentation, data cleaning, and batch processing. For details about the code, see <https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/dataset.py>.
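To make the roles of `num_shards` and `shard_id` in the loading code concrete, here is a minimal plain-Python sketch of one way a dataset can be split across devices. The strided split and the `shard_indices` helper are assumptions for illustration only; MindSpore's actual assignment, especially with `shuffle=True`, may differ in ordering and padding.

```python
def shard_indices(num_samples, num_shards, shard_id):
    """Indices seen by one device under a simple strided split (an assumption)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return list(range(shard_id, num_samples, num_shards))


# Ten samples split across four devices.
shards = [shard_indices(10, num_shards=4, shard_id=rank) for rank in range(4)]
for rank, indices in enumerate(shards):
    print(rank, indices)
```

The point of the sketch is that each device works on roughly `1 / device_num` of the data, which is why the per-device step count shrinks as devices are added.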
3. Build a network.
The biggest difference between MindSpore and TensorFlow in convolution is the data format. `NCHW` is used in MindSpore by default, while `NHWC` is used in TensorFlow.
The following uses the first convolutional layer on the ResNet-50 network whose batch\_size is set to 32 as an example:
- In TensorFlow, the format of the input feature is \[32, 224, 224, 3], and the size of the convolution kernel is \[7, 7, 3, 64].
- In MindSpore, the format of the input feature is \[32, 3, 224, 224], and the size of the convolution kernel is \[64, 3, 7, 7].
```python
import mindspore.nn as nn


def _conv7x7(in_channel, out_channel, stride=1):
    # Weights use the (out_channel, in_channel, height, width) layout.
    weight_shape = (out_channel, in_channel, 7, 7)
    weight = _weight_variable(weight_shape)  # initializer helper from the full ResNet script
    return nn.Conv2d(in_channel, out_channel,
                     kernel_size=7, stride=stride, padding=0, pad_mode='same', weight_init=weight)


def _bn(channel):
    return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
                          gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1)
```
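The layout difference described above comes down to two axis permutations. The following plain-Python sketch works on shape tuples only; the `permute` helper and the constant names are illustrative and not part of either framework.

```python
def permute(shape, perm):
    """Reorder the axes of a shape tuple, as an array transpose would."""
    return tuple(shape[axis] for axis in perm)


NHWC_TO_NCHW = (0, 3, 1, 2)   # feature maps: [N, H, W, C] -> [N, C, H, W]
HWIO_TO_OIHW = (3, 2, 0, 1)   # kernels: [H, W, in, out] -> [out, in, H, W]

tf_input_shape = (32, 224, 224, 3)
tf_kernel_shape = (7, 7, 3, 64)

ms_input_shape = permute(tf_input_shape, NHWC_TO_NCHW)
ms_kernel_shape = permute(tf_kernel_shape, HWIO_TO_OIHW)
print(ms_input_shape, ms_kernel_shape)
```

The same axis orders can be passed to an array transpose such as `numpy.transpose` when you need to convert actual input data or pretrained weights rather than just shapes.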
4. Build a subnet.
In MindSpore, `nn.Cell` is used to build a subnet structure. The network structure must be defined before being used in a subnet. Define each operator to be used in the `__init__` function of the Cell, connect the defined operators in the `construct` function, and then return the output of the subnet through `return`.
```python
import mindspore.nn as nn
from mindspore.ops import operations as P


class ResidualBlock(nn.Cell):
    """
    ResNet V1 residual block definition.

    Args:
        in_channel (int): Input channel.
        out_channel (int): Output channel.
        stride (int): Stride size for the first convolutional layer. Default: 1.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> ResidualBlock(3, 256, stride=2)
    """
    expansion = 4

    def __init__(self,
                 in_channel,
                 out_channel,
                 stride=1):
        super(ResidualBlock, self).__init__()

        # Define every operator here; _conv1x1, _conv3x3, and _bn_last
        # are helpers from the full ResNet script.
        channel = out_channel
        self.conv1 = _conv1x1(in_channel, channel, stride=1)
        self.bn1 = _bn(channel)

        self.conv2 = _conv3x3(channel, channel, stride=stride)
        self.bn2 = _bn(channel)

        self.conv3 = _conv1x1(channel, out_channel, stride=1)
        self.bn3 = _bn_last(out_channel)

        self.relu = nn.ReLU()

        # The shortcut needs a projection when the shape changes.
        self.down_sample = False
        if stride != 1 or in_channel != out_channel:
            self.down_sample = True
        self.down_sample_layer = None

        if self.down_sample:
            self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
                                                        _bn(out_channel)])
        self.add = P.TensorAdd()

    def construct(self, x):
        # Connect the operators defined in __init__.
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.down_sample:
            identity = self.down_sample_layer(identity)

        out = self.add(out, identity)
        out = self.relu(out)

        return out
```
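The `stride != 1 or in_channel != out_channel` check in the block above exists because the final addition needs both branches to have the same shape. The sketch below reproduces that decision in plain Python. It assumes `'same'` padding for every convolution in the block, so each spatial size becomes `ceil(size / stride)`; the `residual_block_plan` helper is hypothetical.

```python
import math


def residual_block_plan(in_channel, out_channel, height, width, stride=1):
    """Report whether the shortcut needs a projection and the block's output shape.

    Assumes 'same' padding, so each spatial size becomes ceil(size / stride).
    """
    needs_projection = stride != 1 or in_channel != out_channel
    out_height = math.ceil(height / stride)
    out_width = math.ceil(width / stride)
    return needs_projection, (out_channel, out_height, out_width)


plan_a = residual_block_plan(64, 256, 56, 56, stride=1)    # channels change
plan_b = residual_block_plan(256, 256, 56, 56, stride=1)   # nothing changes
plan_c = residual_block_plan(256, 512, 56, 56, stride=2)   # channels and size change
print(plan_a, plan_b, plan_c)
```

Tracing shapes this way before running the network is a cheap way to confirm that the migrated block matches the layer information recorded from the original script.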
5. Define a concatenated structure.
The ResNet-50 network has a large number of repeated structures. In TensorFlow, you can use a `for` loop to reduce repeated code. In MindSpore, each defined Cell object is independent. In particular, a Cell that holds weight parameters cannot be reused for several layers. If a large number of repeated concatenated structures exist, you can construct multiple Cell instances in a `for` loop and concatenate them by using `SequentialCell`.
```python
def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
    """
    Make stage network of ResNet.

    Args:
        block (Cell): Resnet block.
        layer_num (int): Layer number.
        in_channel (int): Input channel.
        out_channel (int): Output channel.
        stride (int): Stride size for the first convolutional layer.

    Returns:
        SequentialCell, the output layer.

    Examples:
        >>> _make_layer(ResidualBlock, 3, 128, 256, 2)
    """
    layers = []

    resnet_block = block(in_channel, out_channel, stride=stride)
    layers.append(resnet_block)

    for _ in range(1, layer_num):
        resnet_block = block(out_channel, out_channel, stride=1)
        layers.append(resnet_block)

    return nn.SequentialCell(layers)
```
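The reason for creating a new instance on every loop iteration can be shown without any framework. In this plain-Python sketch, `TinyBlock` is a hypothetical stand-in for a Cell that owns one weight; it is not a MindSpore class.

```python
class TinyBlock:
    """Hypothetical stand-in for a Cell that owns one weight."""

    def __init__(self, in_channel, out_channel):
        self.in_channel = in_channel
        self.out_channel = out_channel
        self.weight = [0.0]  # each instance gets its own list


def make_layer(layer_num, in_channel, out_channel):
    """Mirror of the loop above: the first block changes channels, the rest keep them."""
    layers = [TinyBlock(in_channel, out_channel)]
    for _ in range(1, layer_num):
        layers.append(TinyBlock(out_channel, out_channel))
    return layers


independent = make_layer(3, 128, 256)
shared = [TinyBlock(256, 256)] * 3  # one object referenced three times

# Update the "weight" of the first block in each list.
independent[0].weight[0] = 1.0
shared[0].weight[0] = 1.0

independent_weights = [block.weight[0] for block in independent]
shared_weights = [block.weight[0] for block in shared]
print(independent_weights)
print(shared_weights)
```

In the first list only one block changes, while in the second list all three entries change together because they are the same object. Repeating one instance therefore ties the layers' parameters together, which is not what a stack of separate residual blocks needs.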
6. Build the entire network.
The [ResNet-50](https://gitee.com/mindspore/mindspore/blob/master/mindspore/model_zoo/resnet.py) network structure is formed by connecting multiple defined subnets. Follow the rule of defining subnets before using them: define all the subnets in the `__init__` function and connect them in the `construct` function.
7. Define a loss function and an optimizer.
After the network is defined, the loss function and optimizer need to be defined accordingly.
```python
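from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from mindspore.nn.optim.momentum import Momentum
loss = SoftmaxCrossEntropyWithLogits(sparse=True)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config.momentum, config.weight_decay, config.loss_scale)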
```
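For readers comparing results against the original framework, it can help to have the underlying formulas at hand. The sketch below is plain Python and shows the textbook forms only: a softmax cross-entropy for a class-index label (as implied by `sparse=True`) and a basic momentum update. Both helper names are illustrative, and the weight decay and loss scale arguments passed to `Momentum` above are deliberately left out.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Loss for one sample: -log(softmax(logits)[label]), with label as a class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]


def momentum_step(weight, grad, velocity, lr, momentum):
    """One textbook momentum update: v = momentum * v + grad; w = w - lr * v."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


loss_value = sparse_softmax_cross_entropy([0.0, 0.0], label=0)
new_weight, new_velocity = momentum_step(1.0, grad=0.5, velocity=0.0, lr=0.1, momentum=0.9)
print(loss_value, new_weight, new_velocity)
```

Frameworks differ in details such as how momentum and weight decay are combined, so treat a mismatch against these reference values as a prompt to check the optimizer definitions on both sides.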
8. Build a model.
Similar to the `Estimator` API of TensorFlow, the `Model` API of MindSpore takes the defined network, loss function, and optimizer and automatically combines them into a network that can be used for training.
To use loss scale in training, define a loss\_scale\_manager and pass it to the `Model` API.
```python
from mindspore.train.loss_scale_manager import FixedLossScaleManager
loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
```
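Loss scaling is generally used when gradients are held in half precision, where very small values round to zero. The sketch below illustrates the idea with the standard library only; it is not MindSpore's implementation, the scale of 1024 is an arbitrary illustrative value, and the `to_float16` helper is an assumption made for the demonstration.

```python
import struct


def to_float16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


LOSS_SCALE = 1024.0   # illustrative value, not taken from the example config
true_grad = 1e-8      # a gradient too small for float16

unscaled = to_float16(true_grad)               # underflows to zero
scaled = to_float16(true_grad * LOSS_SCALE)    # survives in float16
recovered = scaled / LOSS_SCALE                # divide in higher precision before the update

print(unscaled, scaled, recovered)
```

Multiplying the loss by a constant multiplies every gradient by the same constant, so dividing by it before the weight update restores the intended magnitude while avoiding the underflow.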
You can use a built-in evaluation metric of `Model` by setting the [metrics](https://www.mindspore.cn/tutorial/en/master/advanced_use/customized_debugging_information.html#mindspore-metrics) attribute.
```python
from mindspore.train.model import Model
model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'})
```
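When comparing the migrated network with the original, it is useful to compute the metric independently. The following plain-Python sketch assumes that `'acc'` denotes top-1 classification accuracy; the `top1_accuracy` helper and the sample values are illustrative.

```python
def top1_accuracy(logits_batch, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(logits_batch) != len(labels):
        raise ValueError("logits_batch and labels must have the same length")
    correct = 0
    for logits, label in zip(logits_batch, labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        correct += int(predicted == label)
    return correct / len(labels)


logits_batch = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
labels = [1, 0, 2, 1]
accuracy = top1_accuracy(logits_batch, labels)
print(accuracy)
```

Here three of the four predictions match their labels.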
Similar to `estimator.train()` of TensorFlow, you can call the `model.train` API to perform training. Features such as checkpoint saving and printing of intermediate results can be attached to `model.train` as callbacks.
```python
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor

# Report step time and loss during training.
time_cb = TimeMonitor(data_size=step_size)
loss_cb = LossMonitor()
cb = [time_cb, loss_cb]
if config.save_checkpoint:
    # Optionally save checkpoints at a fixed step interval.
    config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_steps,
                                 keep_checkpoint_max=config.keep_checkpoint_max)
    ckpt_cb = ModelCheckpoint(prefix="resnet", directory=config.save_checkpoint_path, config=config_ck)
    cb += [ckpt_cb]
model.train(epoch_size, dataset, callbacks=cb)
```
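The callback mechanism itself is a simple pattern: the training loop calls a hook on each registered object at fixed points. The sketch below shows that pattern in plain Python. All class and method names are hypothetical and do not reproduce MindSpore's `Callback` interface.

```python
class RecordingCallback:
    """Hypothetical callback that records the loss after every step."""

    def __init__(self):
        self.history = []

    def step_end(self, step, loss):
        self.history.append((step, loss))


class EveryNSteps:
    """Hypothetical callback that fires only every `interval` steps, like a checkpoint saver."""

    def __init__(self, interval):
        self.interval = interval
        self.fired_at = []

    def step_end(self, step, loss):
        if step % self.interval == 0:
            self.fired_at.append(step)


def train(num_steps, callbacks):
    """Toy loop: the loss is faked; the point is where the callbacks are invoked."""
    for step in range(1, num_steps + 1):
        loss = 1.0 / step
        for callback in callbacks:
            callback.step_end(step, loss)


recorder = RecordingCallback()
saver = EveryNSteps(interval=2)
train(5, callbacks=[recorder, saver])
print(recorder.history[-1], saver.fired_at)
```

Keeping monitoring and checkpointing in callbacks leaves the training loop unchanged when you add or remove such features, which also helps with the step-by-step debugging approach described in the next section.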
#### Accuracy Debugging
The accuracy optimization process is as follows:
1. When validating single-device accuracy, you are advised to use a small dataset for training. After the validation is successful, use the full dataset for multi-device accuracy validation. This helps improve the debugging efficiency.
2. Remove optional techniques (such as augmentation configuration and dynamic loss scale in an optimizer) from the script. After the validation is successful, add them back one by one, and add the next one only after the previous one is confirmed to work. In this way, you can quickly locate the fault.
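The second step above can be organized as a small loop. The sketch below is plain Python; `first_breaking_feature`, `fake_validation`, and the feature names are hypothetical, and in practice the check would be a short training run followed by an accuracy comparison.

```python
def first_breaking_feature(features, is_accurate):
    """Enable features one at a time; return the first that breaks validation, else None."""
    enabled = []
    for feature in features:
        enabled.append(feature)
        if not is_accurate(enabled):
            return feature
    return None


def fake_validation(enabled):
    """Hypothetical check standing in for a short training run plus accuracy comparison."""
    return "dynamic_loss_scale" not in enabled


candidates = ["random_crop", "random_flip", "dynamic_loss_scale", "label_smoothing"]
culprit = first_breaking_feature(candidates, fake_validation)
print(culprit)
```

Because each run adds exactly one feature to a configuration already known to be good, the first failing run points directly at the feature to investigate.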
#### On-Cloud Integration
Run your scripts on ModelArts. For details, see [Using MindSpore on Cloud](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/use_on_the_cloud.html).
### Inference Phase
Models trained on the Ascend 910 AI processor can be used for inference on different hardware platforms.
1. Inference on the Ascend 910 AI processor
Similar to the `estimator.evaluate()` API of TensorFlow, MindSpore provides the `model.eval()` API for model validation. You only need to pass in the validation dataset, which is processed in the same way as the training dataset. For details about the complete code, see <https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/eval.py>.
```python
res = model.eval(dataset)
```
2. Inference on the Ascend 310 AI processor
1. Export the ONNX or GEIR model by referring to the [Export GEIR Model and ONNX Model](https://www.mindspore.cn/tutorial/en/master/use/saving_and_loading_model_parameters.html#geironnx).
2. To perform inference in the cloud environment, see the [Ascend 910 training and Ascend 310 inference samples](https://support.huaweicloud.com/bestpractice-modelarts/modelarts_10_0026.html). For the bare-metal environment (where, unlike the cloud environment, the Ascend 310 AI processor is deployed locally), see the description document of the Ascend 310 AI processor software package.
3. Inference on a GPU
1. Export the ONNX model by referring to the [Export GEIR Model and ONNX Model](https://www.mindspore.cn/tutorial/en/master/use/saving_and_loading_model_parameters.html#geironnx).
2. Perform inference on the NVIDIA GPU by referring to [TensorRT backend for ONNX](https://github.com/onnx/onnx-tensorrt).
## Examples
1. [Common network script examples](https://gitee.com/mindspore/mindspore/tree/master/example)
2. [Common dataset examples](https://www.mindspore.cn/tutorial/en/master/use/data_preparation/loading_the_datasets.html)
3. [Model Zoo](https://gitee.com/mindspore/mindspore/tree/master/mindspore/model_zoo)
......@@ -34,6 +34,7 @@ MindSpore Tutorials
advanced_use/nlp_application
advanced_use/customized_debugging_information
advanced_use/on_device_inference
advanced_use/network_migration
advanced_use/model_security
advanced_use/mindspore_cpu_win_install
advanced_use/community
\ No newline at end of file