!272 update multi platform inference

Merge pull request !272 from leiyuning/master

c6fdaec9 · mindspore-ci-bot · Gitee · 1cd7341f · ee9904a7 · c6fdaec9
3 changed files
--- a/docs/source_en/glossary.md
+++ b/docs/source_en/glossary.md
@@ -10,6 +10,7 @@

 |  Acronym and Abbreviation  |  Description  | 
 | -----    | -----    |
+| ACL | Ascend Computer Language, a C++ API library for developing deep neural network applications. It provides device management, context management, stream management, memory management, model loading and execution, operator loading and execution, media data processing, and more. |
 |  Ascend  |  Name of Huawei Ascend series chips.  |
 |  CCE  | Cube-based Computing Engine, which is an operator development tool oriented to hardware architecture programming.  |
 |  CCE-C  |  Cube-based Computing Engine C, which is C code developed by the CCE.  |
@@ -24,6 +25,7 @@
 |  FP16  |  16-bit floating point, which is a half-precision floating point arithmetic format, consuming less memory.  |
 |  FP32  |  32-bit floating point, which is a single-precision floating point arithmetic format.  |
 |  GE  |  Graph Engine, MindSpore computational graph execution engine, which is responsible for optimizing hardware (such as operator fusion and memory overcommitment) based on the front-end computational graph and starting tasks on the device side.  |
+| GEIR | Graph Engine Intermediate Representation. Similar to ONNX, it is an open file format for machine learning, defined by Huawei and better suited to the Ascend AI processor. |
 |  GHLO  |  Graph High Level Optimization. GHLO includes optimization irrelevant to hardware (such as dead code elimination), auto parallel, and auto differentiation.  |
 |  GLLO  |  Graph Low Level Optimization. GLLO includes hardware-related optimization and in-depth optimization related to the combination of hardware and software, such as operator fusion and buffer fusion.  |
 |  Graph Mode  |  MindSpore static graph mode. In this mode, the neural network model is compiled into an entire graph and then delivered for execution, featuring high performance.  |
@@ -40,6 +42,7 @@
 |  MindSpore  |  Huawei-led open-source deep learning framework.  |
 |  MindSpore Predict  |  A lightweight deep neural network inference engine that provides the inference function for models trained by MindSpore on the device side.  |
 |  MNIST database  |  Modified National Institute of Standards and Technology database, a large handwritten digit database, which is usually used to train various image processing systems.  |
+| ONNX | Open Neural Network Exchange, an open format built to represent machine learning models. |
 |  PyNative Mode  |  MindSpore dynamic graph mode. In this mode, operators in the neural network are delivered and executed one by one, facilitating the compilation and debugging of the neural network model.  |
 |  ResNet-50  |  Residual Neural Network 50, a residual neural network proposed by four Chinese people, including Kaiming He from Microsoft Research Institute.  |
 |  Schema  |  Data set structure definition file, which defines the fields contained in a dataset and the field types.  |

--- a/docs/source_zh_cn/glossary.md
+++ b/docs/source_zh_cn/glossary.md
@@ -10,6 +10,7 @@

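 |  Acronym and Abbreviation  |  Description  | 
 | -----    | -----    |
+| ACL | Ascend Computer Language, a C++ API library for developing deep neural network applications. It provides device management, context management, stream management, memory management, model loading and execution, operator loading and execution, media data processing, and more. |
 |  Ascend  |  Series name of Huawei's Ascend chips.  |
 |  CCE  |  Cube-based Computing Engine, an operator development tool for hardware-architecture-oriented programming.  |
 |  CCE-C  |  Cube-based Computing Engine C, C code developed using CCE.  |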
@@ -24,6 +25,7 @@
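 |  FP16  |  16-bit floating point, a half-precision floating-point arithmetic format that consumes less memory.  |
 |  FP32  |  32-bit floating point, a single-precision floating-point arithmetic format.  |
 |  GE  |  Graph Engine, the MindSpore computational graph execution engine. It is mainly responsible for hardware-related optimization (operator fusion, memory reuse, and so on) based on the front-end computational graph, and for launching tasks on the device side.  |
+| GEIR | Graph Engine Intermediate Representation. Similar to ONNX, it is an open file format for machine learning, defined by Huawei and better suited to the Ascend AI processor. |
 |  GHLO  |  Graph High Level Optimization. GHLO includes hardware-independent optimization (such as dead code elimination), auto parallel, and auto differentiation.  |
 |  GLLO  |  Graph Low Level Optimization. GLLO includes hardware-related optimization and in-depth optimization that combines hardware and software, such as operator fusion and buffer fusion.  |
 |  Graph Mode  |  MindSpore static graph mode. The neural network model is compiled into a single graph and then delivered for execution, which gives high performance.  |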
@@ -40,6 +42,7 @@
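 |  MindSpore  |  Open-source deep learning framework led by Huawei.  |
 |  MindSpore Predict  |  A lightweight deep neural network inference engine that runs inference on the device side for models trained with MindSpore.  |
 |  MNIST database  |  Modified National Institute of Standards and Technology database, a large database of handwritten digits commonly used to train image processing systems.  |
+| ONNX | Open Neural Network Exchange, an open file format designed for machine learning and used to store trained models. |
 |  PyNative Mode  |  MindSpore dynamic graph mode. Operators in the neural network are delivered and executed one by one, which makes it easier to write and debug the neural network model.  |
 |  ResNet-50  |  Residual Neural Network 50, a residual neural network proposed by four Chinese researchers, including Kaiming He of Microsoft Research.  |
 |  Schema  |  Dataset structure definition file, which defines the fields a dataset contains and their types.  |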

--- a/tutorials/source_zh_cn/use/multi_platform_inference.md
+++ b/tutorials/source_zh_cn/use/multi_platform_inference.md
@@ -5,39 +5,107 @@
 - [Multi-Platform Inference](#多平台推理)
     - [Overview](#概述)
     - [Inference on the Ascend 910 AI Processor](#ascend-910-ai处理器上推理)
+        - [Inference Using a Checkpoint File](#使用checkpoint格式文件推理)
     - [Inference on the Ascend 310 AI Processor](#ascend-310-ai处理器上推理)
+        - [Inference Using a Checkpoint File](#使用checkpoint格式文件推理-1)
+        - [Inference Using ONNX and GEIR Files](#使用onnx与geir格式文件推理)
     - [Inference on the GPU](#gpu上推理)
+        - [Inference Using a Checkpoint File](#使用checkpoint格式文件推理-2)
+        - [Inference Using an ONNX File](#使用onnx格式文件推理)
+    - [Inference on the CPU](#cpu上推理)
+        - [Inference Using a Checkpoint File](#使用checkpoint格式文件推理-3)
+        - [Inference Using an ONNX File](#使用onnx格式文件推理-1)
     - [On-Device Inference](#端侧推理)

 <!-- /TOC -->

-<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/advanced_use/multi_platform_inference.md" target="_blank"><img src="../_static/logo_source.png"></a>
+<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/use/multi_platform_inference.md" target="_blank"><img src="../_static/logo_source.png"></a>

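 ## Overview

 Models trained with MindSpore can run inference on different hardware platforms. This document describes the inference workflow on each platform.

+Depending on the underlying mechanism, inference can be performed in two ways:
+- Run inference directly from a checkpoint file: in the MindSpore training environment, use the inference API to load the data and the checkpoint file.
+- Convert the checkpoint file into a general-purpose model format, such as ONNX or GEIR, and run inference on the converted file. The inference environment then does not depend on MindSpore. The benefit is portability across hardware: any platform that supports ONNX/GEIR inference can run the model. For example, a model trained on the Ascend 910 AI processor can run inference on a GPU or CPU.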
+
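+By hardware platform, the inference scenarios that MindSpore supports are as follows:
+
+Hardware platform | Inference file | Description
+--|--|--
+Ascend 910 AI processor | Checkpoint file | Same dependencies as the MindSpore training environment.
+Ascend 310 AI processor | ONNX or GEIR file | Runs the ACL framework; the model must be converted to OM format.
+GPU | Checkpoint file | Same dependencies as the MindSpore training environment.
+GPU | ONNX file | A runtime/SDK that supports ONNX inference, such as TensorRT.
+CPU | Checkpoint file | Same dependencies as the MindSpore training environment.
+CPU | ONNX file | A runtime/SDK that supports ONNX inference, such as ONNX Runtime.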
+
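The table above can be read as a small lookup structure. The following is a minimal sketch, not part of the change under review: `SUPPORTED_FORMATS` and `needs_mindspore` are hypothetical names, and the entries simply mirror the table rows.

```python
# Hypothetical encoding of the platform table; the names are illustrative only.
SUPPORTED_FORMATS = {
    "Ascend 910": {"checkpoint"},
    "Ascend 310": {"ONNX", "GEIR"},
    "GPU": {"checkpoint", "ONNX"},
    "CPU": {"checkpoint", "ONNX"},
}


def needs_mindspore(platform: str, model_format: str) -> bool:
    """Return True if this platform/format pair needs a MindSpore environment.

    Per the table, only checkpoint inference shares the training
    environment's dependencies; ONNX/GEIR inference relies on another
    runtime or SDK.
    """
    if model_format not in SUPPORTED_FORMATS.get(platform, set()):
        raise ValueError(f"{model_format} is not listed for {platform}")
    return model_format == "checkpoint"


print(needs_mindspore("GPU", "ONNX"))  # False
```

The function raises `ValueError` for pairs that the table does not list, such as an ONNX file on the Ascend 910 AI processor.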
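+> ONNX (Open Neural Network Exchange) is an open file format designed for machine learning and used to store trained models. It lets different AI frameworks (such as PyTorch and MXNet) store model data in the same format and exchange it. For details, see the ONNX website <https://onnx.ai/>.
+
+> GEIR (Graph Engine Intermediate Representation) is similar to ONNX: an open file format designed for machine learning, defined by Huawei and better suited to the Ascend AI processor.
+
+> ACL (Ascend Computer Language) is a C++ API library for developing deep neural network applications. It provides device management, context management, stream management, memory management, model loading and execution, operator loading and execution, media data processing, and more. It is matched to the Ascend AI processor and enables the hardware's runtime management and resource management capabilities.
+
+> TensorRT is NVIDIA's SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and a runtime, and speeds up inference of deep learning models on edge devices. For details, see <https://developer.nvidia.com/tensorrt>.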
+
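 ## Inference on the Ascend 910 AI Processor

-MindSpore provides the `model.eval` API for model validation. You only need to pass in the validation dataset, which is processed the same way as the training dataset. For the complete code, see <https://gitee.com/mindspore/mindspore/blob/master/example/resnet50_cifar10/eval.py>.
+### Inference Using a Checkpoint File

-```python
-res = model.eval(dataset)
-```
+1. Use the `model.eval` API to validate the model. You only need to pass in the validation dataset, which is processed the same way as the training dataset.
+    ```python
+    res = model.eval(dataset)
+    ```
+    Where:  
+    `model.eval` is the model validation API. API reference: <https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.html#mindspore.Model.eval>.
+    > Sample inference code: <https://gitee.com/mindspore/mindspore/blob/master/model_zoo/lenet/eval.py>.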

-In addition, you can run inference through the `model.predict` API. For detailed usage, see the API documentation.
+2. Use the `model.predict` API to run inference.
+   ```python
+   model.predict(input_data)
+   ```
+   Where:  
+   `model.predict` is the inference API. API reference: <https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.html#mindspore.Model.predict>.
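Both steps above assume that a `model` object already holds the trained weights. The following is a minimal sketch of building one from a checkpoint file. It is not taken from the change under review: `load_model_from_checkpoint` is a hypothetical helper, the network, loss function, and metrics are supplied by the caller, and the import paths are assumed to match your installed MindSpore version.

```python
def load_model_from_checkpoint(net, ckpt_path, loss_fn=None, metrics=None):
    """Load checkpoint weights into `net` and wrap it in a MindSpore Model.

    Hypothetical helper: `net` must be an instance of the same network
    class that produced the checkpoint.
    """
    # Imported lazily so the helper can be defined without MindSpore installed.
    # Assumption: these module paths exist in the installed MindSpore version.
    from mindspore.train.model import Model
    from mindspore.train.serialization import load_checkpoint, load_param_into_net

    param_dict = load_checkpoint(ckpt_path)   # read parameters from the file
    load_param_into_net(net, param_dict)      # copy them into the network
    return Model(net, loss_fn=loss_fn, metrics=metrics)
```

With the returned object, `model.eval(dataset)` validates against a dataset (this requires `metrics` to be set), and `model.predict(input_data)` runs inference on a single input.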

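 ## Inference on the Ascend 310 AI Processor

-1. See [Model Export](https://www.mindspore.cn/tutorial/zh-CN/master/use/saving_and_loading_model_parameters.html#geironnx) to generate an ONNX or GEIR model.
+### Inference Using a Checkpoint File
+The same as inference on the Ascend 910 AI processor.
+
+
+### Inference Using ONNX and GEIR Files
+
+The Ascend 310 AI processor runs the ACL framework, which supports the OM format. An OM model must be converted from an ONNX or GEIR model, so inference on the Ascend 310 AI processor takes the following two steps:

-2. For a cloud environment, see [Example of Ascend 910 training and Ascend 310 inference](https://support.huaweicloud.com/bestpractice-modelarts/modelarts_10_0026.html) to complete inference. For a bare-metal environment (as opposed to the cloud, that is, an Ascend 310 AI processor is available locally), see the documentation of the Ascend 310 AI processor software package.
+1. Generate an ONNX or GEIR model on the training platform. For the steps, see [Model Export: Exporting GEIR and ONNX Models](https://www.mindspore.cn/tutorial/zh-CN/master/use/saving_and_loading_model_parameters.html#geironnx).
+
+2. Convert the ONNX/GEIR model file into an OM model and run inference.
+   - In the cloud (ModelArts environment), see [Example of Ascend 910 training and Ascend 310 inference](https://support.huaweicloud.com/bestpractice-modelarts/modelarts_10_0026.html) to complete inference.
+   - In a local bare-metal environment (as opposed to the cloud, that is, an Ascend 310 AI processor is available locally), see the documentation of the Ascend 310 AI processor software package.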
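Step 1 above (exporting on the training platform) might look like the following. This is a minimal sketch under stated assumptions, not code from the change under review: `export_for_ascend310` is a hypothetical wrapper, the format names are the ones this tutorial uses, and which `file_format` values the `export` function accepts depends on the installed MindSpore version.

```python
def export_for_ascend310(net, example_input, file_name, file_format="GEIR"):
    """Export a trained network to a file that can later be converted to OM.

    Hypothetical wrapper: `example_input` is a tensor with the shape and
    dtype that the network expects at inference time.
    """
    if file_format not in ("ONNX", "GEIR"):
        raise ValueError("file_format must be 'ONNX' or 'GEIR'")
    # Imported lazily so the wrapper can be defined without MindSpore installed.
    # Assumption: `export` lives in this module in the installed version.
    from mindspore.train.serialization import export

    export(net, example_input, file_name=file_name, file_format=file_format)
    return file_name
```

The weights are expected to be loaded into `net` before calling the wrapper. Converting the exported file to OM is a separate step handled by the Ascend 310 tooling and is not shown here.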

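 ## Inference on the GPU

-1. See [Model Export](https://www.mindspore.cn/tutorial/zh-CN/master/use/saving_and_loading_model_parameters.html#geironnx) to generate an ONNX model.
+### Inference Using a Checkpoint File
+
+The same as inference on the Ascend 910 AI processor.
+
+### Inference Using an ONNX File
+
+1. Generate an ONNX model on the training platform. For the steps, see [Model Export: Exporting GEIR and ONNX Models](https://www.mindspore.cn/tutorial/zh-CN/master/use/saving_and_loading_model_parameters.html#geironnx).
+
+2. Run inference on the GPU. For details, see the documentation of the runtime/SDK you use. For example, to run on an NVIDIA GPU with the commonly used TensorRT, see [TensorRT backend for ONNX](https://github.com/onnx/onnx-tensorrt).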
+
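+## Inference on the CPU
+
+### Inference Using a Checkpoint File
+The same as inference on the Ascend 910 AI processor.
+
+### Inference Using an ONNX File
+Similar to inference on the GPU, this takes the following steps:
+
+1. Generate an ONNX model on the training platform. For the steps, see [Model Export: Exporting GEIR and ONNX Models](https://www.mindspore.cn/tutorial/zh-CN/master/use/saving_and_loading_model_parameters.html#geironnx).

-2. See [TensorRT backend for ONNX](https://github.com/onnx/onnx-tensorrt) to complete inference on an NVIDIA GPU.
+2. Run inference on the CPU. For details, see the documentation of the runtime/SDK you use. For example, if you use ONNX Runtime, see the [ONNX Runtime documentation](https://github.com/microsoft/onnxruntime).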
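Step 2 with ONNX Runtime might look like the following. This is a minimal sketch, not code from the change under review: `run_onnx_on_cpu` is a hypothetical helper, and it assumes the exported model has a single input.

```python
def run_onnx_on_cpu(onnx_path, input_array):
    """Run one forward pass of an ONNX model on the CPU with ONNX Runtime.

    Hypothetical helper: `input_array` is a NumPy array whose shape and
    dtype match the model's single input.
    """
    # Imported lazily so the helper can be defined without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # name assigned at export time
    # Passing None as the first argument requests all model outputs.
    return session.run(None, {input_name: input_array})
```

The return value is a list with one array per model output. No MindSpore installation is needed in the inference environment, which matches the second inference mode described in the overview.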

 ## On-Device Inference