From 97254cb1a89fce671841724d4bc1f1eb903dda76 Mon Sep 17 00:00:00 2001
From: meng_chunyang <mengchunyang1@huawei.com>
Date: Fri, 28 Aug 2020 15:39:26 +0800
Subject: [PATCH] update roadmap and add English translation docs

---
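Note (not part of the commit message): the performance summary line shown in the new English `benchmark_tool.md` (`Model = ..., AvgRunTime = ... ms`) can be post-processed with a short script. Below is a minimal sketch, assuming the summary keeps the comma-separated `key = value` layout shown in this patch. `parse_benchmark_line` is a hypothetical helper and is not part of MindSpore Lite.

```python
import re


def parse_benchmark_line(line):
    """Parse one 'key = value, ...' summary line into a dict.

    Hypothetical helper: it assumes the layout of the example output
    in this patch and is not an API of the benchmark tool.
    """
    result = {}
    for key, value in re.findall(r"(\w+) = ([^,]+)", line):
        value = value.strip()
        if value.endswith(" ms"):
            # Durations are printed in milliseconds, e.g. "72.556000 ms".
            result[key] = float(value[:-3])
        elif value.isdigit():
            # Plain counters such as numThreads.
            result[key] = int(value)
        else:
            # Anything else (for example the model file name) stays a string.
            result[key] = value
    return result


line = (
    "Model = face_age.ms, numThreads = 2, MinRunTime = 72.228996 ms, "
    "MaxRuntime = 73.094002 ms, AvgRunTime = 72.556000 ms"
)
stats = parse_benchmark_line(line)
print(stats["Model"], stats["numThreads"], stats["AvgRunTime"])
```

The keys are taken verbatim from the tool output, so `MaxRuntime` keeps its lowercase `t` while `MinRunTime` and `AvgRunTime` do not.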
 docs/source_en/roadmap.md                     | 19 ++--
 docs/source_zh_cn/roadmap.md                  | 19 ++--
 .../tutorials/source_en/use/benchmark_tool.md | 94 +++++++++++++++++++
 .../source_en/use/timeprofiler_tool.md        | 90 ++++++++++++++++++
 .../source_zh_cn/use/benchmark_tool.md        |  6 +-
 .../source_zh_cn/use/timeprofiler_tool.md     |  2 +-
 6 files changed, 210 insertions(+), 20 deletions(-)
diff --git a/docs/source_en/roadmap.md b/docs/source_en/roadmap.md
index a9525b2e..befe13ad 100644
--- a/docs/source_en/roadmap.md
+++ b/docs/source_en/roadmap.md
@@ -69,11 +69,14 @@ We sincerely hope that you can join the discussion in the user community and con
 * Protect data privacy during training and inference.
 
 ## Inference Framework
-* Support TensorFlow, Caffe, and ONNX model formats.
-* Support iOS.
-* Improve more CPU operators.
-* Support more CV/NLP models.
-* Online learning.
-* Support deployment on IoT devices.
-* Low-bit quantization.
-* CPU and NPU heterogeneous scheduling.
+* Continuously optimize existing operators and add more operators.
+* Support NLP neural networks.
+* Visualization for MindSpore Lite models.
+* MindSpore Micro, an ultra-lightweight inference solution for embedded systems that supports ARM Cortex-A and Cortex-M.
+* Support re-training and federated learning on mobile devices.
+* Support auto-parallel.
+* MindData on mobile devices, which supports image resizing and pixel data conversion.
+* Support post-training quantization, enabling mixed-precision inference to improve performance.
+* Support AI acceleration hardware such as Kirin NPU and MTK APU.
+* Support multi-model inference pipelines.
+* C++ API for model construction.
diff --git a/docs/source_zh_cn/roadmap.md b/docs/source_zh_cn/roadmap.md
index a985100c..1dfd7945 100644
--- a/docs/source_zh_cn/roadmap.md
+++ b/docs/source_zh_cn/roadmap.md
@@ -70,11 +70,14 @@
 * 保护训练和推理过程中的数据隐私
 
 ## 推理框架
-* 提供Tensorflow/Caffe/ONNX模型格式支持
-* IOS系统支持
-* 完善更多的CPU算子
-* 更多CV/NLP模型支持
-* 在线学习
-* 支持部署在IOT设备
-* 低比特量化
-* CPU和NPU异构调度
+* 算子性能与完备度的持续优化
+* 支持语音模型推理
+* 端侧模型的可视化
+* Micro方案，适用于嵌入式系统的超轻量化推理， 支持ARM Cortex-A、Cortex-M硬件
+* 支持端侧重训及联邦学习
+* 端侧自动并行特性
+* 端侧MindData，包含图片Resize、像素数据转换等功能
+* 配套MindSpore混合精度量化训练（或训练后量化），实现混合精度推理，提升推理性能
+* 支持Kirin NPU、MTK APU等AI加速硬件
+* 支持多模型推理pipeline
+* C++构图接口
diff --git a/lite/tutorials/source_en/use/benchmark_tool.md b/lite/tutorials/source_en/use/benchmark_tool.md
index e96f6d25..d7df0607 100644
--- a/lite/tutorials/source_en/use/benchmark_tool.md
+++ b/lite/tutorials/source_en/use/benchmark_tool.md
@@ -1,3 +1,97 @@
 # Benchmark Tool
 
+<!-- TOC -->
+
+- [Benchmark Tool](#benchmark-tool)
+    - [Overview](#overview)
+    - [Environment Preparation](#environment-preparation)
+    - [Parameter Description](#parameter-description)
+    - [Example](#example)
+        - [Performance Test](#performance-test)
+        - [Accuracy Test](#accuracy-test)
+
+<!-- /TOC -->
+
 <a href="https://gitee.com/mindspore/docs/blob/master/lite/tutorials/source_en/use/benchmark_tool.md" target="_blank"><img src="../_static/logo_source.png"></a>
+
+## Overview
+
+The Benchmark tool is used to perform benchmark testing on a MindSpore Lite model and is implemented using the C++ language. It can not only perform quantitative analysis (performance) on the forward inference execution duration of a MindSpore Lite model, but also perform comparative error analysis (accuracy) based on the output of the specified model.
+
+## Environment Preparation
+
+To use the Benchmark tool, you need to prepare the environment as follows:
+
+- Compilation: Install compilation dependencies and perform compilation. The code of the Benchmark tool is stored in the `mindspore/lite/tools/benchmark` directory of the MindSpore source code. For details about the compilation operations, see [Environment Requirements](https://www.mindspore.cn/lite/docs/en/master/deploy.html#id2) and [Compilation Example](https://www.mindspore.cn/lite/docs/en/master/deploy.html#id5) in the deployment document.
+
+- Run: Obtain the `Benchmark` tool and configure environment variables. For details, see [Output Description](https://www.mindspore.cn/lite/docs/zh-CN/master/deploy.html#id4) in the deployment document.
+
+## Parameter Description
+
+The command used for benchmark testing based on the compiled Benchmark tool is as follows:
+
+```bash
+./benchmark --modelPath=<MODELPATH> [--accuracyThreshold=<ACCURACYTHRESHOLD>]
+			[--calibDataPath=<CALIBDATAPATH>] [--cpuBindMode=<CPUBINDMODE>]
+			[--device=<DEVICE>] [--help] [--inDataPath=<INDATAPATH>]
+			[--inDataType=<INDATATYPE>] [--loopCount=<LOOPCOUNT>]
+			[--numThreads=<NUMTHREADS>] [--omModelPath=<OMMODELPATH>]
+			[--resizeDims=<RESIZEDIMS>] [--warmUpLoopCount=<WARMUPLOOPCOUNT>]
+			[--fp16Priority=<FP16PRIORITY>]
+```
+
+The following describes the parameters in detail.
+
+| Parameter            | Attribute | Function                                                     | Parameter Type                                                 | Default Value | Value Range |
+| ----------------- | ---- | ------------------------------------------------------------ | ------ | -------- | ---------------------------------- |
+| `--modelPath=<MODELPATH>` | Mandatory | Specifies the file path of the MindSpore Lite model for benchmark testing. | String | Null  | -        |
+| `--accuracyThreshold=<ACCURACYTHRESHOLD>` | Optional | Specifies the accuracy threshold. | Float           | 0.5    | -        |
+| `--calibDataPath=<CALIBDATAPATH>` | Optional | Specifies the file path of the benchmark data. The benchmark data, as the comparison output of the tested model, is output from the forward inference of the tested model under other deep learning frameworks using the same input. | String | Null | - |
+| `--cpuBindMode=<CPUBINDMODE>` | Optional | Specifies the type of the CPU core bound to the model inference program. | Integer | 1      | -1: medium core<br/>1: large core<br/>0: not bound |
+| `--device=<DEVICE>` | Optional | Specifies the type of the device on which the model inference program runs. | String | CPU | CPU, NPU, or GPU |
+| `--help` | Optional | Displays the help information about the `benchmark` command. | - | - | - |
+| `--inDataPath=<INDATAPATH>` | Optional | Specifies the file path of the input data of the tested model. If this parameter is not set, a random value will be used. | String | Null  | -        |
+| `--inDataType=<INDATATYPE>` | Optional | Specifies the file type of the input data of the tested model.  | String | Bin | Img: The input data is an image. Bin: The input data is a binary file.|
+| `--loopCount=<LOOPCOUNT>` | Optional | Specifies the number of forward inference rounds that the Benchmark tool runs on the tested model during benchmark testing. The value is a positive integer. | Integer | 10 | - |
+| `--numThreads=<NUMTHREADS>` | Optional | Specifies the number of threads for running the model inference program. | Integer | 2 | - |
+| `--omModelPath=<OMMODELPATH>` | Optional | Specifies the file path of the OM model. This parameter applies only when the `device` type is NPU. | String | Null  | -        |
+| `--resizeDims=<RESIZEDIMS>` | Optional | Specifies the dimensions to which the input data of the tested model is resized. | String | Null  | -        |
+| `--warmUpLoopCount=<WARMUPLOOPCOUNT>` | Optional | Specifies the number of warm-up inference rounds that the tested model runs before the benchmark test rounds are executed. | Integer | 3 | - |
+| `--fp16Priority=<FP16PRIORITY>` | Optional | Specifies whether the float16 operator is preferred. | Bool | false | true, false |
+
+## Example
+
+When using the Benchmark tool to perform benchmark testing on different MindSpore Lite models, you can set different parameters to implement different test functions. The testing is classified into performance test and accuracy test.
+
+### Performance Test
+
+The main test indicator of the performance test performed by the Benchmark tool is the duration of a single forward inference. In a performance test, you do not need to set benchmark data parameters such as `calibDataPath`. For example:
+
+```bash
+./benchmark --modelPath=./models/face_age.ms
+```
+
+This command uses a random input, and other parameters use default values. After this command is executed, the following statistics are displayed. The statistics include the minimum duration, maximum duration, and average duration of a single inference after the tested model runs for the specified number of inference rounds.
+
+```
+Model = face_age.ms, numThreads = 2, MinRunTime = 72.228996 ms, MaxRuntime = 73.094002 ms, AvgRunTime = 72.556000 ms
+```
+
+### Accuracy Test
+
+The accuracy test performed by the Benchmark tool verifies the accuracy of the MindSpore Lite model output by comparing it with benchmark data. In an accuracy test, in addition to the `modelPath` parameter, the `calibDataPath` parameter must be set. For example:
+
+```bash
+./benchmark --modelPath=./models/face_age.ms --inDataPath=./input/face_age.bin --device=NPU --accuracyThreshold=3 --calibDataPath=./output/face_age.out
+```
+
+This command specifies the input data and benchmark data of the tested model, specifies that the model inference program runs on the NPU, and sets the accuracy threshold to 3%. After this command is executed, the following statistics are displayed, including the single input data of the tested model, output result and average deviation rate of the output node, and average deviation rate of all nodes.
+
+```
+InData0: 139.947 182.373 153.705 138.945 108.032 164.703 111.585 227.402 245.734 97.7776 201.89 134.868 144.851 236.027 18.1142 22.218 5.15569 212.318 198.43 221.853
+================ Comparing Output data ================
+Data of node age_out : 5.94584e-08 6.3317e-08 1.94726e-07 1.91809e-07 8.39805e-08 7.66035e-08 1.69285e-07 1.46246e-07 6.03796e-07 1.77631e-07 1.54343e-07 2.04623e-07 8.89609e-07 3.63487e-06 4.86876e-06 1.23939e-05 3.09981e-05 3.37098e-05 0.000107102 0.000213932 0.000533579 0.00062465 0.00296401 0.00993984 0.038227 0.0695085 0.162854 0.123199 0.24272 0.135048 0.169159 0.0221256 0.013892 0.00502971 0.00134921 0.00135701 0.000383242 0.000163475 0.000136294 9.77864e-05 8.00793e-05 5.73874e-05 3.53858e-05 2.18535e-05 2.04467e-05 1.85286e-05 1.05075e-05 9.34751e-06 6.12732e-06 4.55476e-06
+Mean bias of node age_out : 0%
+Mean bias of all nodes: 0%
+=======================================================
+```
diff --git a/lite/tutorials/source_en/use/timeprofiler_tool.md b/lite/tutorials/source_en/use/timeprofiler_tool.md
index f779cfbd..00b6dc69 100644
--- a/lite/tutorials/source_en/use/timeprofiler_tool.md
+++ b/lite/tutorials/source_en/use/timeprofiler_tool.md
@@ -1,3 +1,93 @@
 # TimeProfiler Tool
 
+<!-- TOC -->
+
+- [TimeProfiler Tool](#timeprofiler-tool)
+    - [Overview](#overview)
+    - [Environment Preparation](#environment-preparation)
+    - [Parameter Description](#parameter-description)
+    - [Example](#example)
+
+<!-- /TOC -->
+
 <a href="https://gitee.com/mindspore/docs/blob/master/lite/tutorials/source_en/use/timeprofiler_tool.md" target="_blank"><img src="../_static/logo_source.png"></a>
+
+## Overview
+
+The TimeProfiler tool analyzes the time consumption of forward inference at the network layer of a MindSpore Lite model. It is implemented in C++.
+
+## Environment Preparation
+
+To use the TimeProfiler tool, you need to prepare the environment as follows:
+
+- Compilation: Install compilation dependencies and perform compilation. The code of the TimeProfiler tool is stored in the `mindspore/lite/tools/time_profiler` directory of the MindSpore source code. For details about the compilation operations, see [Environment Requirements](https://www.mindspore.cn/lite/docs/en/master/deploy.html#id2) and [Compilation Example](https://www.mindspore.cn/lite/docs/en/master/deploy.html#id5) in the deployment document.
+
+- Run: Obtain the `time_profiler` tool and configure environment variables by referring to [Output Description](https://www.mindspore.cn/lite/docs/zh-CN/master/deploy.html#id4) in the deployment document.
+
+## Parameter Description
+
+The command used for analyzing the time consumption of forward inference at the network layer based on the compiled TimeProfiler tool is as follows:
+
+```bash
+./timeprofiler --modelPath=<MODELPATH> [--help] [--loopCount=<LOOPCOUNT>] [--numThreads=<NUMTHREADS>] [--cpuBindMode=<CPUBINDMODE>] [--inDataPath=<INDATAPATH>] [--fp16Priority=<FP16PRIORITY>]
+```
+
+The following describes the parameters in detail.
+
+| Parameter            | Attribute | Function                                                     | Parameter Type | Default Value | Value Range |
+| ----------------- | ---- | ------------------------------------------------------------ | ------ | -------- | ---------------------------------- |
+| `--help` | Optional | Displays the help information about the `timeprofiler` command. | - | - | - |
+| `--modelPath=<MODELPATH>` | Mandatory | Specifies the file path of the MindSpore Lite model for time consumption analysis. | String | Null  | -        |
+| `--loopCount=<LOOPCOUNT>` | Optional | Specifies the number of times that model inference is executed when the TimeProfiler tool is used for time consumption analysis. The value is a positive integer. | Integer | 100 | - |
+| `--numThreads=<NUMTHREADS>` | Optional | Specifies the number of threads for running the model inference program. | Integer | 4 | - |
+| `--cpuBindMode=<CPUBINDMODE>` | Optional | Specifies the type of the CPU core bound to the model inference program. | Integer | 1      | -1: medium core<br/>1: large core<br/>0: not bound |
+| `--inDataPath=<INDATAPATH>` | Optional | Specifies the file path of the input data of the specified model. If this parameter is not set, a random value will be used. | String | Null  | -        |
+| `--fp16Priority=<FP16PRIORITY>` | Optional | Specifies whether the float16 operator is preferred. | Bool | false | true, false |
+
+## Example
+
+Take the `tcpclassify.ms` model as an example and set the number of model inference cycles to 10. The command for using TimeProfiler to analyze the time consumption at the network layer is as follows:
+
+```bash
+./timeprofiler --modelPath=./models/tcpclassify.ms --loopCount=10
+```
+
+After this command is executed, the TimeProfiler tool outputs statistics on the running time of the model at the network layer. The output for this example is shown below. The statistics are displayed in two groupings, by `opName` and by `opType`: `opName` indicates the operator name and `opType` indicates the operator type. `avg` indicates the average running time of the operator per single run, `percent` indicates the ratio of the operator running time to the total operator running time, `calledTimess` indicates the number of times that the operator is run, and `opTotalTime` indicates the total time that the operator is run for the specified number of times. Finally, `total time` and `kernel cost` show the average time consumed by a single inference of the model and the sum of the average time consumed by all operators in the model inference, respectively.
+
+```
+-----------------------------------------------------------------------------------------
+opName                                                          avg(ms)         percent         calledTimess    opTotalTime
+conv2d_1/convolution                                            2.264800        0.824012        10              22.648003
+conv2d_2/convolution                                            0.223700        0.081390        10              2.237000
+dense_1/BiasAdd                                                 0.007500        0.002729        10              0.075000
+dense_1/MatMul                                                  0.126000        0.045843        10              1.260000
+dense_1/Relu                                                    0.006900        0.002510        10              0.069000
+max_pooling2d_1/MaxPool                                         0.035100        0.012771        10              0.351000
+max_pooling2d_2/MaxPool                                         0.014300        0.005203        10              0.143000
+max_pooling2d_2/MaxPool_nchw2nhwc_reshape_1/Reshape_0           0.006500        0.002365        10              0.065000
+max_pooling2d_2/MaxPool_nchw2nhwc_reshape_1/Shape_0             0.010900        0.003966        10              0.109000
+output/BiasAdd                                                  0.005300        0.001928        10              0.053000
+output/MatMul                                                   0.011400        0.004148        10              0.114000
+output/Softmax                                                  0.013300        0.004839        10              0.133000
+reshape_1/Reshape                                               0.000900        0.000327        10              0.009000
+reshape_1/Reshape/shape                                         0.009900        0.003602        10              0.099000
+reshape_1/Shape                                                 0.002300        0.000837        10              0.023000
+reshape_1/strided_slice                                         0.009700        0.003529        10              0.097000
+-----------------------------------------------------------------------------------------
+opType          avg(ms)         percent         calledTimess    opTotalTime
+Activation      0.006900        0.002510        10              0.069000
+BiasAdd         0.012800        0.004657        20              0.128000
+Conv2D          2.488500        0.905401        20              24.885004
+MatMul          0.137400        0.049991        20              1.374000
+Nchw2Nhwc       0.017400        0.006331        20              0.174000
+Pooling         0.049400        0.017973        20              0.494000
+Reshape         0.000900        0.000327        10              0.009000
+Shape           0.002300        0.000837        10              0.023000
+SoftMax         0.013300        0.004839        10              0.133000
+Stack           0.009900        0.003602        10              0.099000
+StridedSlice    0.009700        0.003529        10              0.097000
+
+total time :     2.90800 ms,    kernel cost : 2.74851 ms
+
+-----------------------------------------------------------------------------------------
+```
\ No newline at end of file
diff --git a/lite/tutorials/source_zh_cn/use/benchmark_tool.md b/lite/tutorials/source_zh_cn/use/benchmark_tool.md
index 7c3df630..20c32881 100644
--- a/lite/tutorials/source_zh_cn/use/benchmark_tool.md
+++ b/lite/tutorials/source_zh_cn/use/benchmark_tool.md
@@ -68,13 +68,13 @@ Benchmark工具是一款可以对MindSpore Lite模型进行基准测试的工具
 Benchmark工具进行的性能测试主要的测试指标为模型单次前向推理的耗时。在性能测试任务中，不需要设置`calibDataPath`等标杆数据参数。例如：
 
 ```bash
-./benchmark --modelPath=./models/face/age/ml_face_age.ms
+./benchmark --modelPath=./models/face_age.ms
 ```
 
 这条命令使用随机输入，其他参数使用默认值。该命令执行后会输出如下统计信息，该信息显示了测试模型在运行指定推理轮数后所统计出的单次推理最短耗时、单次推理最长耗时和平均推理耗时。
 
 ```
-Model = ml_face_age.ms, numThreads = 2, MinRunTime = 72.228996 ms, MaxRuntime = 73.094002 ms, AvgRunTime = 72.556000 ms
+Model = face_age.ms, numThreads = 2, MinRunTime = 72.228996 ms, MaxRuntime = 73.094002 ms, AvgRunTime = 72.556000 ms
 ```
 
 ### 精度测试
@@ -82,7 +82,7 @@ Model = ml_face_age.ms, numThreads = 2, MinRunTime = 72.228996 ms, MaxRuntime =
 Benchmark工具进行的精度测试主要是通过设置标杆数据来对比验证MindSpore Lite模型输出的精确性。在精确度测试任务中，除了需要设置`modelPath`参数以外，还必须设置`calibDataPath`参数。例如：
 
 ```bash
-./benchmark --modelPath=./models/face/age/ml_face_age.ms --inDataPath=./data/input/ml_face_age.ms.bin --device=NPU --accuracyThreshold=3 --calibDataPath=./data/output/face/ml_face_age.ms.out
+./benchmark --modelPath=./models/face_age.ms --inDataPath=./input/face_age.bin --device=NPU --accuracyThreshold=3 --calibDataPath=./output/face_age.out
 ```
 
 这条命令指定了测试模型的输入数据、标杆数据，同时指定了模型推理程序在NPU上运行，并指定了准确度阈值为3%。该命令执行后会输出如下统计信息，该信息显示了测试模型的单条输入数据、输出节点的输出结果和平均偏差率以及所有节点的平均偏差率。
diff --git a/lite/tutorials/source_zh_cn/use/timeprofiler_tool.md b/lite/tutorials/source_zh_cn/use/timeprofiler_tool.md
index 840d20f7..55baa3ae 100644
--- a/lite/tutorials/source_zh_cn/use/timeprofiler_tool.md
+++ b/lite/tutorials/source_zh_cn/use/timeprofiler_tool.md
@@ -49,7 +49,7 @@ TimeProfiler工具可以对MindSpore Lite模型网络层的前向推理进行耗
 使用TimeProfiler对`tcpclassify.ms`模型的网络层进行耗时分析，并且设置模型推理循环运行次数为10，则其命令代码如下：
 
 ```bash
-./timeprofiler --modelPath=./models/siteAI/tcpclassify.ms --loopCount=10
+./timeprofiler --modelPath=./models/tcpclassify.ms --loopCount=10
 ```
 
 该条命令执行后，TimeProfiler工具会输出模型网络层运行耗时的相关统计信息。对于本例命令，输出的统计信息如下。其中统计信息按照`opName`和`optype`两种划分方式分别显示，`opName`表示算子名，`optype`表示算子类别，`avg`表示该算子的平均单次运行时间，`percent`表示该算子运行耗时占所有算子运行总耗时的比例，`calledTimess`表示该算子的运行次数，`opTotalTime`表示该算子运行指定次数的总耗时。最后，`total time`和`kernel cost`分别显示了该模型单次推理的平均耗时和模型推理中所有算子的平均耗时之和。
-- 
GitLab