Commit 39a07419 authored by Ting Wang

update tutorials of mindinsight

Signed-off-by: Ting Wang <kathy.wangting@huawei.com>
Parent 5d3b9ce5
# Dashboard and Lineage
<!-- TOC -->
- [Dashboard and Lineage](#dashboard-and-lineage)
- [Overview](#overview)
- [Operation Process](#operation-process)
- [Preparing the Training Script](#preparing-the-training-script)
- [MindInsight Commands](#mindinsight-commands)
- [Collect Summary Data](#collect-summary-data)
- [Visualization Components](#visualization-components)
- [Training Dashboard](#training-dashboard)
- [Scalar Visualization](#scalar-visualization)
- [Model Lineage](#model-lineage)
- [Dataset Lineage](#dataset-lineage)
- [Scalars Comparison](#scalars-comparison)
- [Specifications](#specifications)
<!-- /TOC -->
In the saved files, `ms_output_after_hwopt.pb` is the computational graph after operator fusion.
> - Currently MindSpore supports recording computational graph after operator fusion for Ascend 910 AI processor only.
> - When using the Summary operators to collect data in training, the `HistogramSummary` operator affects performance, so use it as sparingly as possible.
## MindInsight Commands
### View the command help information.
```bash
mindinsight --help
```
### View the version information.
```bash
mindinsight --version
```
### Start the service.
```bash
mindinsight start [-h] [--config <CONFIG>] [--workspace <WORKSPACE>]
[--port <PORT>] [--reload-interval <RELOAD_INTERVAL>]
[--summary-base-dir <SUMMARY_BASE_DIR>]
```
The optional parameters are as follows:
- `-h, --help` : Displays the help information about the startup command.
- `--config <CONFIG>` : Specifies the configuration file or module. CONFIG indicates the physical file path (file:/path/to/config.py), or a module path (python:path.to.config.module) that can be identified by Python.
- `--workspace <WORKSPACE>` : Specifies the working directory. The default value of WORKSPACE is $HOME/mindinsight.
- `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
- `--url-path-prefix <URL_PATH_PREFIX>` : Specifies the path prefix of the web visualization service. The default value of URL_PATH_PREFIX is empty string.
- `--reload-interval <RELOAD_INTERVAL>` : Specifies the interval (unit: second) for loading data. The value 0 indicates that data is loaded only once. The default value of RELOAD_INTERVAL is 3 seconds.
- `--summary-base-dir <SUMMARY_BASE_DIR>` : Specifies the root directory for loading training log data. MindInsight traverses the direct subdirectories in this directory and searches for log files. If a direct subdirectory contains log files, it is identified as the log file directory. If a root directory contains log files, it is identified as the log file directory. SUMMARY_BASE_DIR is the current directory path by default.
> When the service is started, the parameter values of the command line are saved as the environment variables of the process and start with `MINDINSIGHT_`, for example, `MINDINSIGHT_CONFIG`, `MINDINSIGHT_WORKSPACE`, and `MINDINSIGHT_PORT`.
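The flag-to-variable mapping described in the note can be sketched in shell; the derivation below only illustrates the naming convention (strip the leading dashes, uppercase, dashes become underscores, add the `MINDINSIGHT_` prefix) and is not itself a MindInsight command:

```bash
# Illustration of how a startup flag maps to the saved environment variable name.
flag="--summary-base-dir"
# strip leading "--", uppercase, replace "-" with "_", then prefix
name="MINDINSIGHT_$(echo "${flag#--}" | tr 'a-z-' 'A-Z_')"
echo "$name"   # MINDINSIGHT_SUMMARY_BASE_DIR
```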
### Stop the service.
```bash
mindinsight stop [-h] [--port PORT]
```
The optional parameters are as follows:
- `-h, --help` : Displays the help information about the stop command.
- `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
### View the service process information.
MindInsight provides users with web services. Run the following command to view the running web service process:
```bash
ps -ef | grep mindinsight
```
Run the following command to access the working directory `WORKSPACE` corresponding to the service process based on the service process ID:
```bash
lsof -p <PID> | grep access
```
Output with the working directory `WORKSPACE` as follows:
```bash
gunicorn <PID> <USER> <FD> <TYPE> <DEVICE> <SIZE/OFF> <NODE> <WORKSPACE>/log/gunicorn/access.log
```
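As a sketch, the `WORKSPACE` prefix can be recovered from the matched `lsof` line by stripping the fixed access-log suffix; the sample line below is hypothetical:

```python
def workspace_from_lsof(line: str) -> str:
    # The last column of the matched lsof line is the access-log path:
    # <WORKSPACE>/log/gunicorn/access.log
    path = line.split()[-1]
    suffix = "/log/gunicorn/access.log"
    assert path.endswith(suffix)
    return path[: -len(suffix)]

# Hypothetical lsof output line for illustration.
line = "gunicorn 2093 user 7w REG 8,2 1024 131 /home/user/mindinsight/log/gunicorn/access.log"
print(workspace_from_lsof(line))  # /home/user/mindinsight
```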
## Visualization Components
### Training Dashboard
......
- [Operation Process](#operation-process)
- [Preparing the Training Script](#preparing-the-training-script)
- [Launch MindInsight](#launch-mindinsight)
- [Performance Analysis](#performance-analysis)
- [Step Trace Analysis](#step-trace-analysis)
- [Operator Performance Analysis](#operator-performance-analysis)
- [MindData Performance Analysis](#minddata-performance-analysis)
- [Timeline Analysis](#timeline-analysis)
- [Specifications](#specifications)
<!-- /TOC -->
Performance data, such as operators' execution time, is recorded in files and can be viewed on the web page.
## Operation Process
- Prepare a training script, add profiler APIs in the training script, and run the training script.
- Start MindInsight and specify the profiler data directory using startup parameters. After MindInsight is started, access the visualization page based on the IP address and port number. The default access address is `http://127.0.0.1:8080`.
- Find the training in the list, click the performance profiling link, and view the data on the web page.
## Preparing the Training Script
To enable performance profiling of neural networks, the MindInsight Profiler APIs should be added to the script. First, the MindInsight `Profiler` object needs to be initialized after the context is set and before the network initialization. Then, at the end of the training, `Profiler.analyse()` should be called to finish profiling and generate the performance analysis results.
The sample code is as follows:
Users can access the Performance Profiler by selecting a specific training from the training list.
![performance_overall.png](./images/performance_overall.png)
Figure 1: Overall Performance
Figure 1 displays the overall performance of the training, including the overall data of Step Trace, Operator Performance, MindData Performance and Timeline. The data shown in these components include:
- Step Trace: It will divide the training step into several stages and collect execution time for each stage. The overall performance page will show the step trace graph.
- Operator Performance: It will collect the execution time of operators and operator types. The overall performance page will show the pie graph for different operator types.
- MindData Performance: It will analyse the performance of the data input stages. The overall performance page will show the number of steps that may be the bottleneck for these stages.
- Timeline: It will collect execution time for stream tasks on the devices. The tasks will be shown on the time axis. The overall performance page will show the statistics for streams and tasks.
Users can click the detail link to see the details of each component. Besides, MindInsight Profiler analyses the performance data, and the assistant on the left shows performance tuning suggestions for this training.
#### Step Trace Analysis
The Step Trace component is used to show the general performance of the stages in the training. Step Trace divides the training into several stages: step gap (the time between the end of one step and the computation of the next step), forward/backward propagation, all reduce, and parameter update. It shows the execution time of each stage and helps to find the bottleneck stage quickly.
![step_trace.png](./images/step_trace.png)
Figure 2: Step Trace Analysis
Figure 2 displays the Step Trace page. The Step Trace detail shows the start/finish time of each stage. By default, it shows the average time over all the steps; users can also choose a specific step to see its step trace statistics. The graphs at the bottom of the page show how the execution time of the step gap, forward/backward propagation, and step tail (the time between the end of backward propagation and the end of parameter update) changes across steps, which helps to decide whether the performance of some stages can be optimized.
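The three plotted quantities can be illustrated with a small calculation over hypothetical stage timestamps; the field names and values below are assumptions for illustration, not actual Profiler output:

```python
# Hypothetical timestamps (ms) of the stage boundaries within one training step.
step = {"iter_start": 0.0, "fp_start": 1.5, "bp_end": 9.0, "iter_end": 10.0}

step_gap = step["fp_start"] - step["iter_start"]   # waiting before computation starts
fp_and_bp = step["bp_end"] - step["fp_start"]      # forward/backward propagation
step_tail = step["iter_end"] - step["bp_end"]      # all reduce + parameter update

print(step_gap, fp_and_bp, step_tail)  # 1.5 7.5 1.0
```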
In order to divide the stages, the Step Trace component needs to identify the forward propagation start operator and the backward propagation end operator. MindSpore identifies the two operators automatically to reduce the profiler configuration work: the first operator after `get_next` is selected as the forward start operator, and the operator before the last all reduce is selected as the backward end operator.
**However, the Profiler does not guarantee that the automatically selected operators meet the user's expectation in all cases.** Users can set the two operators manually as follows:
- Set environment variable `FP_POINT` to configure the forward start operator, for example, `export FP_POINT=fp32_vars/conv2d/BatchNorm`.
- Set environment variable `BP_POINT` to configure the backward end operator, for example, `export BP_POINT=loss_scale/gradients/AddN_70`.
#### Operator Performance Analysis
The operator performance analysis component is used to display the execution time of the operators.
Figure 3: Statistics for Operator Types
Figure 3 displays the statistics for the operator types, including:
- Choose a pie or bar graph to show the proportion of time occupied by each operator type. The time of one operator type is calculated by accumulating the execution time of the operators belonging to this type.
- Display the top 20 operator types with the longest execution time, showing the proportion and execution time (ms) of each operator type.
Figure 4: Statistics for Operators
Figure 4 displays the statistics table for the operators, including:
- Choose All: Displays statistics for single operators, including operator name, type, execution time, full scope time, and operator information. The table is sorted by execution time by default.
- Choose Type: Displays statistics for operator types, including operator type name, execution time, execution frequency and proportion of total time. Users can click each line to query all the operators belonging to this type.
- Search: The search box on the right supports fuzzy search for operator names/types.
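The per-type accumulation described above can be sketched as follows; the operator records are made up for illustration:

```python
from collections import defaultdict

# Hypothetical (operator name, operator type, execution time in ms) records.
ops = [
    ("Conv2D-op1", "Conv2D", 3.2),
    ("Conv2D-op2", "Conv2D", 2.8),
    ("ReLU-op1", "ReLU", 0.5),
]

# Accumulate execution time of operators belonging to each type.
totals = defaultdict(float)
for _name, op_type, ms in ops:
    totals[op_type] += ms

total = sum(totals.values())
# Sort types by accumulated time, longest first, with their time proportion.
stats = sorted(
    ((t, ms, ms / total) for t, ms in totals.items()),
    key=lambda x: x[1],
    reverse=True,
)
print(stats[0][0])  # Conv2D
```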
#### MindData Performance Analysis
The MindData performance analysis component is used to analyse the execution of the data input pipeline for the training. The data input pipeline can be divided into three stages: the data process pipeline, data transfer from host to device, and data fetch on device. The component analyses the performance of each stage in detail and displays the results.
![minddata_profile.png](./images/minddata_profile.png)
Figure 5: MindData Performance Analysis
Figure 5 displays the page of the MindData performance analysis component. It consists of two tabs: the step gap and the data process.
The step gap page is used to analyse whether there is a performance bottleneck in the three stages. We can draw conclusions from the data queue graphs:
- The data queue size stands for the queue length when the training fetches data from the queue on the device. If the data queue size is 0, the training will wait until there is data in the queue; if the data queue size is above 0, the training can get data quickly, which means MindData is not the bottleneck for this training step.
- The host queue size can be used to infer the speed of data processing and data transfer. If the host queue size is 0, the data processing stage needs to be sped up.
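A minimal sketch of the first rule, assuming a list of per-step data queue sizes as input:

```python
def minddata_bottleneck_steps(data_queue_sizes):
    """Return the steps in which the on-device data queue was empty, i.e. the
    training had to wait for data (an illustrative reading of the rule above)."""
    return [i for i, size in enumerate(data_queue_sizes, start=1) if size == 0]

print(minddata_bottleneck_steps([4, 0, 3, 0]))  # [2, 4]
```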
![data_op_profile.png](./images/data_op_profile.png)
Figure 6: Data Process Pipeline Analysis
Figure 6 displays the page of data process pipeline analysis. The data queues are used to exchange data between the MindData operators. The data size of a queue reflects the data consumption speed of the operators and can be used to infer the bottleneck operator. The queue usage percentage is the average data size in the queue divided by the maximum queue size; the higher the usage percentage, the more data is accumulated in the queue. The graph at the bottom of the page shows the MindData pipeline operators with the data queues; the user can click a queue to see how the data size changes over time, and the operators connected to the queue. The data process pipeline can be analysed as follows:
- When the input queue usage percentage of one operator is high, and the output queue usage percentage is low, the operator may be the bottleneck.
- For the leftmost operator, if the usage percentage of the queues on the right are all low, the operator may be the bottleneck.
- For the rightmost operator, if the usage percentage of the queues on the left are all high, the operator may be the bottleneck.
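The three rules can be combined into one heuristic; the usage thresholds below are illustrative assumptions, not values used by MindInsight:

```python
def likely_bottleneck(in_usage, out_usage, high=0.8, low=0.2):
    """Heuristic from the rules above: an operator whose input queues are full
    while its output queues are empty cannot keep up.

    `in_usage` is None for the leftmost operator (no input queues) and
    `out_usage` is None for the rightmost operator (no output queues);
    the `high`/`low` thresholds are assumed for illustration."""
    in_high = in_usage is None or min(in_usage) >= high
    out_low = out_usage is None or max(out_usage) <= low
    return in_high and out_low

print(likely_bottleneck([0.9, 0.85], [0.05]))  # True
```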
To optimize the performance of MindData operators, there are some suggestions:
- If the Dataset operator is the bottleneck, try to increase `num_parallel_workers`.
- If a GeneratorOp type operator is the bottleneck, try to increase `num_parallel_workers` and try to replace the operator with `MindRecordDataset`.
- If a MapOp type operator is the bottleneck, try to increase `num_parallel_workers`. If it is a Python operator, try to optimize the training script.
- If a BatchOp type operator is the bottleneck, try to adjust `prefetch_size`.
#### Timeline Analysis
The Timeline component can display:
- Which device (AICore/AICPU) each operator is executed on.
- The MindSpore stream split strategy for this neural network.
- The time of tasks executed on the device.
Users can get the most detailed information from the Timeline:
- At a high level, users can analyse whether the stream split strategy can be optimized and whether the step tail is too long.
- At a low level, users can analyse the execution time of all the operators, etc.
![timeline.png](./images/timeline.png)
Figure 7: Timeline Analysis

The Timeline consists of the following parts:

- Device and Stream List: It shows the stream list on each device. Each stream consists of a series of tasks. One rectangle stands for one task, and its area stands for the execution time of the task.
- The Operator Information: When a task is clicked, the corresponding operator of the task is shown at the bottom.
W/A/S/D can be applied to zoom in and out of the Timeline graph.
## Specifications
- To limit the data size generated by the Profiler, MindInsight suggests profiling no more than 10 steps for large neural networks.
- Parsing the Timeline data is time consuming, and a few steps' data is usually enough for analysis. To speed up data parsing and UI display, the Profiler shows at most 20MB of data (which contains information for 10+ steps for large networks).
Training Process Visualization
==============================

.. toctree::
   :maxdepth: 1

   dashboard_and_lineage
   performance_profiling
MindInsight Commands
--------------------

1. View the command help information.

   .. code-block::

      mindinsight --help

2. View the version information.

   .. code-block::

      mindinsight --version

3. Start the service.

   .. code-block::

      mindinsight start [-h] [--config <CONFIG>] [--workspace <WORKSPACE>]
                        [--port <PORT>] [--reload-interval <RELOAD_INTERVAL>]
                        [--summary-base-dir <SUMMARY_BASE_DIR>]

   The optional parameters are as follows:

   - `-h, --help` : Displays the help information about the startup command.
   - `--config <CONFIG>` : Specifies the configuration file or module. CONFIG indicates the physical file path (file:/path/to/config.py), or a module path (python:path.to.config.module) that can be identified by Python.
   - `--workspace <WORKSPACE>` : Specifies the working directory. The default value of WORKSPACE is $HOME/mindinsight.
   - `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
   - `--url-path-prefix <URL_PATH_PREFIX>` : Specifies the path prefix of the web visualization service. The default value of URL_PATH_PREFIX is an empty string.
   - `--reload-interval <RELOAD_INTERVAL>` : Specifies the interval (unit: second) for loading data. The value 0 indicates that data is loaded only once. The default value of RELOAD_INTERVAL is 3 seconds.
   - `--summary-base-dir <SUMMARY_BASE_DIR>` : Specifies the root directory for loading training log data. MindInsight traverses the direct subdirectories in this directory and searches for log files. If a direct subdirectory contains log files, it is identified as a log file directory. If the root directory contains log files, it is also identified as a log file directory. SUMMARY_BASE_DIR is the current directory path by default.

   .. note::
      When the service is started, the parameter values of the command line are saved as the environment variables of the process and start with `MINDINSIGHT_`, for example, `MINDINSIGHT_CONFIG`, `MINDINSIGHT_WORKSPACE`, and `MINDINSIGHT_PORT`.

4. View the service process information.

   MindInsight provides users with web services. Run the following command to view the running web service process:

   .. code-block::

      ps -ef | grep mindinsight

   Run the following command to access the working directory `WORKSPACE` corresponding to the service process based on the service process ID:

   .. code-block::

      lsof -p <PID> | grep access

   Output with the working directory `WORKSPACE` as follows:

   .. code-block::

      gunicorn <PID> <USER> <FD> <TYPE> <DEVICE> <SIZE/OFF> <NODE> <WORKSPACE>/log/gunicorn/access.log

5. Stop the service.

   .. code-block::

      mindinsight stop [-h] [--port PORT]

   The optional parameters are as follows:

   - `-h, --help` : Displays the help information about the stop command.
   - `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
# Dashboard and Lineage
<!-- TOC -->
- [Dashboard and Lineage](#dashboard-and-lineage)
- [Overview](#overview)
- [Operation Process](#operation-process)
- [Preparing the Training Script](#preparing-the-training-script)
- [MindInsight Commands](#mindinsight-commands)
- [View the command help information](#view-the-command-help-information)
- [View the version information](#view-the-version-information)
- [Start the service](#start-the-service)
- [Stop the service](#stop-the-service)
- [View the service process information](#view-the-service-process-information)
- [Collect Summary Data](#collect-summary-data)
- [Visualization Components](#visualization-components)
- [Training Dashboard](#training-dashboard)
- [Scalar Visualization](#scalar-visualization)
> - Currently, MindSpore supports exporting the computational graph after operator fusion for the Ascend 910 AI processor only.
> - When using the Summary operators to collect data in training, the `HistogramSummary` operator affects performance, so use it as sparingly as possible.
## MindInsight Commands
### View the command help information
```bash
mindinsight --help
```
### View the version information
```bash
mindinsight --version
```
### Start the service
```bash
mindinsight start [-h] [--config <CONFIG>] [--workspace <WORKSPACE>]
[--port <PORT>] [--reload-interval <RELOAD_INTERVAL>]
[--summary-base-dir <SUMMARY_BASE_DIR>]
```
The optional parameters are as follows:
- `-h, --help` : Displays the help information about the startup command.
- `--config <CONFIG>` : Specifies the configuration file or module. CONFIG indicates the physical file path (file:/path/to/config.py) or a module path (python:path.to.config.module) that can be identified by Python.
- `--workspace <WORKSPACE>` : Specifies the working directory. The default value of WORKSPACE is $HOME/mindinsight.
- `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
- `--url-path-prefix <URL_PATH_PREFIX>` : Specifies the path prefix of the web service address. The default value of URL_PATH_PREFIX is an empty string.
- `--reload-interval <RELOAD_INTERVAL>` : Specifies the interval (unit: second) for loading data. The value 0 indicates that data is loaded only once. The default value of RELOAD_INTERVAL is 3 seconds.
- `--summary-base-dir <SUMMARY_BASE_DIR>` : Specifies the root directory for loading training log data. MindInsight traverses the direct subdirectories under this path. If a direct subdirectory contains log files, it is identified as a log file directory; if the root directory contains log files, it is also identified as a log file directory. The default value of SUMMARY_BASE_DIR is the current directory path.
> When the service is started, the command line parameter values are saved as environment variables of the process, prefixed with `MINDINSIGHT_`, for example, `MINDINSIGHT_CONFIG`, `MINDINSIGHT_WORKSPACE`, and `MINDINSIGHT_PORT`.
### Stop the service
```bash
mindinsight stop [-h] [--port PORT]
```
The optional parameters are as follows:
- `-h, --help` : Displays the help information about the stop command.
- `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
### View the service process information
MindInsight provides users with web services. Run the following command to view the running web service process:
```bash
ps -ef | grep mindinsight
```
Run the following command, based on the service process PID, to view the working directory `WORKSPACE` corresponding to the service process:
```bash
lsof -p <PID> | grep access
```
The output, in which `WORKSPACE` can be found, is as follows:
```bash
gunicorn <PID> <USER> <FD> <TYPE> <DEVICE> <SIZE/OFF> <NODE> <WORKSPACE>/log/gunicorn/access.log
```
## Visualization Components
### Training Dashboard
......
- [Overview](#overview)
- [Operation Process](#operation-process)
- [Preparing the Training Script](#preparing-the-training-script)
- [Launch MindInsight](#launch-mindinsight)
- [Performance Analysis](#performance-analysis)
- [Step Trace Analysis](#step-trace-analysis)
- [Operator Performance Analysis](#operator-performance-analysis)
- [MindData Performance Analysis](#minddata-performance-analysis)
- [Timeline Analysis](#timeline-analysis)
- [Specifications](#specifications)
<!-- /TOC -->
## Operation Process
- Prepare a training script, call the performance profiling APIs in the training script, and run the training script.
- Start MindInsight and specify the profiler data directory using startup parameters. After MindInsight is started successfully, access the visualization page based on the IP address and port number. The default access address is `http://127.0.0.1:8080`.
- Find the corresponding training in the training list, click Performance Profiling, and view the training performance data on the page.
## Preparing the Training Script
To collect performance data of the neural network, the MindInsight Profiler APIs need to be added to the training script:
- After setting the context and before initializing the network, initialize the MindInsight `Profiler` object.
- At the end of the training, call `Profiler.analyse()` to stop collecting performance data and generate the performance analysis results.

The sample code is as follows:
## Launch MindInsight
For the startup command, refer to the **MindInsight Commands** section of [Training Process Visualization](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/visualization_tutorials.html).
### Performance Analysis
![performance_overall.png](./images/performance_overall.png)
Figure 1: Overall Performance
Figure 1 displays the overall performance page, which presents the overall data of the Step Trace, Operator Performance, MindData Performance and Timeline components. The data shown in these components include:
- Step Trace: divides the training steps into several stages and collects the execution time of each stage, displayed along a timeline; the overview page shows the step trace graph.
- Operator Performance: collects the execution time of single operators and of each operator type, displayed in sorted order; the overview page shows a pie chart of the time proportion of each operator type.
- MindData Performance: collects the performance of each stage of training data preparation; the overview page shows the number of steps in which each stage may have a performance bottleneck.
- Timeline: collects, per device, the execution time of the tasks in each stream, displayed along the time axis; the overview page shows a summary of the streams and tasks in the Timeline.
Users can click the detail link of a component to enter its page for detailed analysis. MindInsight also analyses the performance data and gives performance tuning suggestions for the training in the assistant on the left.
#### Step Trace Analysis
The step trace analysis component gives a quick view of the time proportion of each training stage in the total duration. Step trace divides one training step into several stages: step gap (the interval between two step executions), forward and backward propagation, all reduce, and parameter update, and displays the duration of each stage, helping users locate the execution stage where the performance bottleneck lies.
![step_trace.png](./images/step_trace.png)
Figure 2: Step Trace Analysis

Figure 2 displays the step trace analysis page. The step trace detail shows the start and end time of each stage within a training step. By default the average over all steps is displayed; users can also select a specific step from the drop-down menu to view its step trace. The bottom of the page shows how the step gap, forward/backward computation, and step tail time (the time from the end of forward/backward computation to the end of parameter update) change across steps, from which users can judge whether a stage has room for performance optimization.
To divide the stages, step trace needs to identify the operator where forward computation starts and the operator where backward computation ends. To lower the threshold for using the Profiler, MindSpore identifies these two operators automatically: the forward start operator is the first operator connected after the `get_next` operator, and the backward end operator is the operator connected before the last all reduce. **The Profiler does not guarantee that the automatically identified result meets the user's expectation in all cases; users can adjust it according to the characteristics of the network** as follows:
- Set the `FP_POINT` environment variable to specify the operator where forward computation starts, for example, `export FP_POINT=fp32_vars/conv2d/BatchNorm`.
- Set the `BP_POINT` environment variable to specify the operator where backward computation ends, for example, `export BP_POINT=loss_scale/gradients/AddN_70`.
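For example, both variables can be exported before launching the training; the operator names below are the hypothetical ones from the bullets above, so substitute the operators of your own network:

```bash
# Hypothetical operator names; replace with operators from your own network.
export FP_POINT=fp32_vars/conv2d/BatchNorm
export BP_POINT=loss_scale/gradients/AddN_70
echo "$FP_POINT $BP_POINT"
```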
#### Operator Performance Analysis
![op_type_statistics.png](./images/op_type_statistics.PNG)
Figure 3: Statistics for Operator Types

Figure 3 displays the statistical analysis by operator type, including:
- A pie or bar chart can be selected to show the time proportion of each operator type; the execution time of an operator type is the sum of the execution time of the operators belonging to that type.
- The top 20 operator types with the longest execution time are listed, showing the time percentage and the concrete execution time (ms) of each type.
![op_statistics.png](./images/op_statistics.PNG)
Figure 4: Statistics for Operators

Figure 4 displays the operator performance statistics table, including:
- All: displays the statistics of single operators in sorted order, with dimensions including operator name, operator type, operator execution time, full scope name, and operator information; sorted by operator execution time by default.
- Type: displays the statistics of operator types in sorted order, with dimensions including operator type name, execution time of the type, execution frequency, and proportion of total time. Clicking an operator type shows the statistics of all single operators under that type.
- Search: entering a string in the search box on the right supports fuzzy search of operator names/types.
#### MindData Performance Analysis
The MindData performance analysis component is used to analyse the performance of the training data preparation process, which can be divided into three stages: the data process pipeline, data transfer to the device, and data fetch on the device side. The MindData performance analysis component analyses the performance of each stage in detail and displays the results.
![minddata_profile.png](./images/minddata_profile.png)
Figure 5: MindData Performance Analysis

Figure 5 displays the MindData performance analysis page, which contains two tabs: step gap and data process.

The step gap tab is mainly used to analyse whether any of the three data preparation stages has a performance bottleneck, and the data queue graphs are the key basis for judgment:
- The data queue size stands for the queue length when the device fetches data from the queue. If the data queue size is 0, the training keeps waiting and only starts a step once there is data in the queue; if the data queue size is above 0, the training can fetch data quickly, and MindData is not the bottleneck of that step.
- The host queue size can be used to infer the data processing and transfer speed. If the host queue size is 0, data processing is slow while data transfer is fast, and the data processing stage needs to be sped up.
- If the host queue size keeps relatively large while the data queue size stays very small, data transfer may have a performance bottleneck.
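The three judgments above can be sketched as a small helper; the queue-size readings and returned strings are illustrative assumptions, not MindInsight output:

```python
def diagnose(host_queue, data_queue):
    """Apply the three rules above to per-step host/data queue sizes."""
    if all(s == 0 for s in host_queue):
        return "speed up data processing"
    if all(s > 0 for s in data_queue):
        return "MindData is not the bottleneck"
    if min(host_queue) > 0 and max(data_queue) == 0:
        return "data transfer may be the bottleneck"
    return "inconclusive"

print(diagnose([0, 0, 0], [2, 3, 1]))  # speed up data processing
```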
![data_op_profile.png](./images/data_op_profile.png)
Figure 6: Data Process Pipeline Analysis

Figure 6 displays the data process tab, which allows further analysis of the data process pipeline. Queues are used to exchange data between different data operators, and the length of a queue reflects how fast an operator processes data, from which the bottleneck operator in the pipeline can be inferred. The average usage of an operator queue is the average of the data size in the queue divided by the maximum queue size; the higher the usage, the more data accumulates in the queue. The operator queue graph shows the operators in the data process pipeline and the connections between them; clicking a queue shows, below, how the data size in the queue changes over time, as well as the operators connected to the queue. The data process pipeline can be analysed as follows:
- When the usage of the queues connected on the left of an operator is high while the usage of the queues connected on its right is low, the operator may be the performance bottleneck.
- For the leftmost operator, if the usage of all the queues on its right is low, the operator may be the performance bottleneck.
- For the rightmost operator, if the usage of all the queues on its left is high, the operator may be the performance bottleneck.
For different types of MindData operators, there are the following optimization suggestions:
- If a Dataset operator is the performance bottleneck, try to increase `num_parallel_workers`.
- If a GeneratorOp type operator is the performance bottleneck, try to increase `num_parallel_workers` and try to replace it with `MindRecordDataset`.
- If a MapOp type operator is the performance bottleneck, try to increase `num_parallel_workers`; if it is a Python operator, try to optimize the script.
- If a BatchOp type operator is the performance bottleneck, try to adjust `prefetch_size`.
#### Timeline Analysis
The Timeline component can display:
- Which device (AICPU, AICore, etc.) each operator is assigned to for execution.
- The MindSpore stream split strategy for the network.
- The execution sequence and execution time of operators on the device.
By analysing the Timeline, users can perform fine-grained analysis of the training process: at a high level, they can analyse whether the stream split method is reasonable and whether the step gap and step tail are too long; at a low level, they can analyse the execution time of operators, and so on.
![timeline.png](./images/timeline.png)
Figure 7: Timeline Analysis

The Timeline mainly consists of the following parts:
- Device and stream list: contains the stream list on each device; each stream consists of a sequence of tasks, where one small rectangle is one task and its size stands for the execution time.
- Operator information: when a task is selected, the information of the operator corresponding to the task is displayed, including its name, type, etc.

W/A/S/D can be used to zoom in and out of the Timeline graph.
## Specifications
- To limit the size of the data generated during profiling, it is recommended to profile no more than 10 steps for large networks.
- Parsing the Timeline data is time consuming, and a few steps' data is usually enough for analysis. Out of consideration for data parsing and UI display performance, the Profiler shows at most 20MB of data (for large networks, 20MB can show information for 10+ steps).
Training Process Visualization
==============================

.. toctree::
   :maxdepth: 1

   dashboard_and_lineage
   performance_profiling
MindInsight Commands
--------------------

1. View the command help information.

   .. code-block::

      mindinsight --help

2. View the version information.

   .. code-block::

      mindinsight --version

3. Start the service.

   .. code-block::

      mindinsight start [-h] [--config <CONFIG>] [--workspace <WORKSPACE>]
                        [--port <PORT>] [--reload-interval <RELOAD_INTERVAL>]
                        [--summary-base-dir <SUMMARY_BASE_DIR>]

   The optional parameters are as follows:

   - `-h, --help` : Displays the help information about the startup command.
   - `--config <CONFIG>` : Specifies the configuration file or module. CONFIG indicates the physical file path (file:/path/to/config.py) or a module path (python:path.to.config.module) that can be identified by Python.
   - `--workspace <WORKSPACE>` : Specifies the working directory. The default value of WORKSPACE is $HOME/mindinsight.
   - `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
   - `--url-path-prefix <URL_PATH_PREFIX>` : Specifies the path prefix of the web service address. The default value of URL_PATH_PREFIX is an empty string.
   - `--reload-interval <RELOAD_INTERVAL>` : Specifies the interval (unit: second) for loading data. The value 0 indicates that data is loaded only once. The default value of RELOAD_INTERVAL is 3 seconds.
   - `--summary-base-dir <SUMMARY_BASE_DIR>` : Specifies the root directory for loading training log data. MindInsight traverses the direct subdirectories under this path. If a direct subdirectory contains log files, it is identified as a log file directory; if the root directory contains log files, it is also identified as a log file directory. The default value of SUMMARY_BASE_DIR is the current directory path.

   .. note::
      When the service is started, the command line parameter values are saved as environment variables of the process, prefixed with `MINDINSIGHT_`, for example, `MINDINSIGHT_CONFIG`, `MINDINSIGHT_WORKSPACE`, and `MINDINSIGHT_PORT`.

4. View the service process information.

   MindInsight provides users with web services. Run the following command to view the running web service process:

   .. code-block::

      ps -ef | grep mindinsight

   Run the following command, based on the service process PID, to view the working directory `WORKSPACE` corresponding to the service process:

   .. code-block::

      lsof -p <PID> | grep access

   The output, in which `WORKSPACE` can be found, is as follows:

   .. code-block::

      gunicorn <PID> <USER> <FD> <TYPE> <DEVICE> <SIZE/OFF> <NODE> <WORKSPACE>/log/gunicorn/access.log

5. Stop the service.

   .. code-block::

      mindinsight stop [-h] [--port PORT]

   The optional parameters are as follows:

   - `-h, --help` : Displays the help information about the stop command.
   - `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.