- `-h, --help` : Displays the help information about the startup command.
- `--config <CONFIG>` : Specifies the configuration file or module. CONFIG is either a physical file path (file:/path/to/config.py) or a module path (python:path.to.config.module) that Python can resolve.
- `--workspace <WORKSPACE>` : Specifies the working directory. The default value of WORKSPACE is $HOME/mindinsight.
- `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
- `--url-path-prefix <URL_PATH_PREFIX>` : Specifies the path prefix of the web visualization service. The default value of URL_PATH_PREFIX is an empty string.
- `--reload-interval <RELOAD_INTERVAL>` : Specifies the interval (unit: second) for loading data. The value 0 indicates that data is loaded only once. The default value of RELOAD_INTERVAL is 3 seconds.
- `--summary-base-dir <SUMMARY_BASE_DIR>` : Specifies the root directory for loading training log data. MindInsight traverses the direct subdirectories in this directory and searches for log files. If a direct subdirectory contains log files, it is identified as a log file directory. If the root directory itself contains log files, it is also identified as a log file directory. SUMMARY_BASE_DIR is the current directory path by default.
> When the service is started, the parameter values of the command line are saved as the environment variables of the process and start with `MINDINSIGHT_`, for example, `MINDINSIGHT_CONFIG`, `MINDINSIGHT_WORKSPACE`, and `MINDINSIGHT_PORT`.
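
As an illustration of the note above, the following minimal Python sketch reads the port back from the process environment. `read_port` is a hypothetical helper and not part of MindInsight, and the sketch assumes the value is stored as a decimal string. The default of 8080 and the range of 1 to 65535 are taken from the `--port` description.

```python
import os


def read_port(environ=os.environ):
    """Return the port stored in MINDINSIGHT_PORT, or 8080 if it is unset.

    Hypothetical helper for illustration only; assumes a decimal string value.
    """
    port = int(environ.get("MINDINSIGHT_PORT", "8080"))
    # The documented valid range for --port is 1 to 65535.
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside the range 1 to 65535")
    return port
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without starting the service, for example `read_port({"MINDINSIGHT_PORT": "9090"})`.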
### Stop the service.
```bash
mindinsight stop [-h] [--port PORT]
```
Optional parameters are as follows:
- `-h, --help` : Displays the help information about the stop command.
- `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
### View the service process information.
MindInsight provides users with web services. Run the following command to view the running web service process:
```bash
ps -ef | grep mindinsight
```
Run the following command to access the working directory `WORKSPACE` corresponding to the service process based on the service process ID:
```bash
lsof -p <PID> | grep access
```
The output, which includes the working directory `WORKSPACE`, is as follows:
@@ -24,14 +24,12 @@ Performance data like operators' execution time are recorded in files and can be
## Operation Process
- Prepare a training script, add Profiler APIs to it, and run the training script.
- Start MindInsight and specify the profiler data directory using startup parameters. After MindInsight is started, access the visualization page based on the IP address and port number. The default access address is `http://127.0.0.1:8080`.
- Find the training in the list, click the performance profiling link, and view the data on the web page.
## Preparing the Training Script
To enable performance profiling of a neural network, add the MindInsight Profiler APIs to the training script. First, create the MindInsight `Profiler` object after the context is set and before the network is initialized. Then, at the end of the training, call `Profiler.analyse()` to finish profiling and generate the performance analysis results.
The sample code is as follows:
...
@@ -75,38 +73,31 @@ Users can access the Performance Profiler by selecting a specific training from
Figure 1 displays the overall performance of the training, including the overall data of Step Trace, Operator Performance, MindData Performance and Timeline. The data shown in these components includes:
- Step Trace: Divides each training step into several stages and collects the execution time of each stage. The overall performance page shows the step trace graph.
- Operator Performance: Collects the execution time of operators and operator types. The overall performance page shows a pie chart of the operator types.
- MindData Performance: Analyses the performance of the data input stages. The overall performance page shows the number of steps that may be the bottleneck for these stages.
- Timeline: Collects the execution time of stream tasks on the devices and shows the tasks on a time axis. The overall performance page shows the statistics for streams and tasks.
Users can click the detail link of a component to see its details. In addition, MindInsight Profiler tries to analyse the performance data, and the assistant on the left shows performance tuning suggestions for this training.
#### Step Trace Analysis
The Step Trace component shows the general performance of the stages in the training. Step Trace divides the training into several stages: Step Gap (the time between the end of one step and the start of computation in the next step), Forward/Backward Propagation, All Reduce and Parameter Update. It shows the execution time of each stage and helps to find the bottleneck stage quickly.
![step_trace.png](./images/step_trace.png)
Figure 2: Step Trace Analysis
Figure 2 displays the Step Trace page. The Step Trace detail shows the start and finish time of each stage. By default, it shows the average time over all the steps. Users can also choose a specific step to see its step trace statistics. The graphs at the bottom of the page show how the execution time of Step Gap, Forward/Backward Propagation and Step Tail (the time between the end of Backward Propagation and the end of Parameter Update) changes from step to step, which helps to decide whether the performance of some stages can be optimized.
To divide the stages, the Step Trace component needs to identify the forward propagation start operator and the backward propagation end operator. MindSpore identifies the two operators automatically to reduce the profiler configuration work. The first operator after `get_next` is selected as the forward start operator, and the operator before the last all reduce is selected as the backward end operator.
**However, Profiler does not guarantee that the automatically selected operators meet the user's expectation in all cases.** Users can set the two operators manually as follows:
- Set the environment variable `FP_POINT` to configure the forward start operator, for example, `export FP_POINT=fp32_vars/conv2d/BatchNorm`.
- Set the environment variable `BP_POINT` to configure the backward end operator, for example, `export BP_POINT=loss_scale/gradients/AddN_70`.
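
The way the forward start and backward end operators split a step can be sketched in a few lines of Python. This is an illustration only and not the Profiler's implementation: the `StepTimestamps` fields and both function names are hypothetical, and the sketch assumes that four timestamps per step are available (the end of the previous step, the start of the forward start operator, the end of the backward end operator, and the end of the step).

```python
from dataclasses import dataclass


@dataclass
class StepTimestamps:
    """Hypothetical per-step timestamps, all in the same time unit."""
    prev_step_end: float  # end of the previous step
    fp_start: float       # start of the forward propagation start operator
    bp_end: float         # end of the backward propagation end operator
    step_end: float       # end of Parameter Update for this step


def stage_durations(ts):
    """Split one step into Step Gap, Forward/Backward Propagation and Step Tail."""
    return {
        "step_gap": ts.fp_start - ts.prev_step_end,
        "fp_bp": ts.bp_end - ts.fp_start,
        "step_tail": ts.step_end - ts.bp_end,
    }


def average_durations(steps):
    """Average each stage over all steps, like the default view of the page."""
    per_step = [stage_durations(ts) for ts in steps]
    return {
        stage: sum(d[stage] for d in per_step) / len(per_step)
        for stage in ("step_gap", "fp_bp", "step_tail")
    }
```

With this split, choosing a later operator as the forward start operator lengthens the Step Gap and shortens Forward/Backward Propagation, which is why a poorly chosen `FP_POINT` or `BP_POINT` can make a stage look slower than it is.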
#### Operator Performance Analysis
...
@@ -117,8 +108,7 @@ The operator performance analysis component is used to display the execution tim
Figure 3: Statistics for Operator Types
Figure 3 displays the statistics for the operator types, including:
- A pie chart or bar chart, selectable by the user, that shows the proportion of time taken by each operator type. The time of an operator type is the accumulated execution time of the operators that belong to this type.
- The top 20 operator types with the longest execution time, with the proportion and execution time (ms) of each operator type.
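
The aggregation described above can be sketched as follows. This is a minimal illustration, not the Profiler's code: `summarize_operator_types` is a hypothetical name, and the input is assumed to be a list of `(operator_type, execution_time_ms)` pairs, one per operator.

```python
from collections import defaultdict


def summarize_operator_types(operators, top_n=20):
    """Accumulate execution time per operator type and rank the types.

    Returns (type, total_ms, proportion) tuples, longest execution time first.
    """
    totals = defaultdict(float)
    for op_type, time_ms in operators:
        totals[op_type] += time_ms

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (op_type, total_ms, total_ms / grand_total if grand_total else 0.0)
        for op_type, total_ms in ranked[:top_n]
    ]
```

Note that the proportion is computed against the total over all operator types, not only the types that make it into the top `top_n`.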
...
@@ -126,25 +116,23 @@ Figure 3 displays the statistics for the operator types, including:
Figure 4: Statistics for Operators
Figure 4 displays the statistics table for the operators, including:
- Choose All: Displays statistics for the operators, including operator name, type, execution time, full scope time, information, etc. By default, the table is sorted by execution time.
- Choose Type: Displays statistics for the operator types, including operator type name, execution time, execution frequency and proportion of total time. Users can click a row to query all the operators that belong to this type.
- Search: The search box on the right supports fuzzy search for operators and operator types.
#### MindData Performance Analysis
The MindData performance analysis component analyses the execution of the data input pipeline for the training. The data input pipeline can be divided into three stages: the data process pipeline, data transfer from host to device, and data fetch on device. The component analyses the performance of each stage in detail and displays the results.
Figure 5 displays the page of the MindData performance analysis component. It consists of two tabs: the step gap and the data process.
The step gap page is used to analyse whether there is a performance bottleneck in the three stages. Conclusions can be drawn from the data queue graphs:
- The data queue size is the queue length when the training fetches data from the queue on the device. If the data queue size is 0, the training waits until there is data in the queue. If the data queue size is above 0, the training can get data very quickly, which means MindData is not the bottleneck for this training step.
- The host queue size can be used to infer the speed of data processing and data transfer. If the host queue size is 0, the data process stage needs to be sped up.
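
The two rules above can be written down as a small decision helper. This is a minimal sketch that encodes only the rules stated in this section: `interpret_queues` is a hypothetical name, and the returned strings are illustrative wording rather than messages produced by MindInsight.

```python
def interpret_queues(data_queue_size, host_queue_size):
    """Apply the data queue and host queue rules to one training step."""
    findings = []
    if data_queue_size == 0:
        # The device queue is empty, so the training has to wait for data.
        findings.append("training waits for data on the device")
    else:
        findings.append("MindData is not the bottleneck for this step")
    if host_queue_size == 0:
        # An empty host queue points at the data process stage.
        findings.append("data process stage should be sped up")
    return findings
```

Applying the helper to every sampled step and counting how often the first finding appears gives a rough idea of how many steps are limited by data input.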
...
@@ -152,50 +140,41 @@ the queue; If the data queue size is above 0, the training can get data very qui
Figure 6 displays the page of data process pipeline analysis. The data queues are used to exchange data between the MindData operators. The data size of the queues reflects the data consumption speed of the operators and can be used to infer the bottleneck operator. The queue usage percentage is the average data size in the queue divided by the maximum queue size; the higher the usage percentage, the more data has accumulated in the queue. The graph at the bottom of the page shows the MindData pipeline operators with the data queues. Users can click a queue to see how its data size changes over time and which operators are connected to it. The data process pipeline can be analysed as follows:
- When the input queue usage percentage of an operator is high and the output queue usage percentage is low, the operator may be the bottleneck.
- For the leftmost operator, if the usage percentages of the queues on the right are all low, the operator may be the bottleneck.
- For the rightmost operator, if the usage percentages of the queues on the left are all high, the operator may be the bottleneck.
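
The three rules above can be sketched for a simple linear pipeline. This is an illustration under stated assumptions, not MindInsight's analysis code: both function names are hypothetical, the pipeline is assumed to have exactly one queue between each pair of neighbouring operators, and the `high` and `low` thresholds (0.8 and 0.2) are arbitrary values chosen for the example.

```python
def queue_usage(sizes, max_size):
    """Average sampled queue size divided by the maximum queue size."""
    return sum(sizes) / len(sizes) / max_size


def bottleneck_candidates(usages, high=0.8, low=0.2):
    """Return the indices of operators that may be the bottleneck.

    usages[i] is the usage of the queue between operator i and operator i + 1,
    so a pipeline with len(usages) queues has len(usages) + 1 operators.
    """
    last = len(usages)
    candidates = []
    for op in range(last + 1):
        left, right = usages[:op], usages[op:]
        if op == 0:
            # Leftmost operator: all queues on the right are low.
            suspect = all(u <= low for u in right)
        elif op == last:
            # Rightmost operator: all queues on the left are high.
            suspect = all(u >= high for u in left)
        else:
            # Inner operator: input queue is high and output queue is low.
            suspect = left[-1] >= high and right[0] <= low
        if suspect:
            candidates.append(op)
    return candidates
```

For example, with usages `[0.9, 0.95, 0.1]` the third operator (index 2) is flagged, because its input queue is nearly full while its output queue is nearly empty.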
To optimize the performance of MindData operators, consider the following suggestions:
- If the `Dataset` operator is the bottleneck, try to increase `num_parallel_workers`.
- If a `GeneratorOp` type operator is the bottleneck, try to increase `num_parallel_workers` and replace the operator with `MindRecordDataset`.
- If a `MapOp` type operator is the bottleneck, try to increase `num_parallel_workers`. If it is a Python operator, try to optimize the training script.
- If a `BatchOp` type operator is the bottleneck, try to adjust `prefetch_size`.
#### Timeline Analysis
The Timeline component can display:
- The device on which each operator (AICore/AICPU operator) is executed.
- The MindSpore stream split strategy for this neural network.
- The time of tasks executed on the device.
Users can get the most detailed information from the Timeline:
- At a high level, users can analyse whether the stream split strategy can be optimized and whether the step tail is too long.
- At a low level, users can analyse the execution time of all the operators, etc.
![timeline.png](./images/timeline.png)
Figure 7: Timeline Analysis
The Timeline consists of the following parts:
- Device and Stream List: Shows the stream list on each device. Each stream consists of a series of tasks. One rectangle stands for one task, and its area stands for the execution time of the task.
- The Operator Information: When a task is clicked, the corresponding operator of this task is shown at the bottom.
W/A/S/D can be applied to zoom in and out of the Timeline graph.
## Specifications
- To limit the data size generated by the Profiler, MindInsight suggests keeping the number of profiled steps below 10 for large neural networks.
- Parsing Timeline data is time consuming, and the data of several steps is usually enough for analysis. To speed up data parsing and UI display, Profiler shows at most 20M data (containing 10+ steps of information for large networks).
- `-h, --help` : Displays the help information about the startup command.
- `--config <CONFIG>` : Specifies the configuration file or module. CONFIG is either a physical file path (file:/path/to/config.py) or a module path (python:path.to.config.module) that Python can resolve.
- `--workspace <WORKSPACE>` : Specifies the working directory. The default value of WORKSPACE is $HOME/mindinsight.
- `--port <PORT>` : Specifies the port number of the web visualization service. The value ranges from 1 to 65535. The default value of PORT is 8080.
- `--url-path-prefix <URL_PATH_PREFIX>` : Specifies the path prefix of the web visualization service. The default value of URL_PATH_PREFIX is an empty string.
- `--reload-interval <RELOAD_INTERVAL>` : Specifies the interval (unit: second) for loading data. The value 0 indicates that data is loaded only once. The default value of RELOAD_INTERVAL is 3 seconds.
- `--summary-base-dir <SUMMARY_BASE_DIR>` : Specifies the root directory for loading training log data. MindInsight traverses the direct subdirectories in this directory and searches for log files. If a direct subdirectory contains log files, it is identified as a log file directory. If the root directory itself contains log files, it is also identified as a log file directory. SUMMARY_BASE_DIR is the current directory path by default.
.. note::
When the service is started, the parameter values of the command line are saved as the environment variables of the process and start with `MINDINSIGHT_`, for example, `MINDINSIGHT_CONFIG`, `MINDINSIGHT_WORKSPACE`, and `MINDINSIGHT_PORT`.
4. View the service process information.
MindInsight provides users with web services. Run the following command to view the running web service process:
.. code-block::
ps -ef | grep mindinsight
Run the following command to access the working directory `WORKSPACE` corresponding to the service process based on the service process ID:
.. code-block::
lsof -p <PID> | grep access
The output, which includes the working directory `WORKSPACE`, is as follows: