@@ -38,12 +38,49 @@ Scalars, images, computational graphs, and model hyperparameters during training
### Collect Summary Data
Currently, MindSpore uses the `Callback` mechanism to save scalars, images, computational graphs, and model hyperparameters to summary log files and display them on the web page.
Currently, MindSpore supports to save scalars, images, computational graph, and model hyperparameters to summary log file and display them on the web page.
Scalar and image data is recorded by using the `Summary` operator. A computational graph is saved to the summary log file by using `SummaryRecord` after network compilation is complete.
Model parameters are saved to the summary log file by using `TrainLineage` or `EvalLineage`.
MindSpore currently supports three ways to record data into summary log file.
Step 1: Call the `Summary` operator in the `construct` function of the derived class that inherits `nn.Cell` to collect image or scalar data.
**Method one: Automatically collected through `SummaryCollector`**
The `Callback` mechanism in MindSpore provides a quick and easy way to collect common information, including the calculational graph, loss value, learning rate, parameter weights, etc. It is named 'SummaryCollector'.
When you write a training script, you just instantiate the `SummaryCollector` and apply it to either `model.train` or `model.eval`. You can automatically collect some common summary data. `SummaryCollector` detailed usage can reference `API` document `mindspore.train.callback.SummaryCollector`.
**Method two: Custom collection of network data with summary operators and SummaryCollector**
In addition to providing the `SummaryCollector` that automatically collects some summary data, MindSpore provides summary operators that enable custom collection other data on the network, such as the input of each convolutional layer, or the loss value in the loss function, etc. The recording method is shown in the following steps.
Step 1: Call the summary operator in the `construct` function of the derived class that inherits `nn.Cell` to collect image or scalar data.
For example, when a network is defined, image data is recorded in `construct` of the network. When the loss function is defined, the loss value is recorded in `construct` of the loss function.
...
...
@@ -120,62 +157,77 @@ class Net(nn.Cell):
returnout
```
Step 2: Use the `Callback` mechanism to add the required callback instance to specify the data to be recorded during training.
Step 2: In the training script, instantiate the `SummaryCollector` and apply it to `model.train`.
-`SummaryStep` specifies the step interval for recording summary data.
The sample code is as follows:
-`TrainLineage` records parameters related to model training.
The `network` parameter needs to be specified when `SummaryRecord` is called to record the computational graph. By default, the computational graph is not recorded.
The following pseudocode is shown in the CNN network, where developers can use the network output with the original tag and the prediction tag to generate the image of the confusion matrix.
It is then recorded into the summary log file through the `SummaryRecord` module.
`SummaryRecord` detailed usage can reference `API` document `mindspore.train.summary.SummaryRecord`.
# Init SummaryRecord and specify a folder for storing summary log files
Use the `save_graphs` option of `context` to record the computational graph after operator fusion.
`ms_output_after_hwopt.pb` is the computational graph after operator fusion.
The above three ways, support the record computational graph, loss value and other data. In addition, MindSpore also supports the saving of computational graph for other phases of training, through
the `save_graphs` option of `context.set_context` in the training script is set to `True` to record computational graphs of other phases, including the computational graph after operator fusion.
In the saved files, `ms_output_after_hwopt.pb` is the computational graph after operator fusion, which can be viewed on the web page.
> - Currently MindSpore supports recording computational graph after operator fusion for Ascend 910 AI processor only.
> - It's recommended that you reduce calls to `HistogramSummary` under 10 times per batch. The more you call `HistogramSummary`, the more performance overhead.
> - Please use the *with statement* to ensure that `SummaryRecord` is properly closed at the end, otherwise the process may fail to exit.
> - When using the Summary operator to collect data in training, 'HistogramSummary' operator affects performance, so please use as little as possible.