diff --git a/docs/source_zh_cn/design/mindinsight/graph_visual_design.md b/docs/source_zh_cn/design/mindinsight/graph_visual_design.md
index 7e5f0a932e4a1174e2737f23e90535f70a232484..56e78694f3e04171f9436f11db454ffe891c6c42 100644
--- a/docs/source_zh_cn/design/mindinsight/graph_visual_design.md
+++ b/docs/source_zh_cn/design/mindinsight/graph_visual_design.md
@@ -8,8 +8,8 @@
 - [概念设计](#概念设计)
 - [后端设计](#后端设计)
 - [前端设计](#前端设计)
- - [接口设计](#接口设计)
- - [文件接口设计](#文件接口设计)
+ - [接口设计](#接口设计)
+ - [文件接口设计](#文件接口设计)
@@ -61,12 +61,12 @@
 计算图中,根据斜线(/)对节点的名称划分层次,并逐层展示,参考`计算图主体展示`图。双击一个作用域节点后,将会展示它的子节点。
-## 接口设计
+### 接口设计
 计算图中,主要有文件接口和RESTful API接口,其中文件接口为`summary.proto`文件,是MindInsight和MindSpore进行数据对接的接口。
 RESTful API接口是MindInsight前后端进行数据交互的接口。
-### 文件接口设计
+#### 文件接口设计
 MindSpore与MindInsight之间的数据交互,采用[protobuf](https://developers.google.cn/protocol-buffers/docs/pythontutorial?hl=zh-cn)定义数据格式。
 [summary.proto文件](https://gitee.com/mindspore/mindinsight/blob/master/mindinsight/datavisual/proto_files/mindinsight_summary.proto)为总入口,计算图的消息对象定义为 `GraphProto`。`GraphProto`的详细定义可以参考[anf_ir.proto文件](https://gitee.com/mindspore/mindinsight/blob/master/mindinsight/datavisual/proto_files/mindinsight_anf_ir.proto)。
diff --git a/docs/source_zh_cn/design/mindinsight/training_visual_design.md b/docs/source_zh_cn/design/mindinsight/training_visual_design.md
index fc4345fdf19cd0c325644d749557bd64c814ea50..51320902896702b6bb8fb96b820ccc4d7f8ba2ec 100644
--- a/docs/source_zh_cn/design/mindinsight/training_visual_design.md
+++ b/docs/source_zh_cn/design/mindinsight/training_visual_design.md
@@ -1,13 +1,13 @@
-# MindInsight训练可视总体设计
+# 训练可视总体设计
-- [MindInsight训练可视总体设计](#mindinsight训练可视总体设计)
- - [MindInsight训练可视逻辑架构](#mindinsight训练可视逻辑架构)
+- [训练可视总体设计](#训练可视总体设计)
+ - [训练可视逻辑架构](#训练可视逻辑架构)
 - [训练信息收集架构](#训练信息收集架构)
 - [训练信息分析及展示架构](#训练信息分析及展示架构)
 - [代码组织](#代码组织)
- - [MindInsight训练可视数据模型](#mindinsight训练可视数据模型)
+ - [训练可视数据模型](#训练可视数据模型)
 - [训练信息数据流](#训练信息数据流)
 - [数据模型](#数据模型)
 - [训练作业](#训练作业)
@@ -28,7 +28,7 @@
 本文主要介绍MindInsight训练可视功能的逻辑架构、代码组织和数据模型。
-## MindInsight训练可视逻辑架构
+## 训练可视逻辑架构
 在架构上,训练可视功能的逻辑架构分为两部分:训练信息收集架构,训练信息分析及展示架构。
@@ -83,7 +83,7 @@
 ||ui||MindInsight Web UI。|
 |tests|||测试用例目录。|
-## MindInsight训练可视数据模型
+## 训练可视数据模型
 ### 训练信息数据流
diff --git a/docs/source_zh_cn/design/mindspore/distributed_training_design.md b/docs/source_zh_cn/design/mindspore/distributed_training_design.md
index 85955ac3398612b2387b3c9bae23c93b6b193cda..2572c6121f4f7d4649b6fdf7789bae489aad8de4 100644
--- a/docs/source_zh_cn/design/mindspore/distributed_training_design.md
+++ b/docs/source_zh_cn/design/mindspore/distributed_training_design.md
@@ -10,7 +10,7 @@
 - [数据并行](#数据并行)
 - [设计原理](#设计原理)
 - [代码实现](#代码实现)
- - [其他并行](#其他并行)
+ - [其他并行](#其他并行)
@@ -71,7 +71,7 @@
 - [grad_reducer.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/nn/wrap/grad_reducer.py): 这个文件实现了梯度聚合的过程。对入参`grads`用`HyperMap`展开后插入`AllReduce`算子,这里采用的是全局通信组,用户也可以根据自己网络的需求仿照这个模块进行自定义开发。MindSpore中单机和分布式执行共用一套网络封装接口,在`Cell`内部通过`ParallelMode`来区分是否要对梯度做聚合操作,网络封装接口建议参考`TrainOneStepCell`代码实现。
-### 其他并行
+## 其他并行
 建设中,即将上线。
diff --git a/tutorials/source_en/advanced_use/dashboard.md b/tutorials/source_en/advanced_use/dashboard.md
index f47161b93d5d584520e56d15fbf880077b0e67f9..86ea80788cd63dce417fbab02f3629f3992f5e4b 100644
--- a/tutorials/source_en/advanced_use/dashboard.md
+++ b/tutorials/source_en/advanced_use/dashboard.md
@@ -9,7 +9,9 @@
 - [Computational Graph Visualization](#computational-graph-visualization)
 - [Dataset Graph Visualization](#dataset-graph-visualization)
 - [Image Visualization](#image-visualization)
- - [Notices](#Notices)
+ - [Tensor Visualization](#tensor-visualization)
+ - [Notices](#notices)
+
@@ -173,15 +175,15 @@ Figure 13 shows tensors recorded by a user in a form of a histogram. Click the u
 2. When using the Summary operator to collect data in training, 'HistogramSummary' operator affects performance, so please use as little as possible.
 3. To limit memory usage, MindInsight limits the number of tags and steps:
-- There are 300 tags at most in each training dashboard. Total number of scalar tags, image tags, computation graph tags, parameter distribution(histogram) tags, tensor tags can not exceed 300. Specially, there are 10 computation graph tags and 6 tensor tags at most. When tags exceed limit, MindInsight preserves the most recently processed tags.
-- There are 1000 steps at most for each scalar tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
-- There are 10 steps at most for each image tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
-- There are 50 steps at most for each parameter distribution(histogram) tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
-- There are 20 steps at most for each tensor tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
+ - There are 300 tags at most in each training dashboard. Total number of scalar tags, image tags, computation graph tags, parameter distribution(histogram) tags, tensor tags can not exceed 300. Specially, there are 10 computation graph tags and 6 tensor tags at most. When tags exceed limit, MindInsight preserves the most recently processed tags.
+ - There are 1000 steps at most for each scalar tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
+ - There are 10 steps at most for each image tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
+ - There are 50 steps at most for each parameter distribution(histogram) tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
+ - There are 20 steps at most for each tensor tag in each training dashboard. When steps exceed limit, MindInsight will sample steps randomly to meet this limit.
 4. Since `TensorSummary` will record complete tensor data, the amount of data is usually relatively large. In order to limit memory usage and ensure performance, MindInsight make the following restrictions with the size of tensor and the number of value responsed and displayed on the front end:
-- MindInsight supports loading tensor containing up to 10 million values.
-- After the tensor is loaded, in the tensor-visible table view, you can view a maximum of 100,000 values. If the value obtained by the selected dimension query exceeds this limit, it cannot be displayed.
+ - MindInsight supports loading tensor containing up to 10 million values.
+ - After the tensor is loaded, in the tensor-visible table view, you can view a maximum of 100,000 values. If the value obtained by the selected dimension query exceeds this limit, it cannot be displayed.
 5. Since tensor visualizatioin (`TensorSummary`) records raw tensor data, it requires a large amount of storage space. Before using `TensorSummary` and during training, please check that the system storage space is sufficient.
 The storage space occupied by the tensor visualizatioin function can be reduced by the following methods:
diff --git a/tutorials/source_en/advanced_use/lineage_and_scalars_comparision.md b/tutorials/source_en/advanced_use/lineage_and_scalars_comparision.md
index f534c5642d4dd2a586a8f0765bdff4ddaa21e4c3..e4f4b2b8233960d5a00c98b1c0c2430ef722318c 100644
--- a/tutorials/source_en/advanced_use/lineage_and_scalars_comparision.md
+++ b/tutorials/source_en/advanced_use/lineage_and_scalars_comparision.md
@@ -7,7 +7,6 @@
 - [Model Lineage](#model-lineage)
 - [Dataset Lineage](#dataset-lineage)
 - [Scalars Comparision](#scalars-comparision)
- - [Specifications](#specifications)
 - [Notices](#notices)
diff --git a/tutorials/source_en/advanced_use/summary_record.md b/tutorials/source_en/advanced_use/summary_record.md
index dbc30a342274342e12f808703a5231ca08b8d929..07982670cec921762a8cdba1fefc153d6e99458a 100644
--- a/tutorials/source_en/advanced_use/summary_record.md
+++ b/tutorials/source_en/advanced_use/summary_record.md
@@ -1,21 +1,22 @@
-# Summary_Record
+# Summary Record
-- [Summary_Record](#summary_record)
+- [Summary Record](#summary-record)
 - [Overview](#overview)
 - [Operation Process](#operation-process)
 - [Preparing the Training Script](#preparing-the-training-script)
- - [Method one: Automatically collected through SummaryCollector](#method-one:-automatically-collected-through-summarycollector)
- - [Method two: Custom collection of network data with summary operators and SummaryCollector](#method-two:-custom-collection-of-network-data-with-summary-operators-and-summarycollector)
- - [Method three: Custom callback recording data](#method-three:-custom-callback-recording-data)
- - [Notices](#Notices)
+ - [Method one: Automatically collected through SummaryCollector](#method-one-automatically-collected-through-summarycollector)
+ - [Method two: Custom collection of network data with summary operators and SummaryCollector](#method-two-custom-collection-of-network-data-with-summary-operators-and-summarycollector)
+ - [Method three: Custom callback recording data](#method-three-custom-callback-recording-data)
+ - [Notices](#notices)
 ## Overview
+
 Scalars, images, computational graphs, and model hyperparameters during training are recorded in files and can be viewed on the web page.
 ## Operation Process
diff --git a/tutorials/source_zh_cn/advanced_use/dashboard.md b/tutorials/source_zh_cn/advanced_use/dashboard.md
index eb0d6054b85cb317bc7296046a1bc3d94179cf0b..eca509ed4102760adc5e29bb34cb2ba21b3e8917 100644
--- a/tutorials/source_zh_cn/advanced_use/dashboard.md
+++ b/tutorials/source_zh_cn/advanced_use/dashboard.md
@@ -1,6 +1,7 @@
 # 训练看板
+
 - [训练看板](#训练看板)
 - [概述](#概述)
 - [标量可视化](#标量可视化)
diff --git a/tutorials/source_zh_cn/advanced_use/summary_record.md b/tutorials/source_zh_cn/advanced_use/summary_record.md
index cbe143e4195597dc89adeb1954497863e01695d0..9a794f9db5be924441d3e0c852db233e860302fe 100644
--- a/tutorials/source_zh_cn/advanced_use/summary_record.md
+++ b/tutorials/source_zh_cn/advanced_use/summary_record.md
@@ -6,9 +6,9 @@
 - [概述](#概述)
 - [操作流程](#操作流程)
 - [准备训练脚本](#准备训练脚本)
- - [方式一:通过SummaryCollector自动收集](#方式一:通过summaryCollector自动收集)
- - [方式二:结合Summary算子和SummaryCollector,自定义收集网络中的数据](#方式二:结合summary算子和summaryCollector,自定义收集网络中的数据)
- - [方式三:自定义Callback记录数据](#方式三:自定义Callback记录数据)
+ - [方式一:通过SummaryCollector自动收集](#方式一通过summarycollector自动收集)
+ - [方式二:结合Summary算子和SummaryCollector,自定义收集网络中的数据](#方式二结合summary算子和summarycollector自定义收集网络中的数据)
+ - [方式三:自定义Callback记录数据](#方式三自定义callback记录数据)
 - [注意事项](#注意事项)
@@ -17,6 +17,7 @@
 ## 概述
+
 训练过程中的标量、图像、计算图以及模型超参等信息记录到文件中,通过可视化界面供用户查看。
 ## 操作流程