Commit 5979bb93, authored by LiHongzhang, committed by Li Hongzhang

add system metrics tutorial section

Parent 718db185
# System Metrics
<!-- TOC -->
- [Overview](#overview)
- [Ascend AI Processor Board](#ascend-ai-processor-board)
- [CPU Board](#cpu-board)
- [Memory Board](#memory-board)
<!-- /TOC -->
<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_en/advanced_use/system_metrics.md" target="_blank"><img src="../_static/logo_source.png"></a>
## Overview
Users can view system metrics for the Ascend AI processors, CPU, and memory, which helps them allocate appropriate resources for training.
To view them, [start MindInsight](https://www.mindspore.cn/tutorial/en/master/advanced_use/mindinsight_commands.html#start-the-service) and click "System Metrics" in the navigation bar.
## Ascend AI Processor Board
The Ascend AI processor board shows the current information of each NPU chip.
![sysmetric_npu.png](./images/sysmetric_npu.png)
Figure 1: System Metrics Ascend AI processor board
Figure 1 is a table in which each row shows the information of one NPU chip at a given moment. The metrics in each column are as follows:
- **Name**: The name of the chip.
- **NPU**: The chip number, from `0` to `7`.
- **Available**: Whether the chip is available.
- **Health**: The chip health status.
- **IP Address**: The chip IP address.
- **AI Core(%)**: The chip utilization.
- **HBM-Usage(MB)**: The amount of HBM memory used by the chip.
- **Power(W)**: The chip power draw.
- **Temp(°C)**: The chip temperature.
> The **Available** result is for reference only.
## CPU Board
The CPU board shows the current utilization of the system CPU as a whole and of each core.
![sysmetric_cpu.png](./images/sysmetric_cpu.png)
Figure 2: System Metrics CPU board
The two-dimensional table in Figure 2 shows the utilization percentage of each CPU core; the two rows below it show detailed metrics for *CPU-total* and *CPU-selected*. Each metric is the percentage of CPU time spent in the corresponding state:
- **user**: Time spent by normal processes executing in user mode.
- **system**: Time spent by processes executing in kernel mode.
- **idle**: Time spent doing nothing.
- **nice**: Time spent by niced (prioritized) processes executing in user mode.
- **iowait**: Time spent waiting for I/O to complete.
- **irq**: Time spent for servicing hardware interrupts.
- **softirq**: Time spent for servicing software interrupts.
- **steal**: Time spent by other operating systems running in a virtualized environment.
- **guest**: Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
- **guest_nice**: Time spent running a niced guest.
- **interrupt**: Time spent for servicing hardware interrupts.
- **dpc**: Time spent servicing deferred procedure calls (DPCs).
> The set of CPU metrics displayed may vary between operating systems.
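To make the meaning of these percentages concrete: on Linux, the kernel exposes cumulative CPU tick counters in `/proc/stat` under the names `user`, `nice`, `system`, `idle`, `iowait`, `irq`, `softirq`, `steal`, `guest` and `guest_nice` (`interrupt` and `dpc` are not among them). The following is a minimal sketch, not MindInsight's implementation, assuming a Linux host; the helper names `read_proc_stat` and `cpu_percentages` are hypothetical. It samples the counters twice and divides each field's increase by the total increase, which is what "percentage of time spent" means for every metric in the list.

```python
import os
import time
from typing import Dict, List

# Column order of the "cpu"/"cpuN" lines in Linux /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]


def read_proc_stat(path: str = "/proc/stat") -> Dict[str, List[int]]:
    """Return the cumulative tick counters of every CPU line ("cpu", "cpu0", ...)."""
    snapshot: Dict[str, List[int]] = {}
    with open(path) as f:
        for line in f:
            if line.startswith("cpu"):
                name, *values = line.split()
                snapshot[name] = [int(v) for v in values[:len(FIELDS)]]
    return snapshot


def cpu_percentages(before: List[int], after: List[int]) -> Dict[str, float]:
    """Percentage of CPU time spent in each state between two samples."""
    delta = [a - b for a, b in zip(after, before)]
    # Older kernels report fewer columns; treat the missing ones as zero.
    delta += [0] * (len(FIELDS) - len(delta))
    d = dict(zip(FIELDS, delta))
    # The kernel already counts guest time in "user" and guest_nice time in
    # "nice"; subtract them so that no time is counted twice.
    d["user"] = max(d["user"] - d["guest"], 0)
    d["nice"] = max(d["nice"] - d["guest_nice"], 0)
    total = sum(d.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in d.items()}


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_proc_stat()
    time.sleep(1.0)
    second = read_proc_stat()
    # "cpu" aggregates all cores (comparable to CPU-total); "cpu0" is one core.
    for name in ("cpu", "cpu0"):
        if name in first and name in second:
            pct = cpu_percentages(first[name], second[name])
            print(name, {k: round(v, 1) for k, v in pct.items()})
```

Separating `guest` from `user` is a design choice of this sketch so that the percentages add up to 100; other tools may report `user` with guest time still included.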
## Memory Board
The memory board shows the current system memory information.
![sysmetric_mem.png](./images/sysmetric_mem.png)
Figure 3: System Metrics memory board
Figure 3 is a pie chart of used memory and available memory; all other memory is grouped under *others*.
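The exact formulas behind the pie chart are not given here. As an illustration only, the following minimal sketch, assuming a Linux host with `/proc/meminfo` (the `MemAvailable` field exists since Linux 3.14), shows one way to split total memory into the three slices. The "used" formula (total minus free, buffers and page cache), treating the remainder as *others*, and the helper names `parse_meminfo` and `memory_breakdown` are all hypothetical.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into a {field: value} mapping (values mostly in kB)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_breakdown(info: Dict[str, int]) -> Dict[str, int]:
    """Split MemTotal (kB) into used / available / others.

    Assumption: "used" excludes free memory, buffers and page cache, and
    whatever is neither used nor available is reported as "others".
    """
    total = info["MemTotal"]
    available = info["MemAvailable"]
    used = total - info["MemFree"] - info.get("Buffers", 0) - info.get("Cached", 0)
    used = max(used, 0)
    others = max(total - used - available, 0)
    return {"used": used, "available": available, "others": others}


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        slices = memory_breakdown(parse_meminfo(f.read()))
    pie_total = sum(slices.values())
    for name, kb in slices.items():
        print(f"{name:>9}: {kb / 1024:10.1f} MB ({100.0 * kb / pie_total:.1f}%)")
```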
@@ -5,5 +5,6 @@ Training Process Visualization
:maxdepth: 1
dashboard_and_lineage
system_metrics
performance_profiling
mindinsight_commands
# Hardware Resources
<!-- TOC -->
- [Overview](#概述)
- [Ascend AI Processor Board](#昇腾ai处理器看板)
- [CPU Board](#cpu看板)
- [Memory Board](#内存看板)
<!-- /TOC -->
<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/advanced_use/system_metrics.md" target="_blank"><img src="../_static/logo_source.png"></a>&nbsp;&nbsp;
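## Overview
Users can view system metrics for the Ascend AI processors, CPU, and memory, which helps them allocate appropriate resources for training. To view them, [start MindInsight](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mindinsight_commands.html#id3) and click "Hardware Resources" in the navigation bar.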
## Ascend AI Processor Board
The Ascend AI processor board shows the current information of each chip.
![sysmetric_npu.png](./images/sysmetric_npu.png)
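Figure 1: Hardware Resources Ascend AI processor board
Figure 1 is a table in which each row shows the information of one chip at a given moment. The metrics in each column are as follows: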
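- **Name**: The chip name.
- **NPU**: The chip number, from `0` to `7`.
- **Available**: Whether the chip is idle.
- **Health**: The chip health index.
- **IP Address**: The chip IP address.
- **AI Core(%)**: The chip utilization.
- **HBM-Usage(MB)**: The amount of HBM memory used by the chip.
- **Power(W)**: The chip power draw.
- **Temp(°C)**: The chip temperature.
> Whether a chip is idle is currently reported for reference only.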
## CPU Board
The CPU board shows the current utilization of the system CPU as a whole and of each core.
![sysmetric_cpu.png](./images/sysmetric_cpu.png)
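Figure 2: Hardware Resources CPU board
The two-dimensional table in Figure 2 shows the utilization percentage of each CPU core; the two rows below it show the detailed metrics of *CPU-total* and *CPU-selected*.
- **user**: Percentage of time spent running in user mode.
- **system**: Percentage of time spent running in kernel mode.
- **idle**: Percentage of time spent idle.
- **nice**: Percentage of time spent running low-priority (niced) processes.
- **iowait**: Percentage of time spent waiting for I/O.
- **irq**: Percentage of time spent handling hardware interrupts.
- **softirq**: Percentage of time spent handling software interrupts.
- **steal**: Percentage of time taken by other virtual machines.
- **guest**: Percentage of time spent running virtual machines.
- **guest_nice**: Percentage of time spent running low-priority virtual machines.
- **interrupt**: Percentage of time spent handling hardware interrupts.
- **dpc**: Percentage of time spent servicing deferred procedure calls (DPCs).
> The set of CPU metrics displayed may vary between operating systems.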
## Memory Board
The memory board shows the current system memory information.
![sysmetric_mem.png](./images/sysmetric_mem.png)
Figure 3: Hardware Resources memory board
Figure 3 is a pie chart of used memory and available memory; all other memory is grouped under *others*.
@@ -5,5 +5,6 @@
:maxdepth: 1
dashboard_and_lineage
system_metrics
performance_profiling
mindinsight_commands