Commit 3c4ebe5b, authored by mindspore-ci-bot, committed by Gitee

!943 add index of docs

Merge pull request !943 from TingWang/add-lite-api
# mindspore::lite
## Allocator
Allocator defines a memory pool for dynamic memory allocation and release.
## Context
Context is defined for holding environment variables during runtime.
**Constructors & Destructors**
```
Context()
```
Constructor of MindSpore Lite Context using default value for parameters.
```
Context(int thread_num, std::shared_ptr< Allocator > allocator, DeviceContext device_ctx)
```
Constructor of MindSpore Lite Context using input value for parameters.
- Parameters
- `thread_num`: Define the work thread number during the runtime.
- `allocator`: Define the allocator for malloc.
- `device_ctx`: Define device information during the runtime.
- Returns
The instance of MindSpore Lite Context.
```
~Context()
```
Destructor of MindSpore Lite Context.
**Public Attributes**
```
float16_priority
```
A **bool** value. Defaults to **false**. Whether to give priority to float16 inference.
```
device_ctx_{DT_CPU}
```
A **DeviceContext** struct.
```
thread_num_
```
An **int** value. Defaults to **2**. Thread number config for thread pool.
```
allocator
```
A **std::shared_ptr<Allocator>** pointer.
```
cpu_bind_mode_
```
A **CpuBindMode** enum variable. Defaults to **MID_CPU**.
## CpuBindMode
An **enum** type. CpuBindMode is defined for holding the CPU binding strategy argument.
**Attributes**
```
MID_CPU = -1
```
Bind middle cpu first.
```
HIGHER_CPU = 1
```
Bind higher cpu first.
```
NO_BIND = 0
```
No bind.
## DeviceType
An **enum** type. DeviceType is defined for holding the user's preferred backend.
**Attributes**
```
DT_CPU = -1
```
CPU device type.
```
DT_GPU = 1
```
GPU device type.
```
DT_NPU = 0
```
NPU device type, not supported yet.
## DeviceContext
A **struct**. DeviceContext is defined for holding the DeviceType.
**Attributes**
```
type
```
A **DeviceType** variable. The device type.
\ No newline at end of file
# mindspore::lite
**Functions**
```
std::string Version()
```
Global method to get a version string.
- Returns
The version string of MindSpore Lite.
C++ API
=======

.. toctree::
   :maxdepth: 1

   class_list
   lite
   session
   tensor
   errorcode_and_metatype
\ No newline at end of file
# Class List
Here is a list of all classes with links to the namespace documentation for each member:
| Namespace | Class Name | Description |
| --- | --- | --- |
| mindspore::lite | [Allocator](https://www.mindspore.cn/lite/docs/en/master/apicc/lite.html#allocator) | Allocator defines a memory pool for dynamic memory allocation and release. |
| mindspore::lite | [Context](https://www.mindspore.cn/lite/docs/en/master/apicc/lite.html#context) | Context is defined for holding environment variables during runtime. |
| mindspore::lite | [ModelImpl](https://www.mindspore.cn/lite/docs/en/master/apicc/lite.html#modelimpl) | ModelImpl defines the implement class of Model in MindSpore Lite. |
| mindspore::lite | [PrimitiveC](https://www.mindspore.cn/lite/docs/en/master/apicc/lite.html#primitivec) | Primitive is defined as prototype of operator. |
| mindspore::lite | [Model](https://www.mindspore.cn/lite/docs/en/master/apicc/lite.html#model) | Model defines model in MindSpore Lite for managing graph. |
| mindspore::lite | [ModelBuilder](https://www.mindspore.cn/lite/docs/en/master/apicc/lite.html#modelbuilder) | ModelBuilder is defined by MindSpore Lite. |
| mindspore::session | [LiteSession](https://www.mindspore.cn/lite/docs/en/master/apicc/session.html#litesession) | LiteSession defines session in MindSpore Lite for compiling Model and forwarding model. |
| mindspore::tensor | [MSTensor](https://www.mindspore.cn/lite/docs/en/master/apicc/tensor.html#mstensor) | MSTensor defines tensor in MindSpore Lite. |
\ No newline at end of file
# ErrorCode and MetaType
Description of error code and meta type supported in MindSpore Lite.
## ErrorCode
| Definition | Value | Description |
| --- | --- | --- |
...@@ -23,7 +25,7 @@ Description of error code and meta type supported in MindSpore Lite.
| RET_INFER_ERR | -501 | Failed to infer shape. |
| RET_INFER_INVALID | -502 | Invalid infer shape before runtime. |
## MetaType
An **enum** type.
| Type Name | Definition | Value | Description |
......
# mindspore::lite context
## Allocator
Allocator defines a memory pool for dynamic memory allocation and release.
## Context
Context is defined for holding environment variables during runtime.
**Constructors & Destructors**
```
Context()
```
Constructor of MindSpore Lite Context using default value for parameters.
```
Context(int thread_num, std::shared_ptr< Allocator > allocator, DeviceContext device_ctx)
```
Constructor of MindSpore Lite Context using input value for parameters.
- Parameters
- `thread_num`: Define the work thread number during the runtime.
- `allocator`: Define the allocator for malloc.
- `device_ctx`: Define device information during the runtime.
- Returns
The instance of MindSpore Lite Context.
```
~Context()
```
Destructor of MindSpore Lite Context.
**Public Attributes**
```
float16_priority
```
A **bool** value. Defaults to **false**. Whether to give priority to float16 inference.
```
device_ctx_{DT_CPU}
```
A **DeviceContext** struct.
```
thread_num_
```
An **int** value. Defaults to **2**. Thread number config for thread pool.
```
allocator
```
A **std::shared_ptr<Allocator>** pointer.
```
cpu_bind_mode_
```
A **CpuBindMode** enum variable. Defaults to **MID_CPU**.
## ModelImpl
ModelImpl defines the implement class of Model in MindSpore Lite.
## PrimitiveC
Primitive is defined as prototype of operator.
## Model
Model defines model in MindSpore Lite for managing graph.
**Constructors & Destructors**
```
Model()
```
Constructor of MindSpore Lite Model using default value for parameters.
```
virtual ~Model()
```
Destructor of MindSpore Lite Model.
**Public Member Functions**
```
PrimitiveC* GetOp(const std::string &name) const
```
Get MindSpore Lite Primitive by name.
- Parameters
- `name`: Define name of primitive to be returned.
- Returns
The pointer of MindSpore Lite Primitive.
```
const schema::MetaGraph* GetMetaGraph() const
```
Get graph defined in flatbuffers.
- Returns
The pointer of graph defined in flatbuffers.
```
void FreeMetaGraph()
```
Free MetaGraph in MindSpore Lite Model.
**Static Public Member Functions**
```
static Model *Import(const char *model_buf, size_t size)
```
Static method to create a Model pointer.
- Parameters
- `model_buf`: Define the buffer read from a model file.
- `size`: Define the size of the model buffer in bytes.
- Returns
Pointer of MindSpore Lite Model.
**Public Attributes**
```
model_impl_
```
A **pointer** to the implementation of the model in MindSpore Lite. Defaults to **nullptr**.
## ModelBuilder
ModelBuilder is defined by MindSpore Lite.
**Constructors & Destructors**
```
ModelBuilder()
```
Constructor of MindSpore Lite ModelBuilder using default value for parameters.
```
virtual ~ModelBuilder()
```
Destructor of MindSpore Lite ModelBuilder.
**Public Member Functions**
```
virtual std::string AddOp(const PrimitiveC &op, const std::vector<OutEdge> &inputs)
```
Add primitive into model builder for model building.
- Parameters
- `op`: Define the primitive to be added.
- `inputs`: Define input edge of primitive to be added.
- Returns
ID of the added primitive.
```
const schema::MetaGraph* GetMetaGraph() const
```
Get graph defined in flatbuffers.
- Returns
The pointer of graph defined in flatbuffers.
```
virtual Model *Construct()
```
Finish constructing the model.
## OutEdge
**Attributes**
```
nodeId
```
A **string** variable. ID of a node linked by this edge.
```
outEdgeIndex
```
A **size_t** variable. Index of this edge.
## CpuBindMode
An **enum** type. CpuBindMode is defined for holding the CPU binding strategy argument.
**Attributes**
```
MID_CPU = -1
```
Bind middle cpu first.
```
HIGHER_CPU = 1
```
Bind higher cpu first.
```
NO_BIND = 0
```
No bind.
## DeviceType
An **enum** type. DeviceType is defined for holding the user's preferred backend.
**Attributes**
```
DT_CPU = -1
```
CPU device type.
```
DT_GPU = 1
```
GPU device type.
```
DT_NPU = 0
```
NPU device type, not supported yet.
## DeviceContext
A **struct**. DeviceContext is defined for holding the DeviceType.
**Attributes**
```
type
```
A **DeviceType** variable. The device type.
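The defaults documented for Context and the CpuBindMode values can be collected in one place. The following is a minimal Python sketch for illustration only, not part of the MindSpore Lite API: the `ContextDefaults` dataclass and its field names are hypothetical mirrors of the C++ attributes, and `DeviceType` deliberately uses `auto()` instead of mirroring numeric values.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum, auto


class CpuBindMode(IntEnum):
    """Mirror of the documented CpuBindMode values."""
    MID_CPU = -1     # bind middle CPU first
    HIGHER_CPU = 1   # bind higher CPU first
    NO_BIND = 0      # no binding


class DeviceType(Enum):
    """Device names only; numeric values are intentionally not mirrored."""
    DT_CPU = auto()
    DT_GPU = auto()
    DT_NPU = auto()  # documented as not supported yet


@dataclass
class ContextDefaults:
    """Hypothetical container for the defaults listed for Context."""
    thread_num: int = 2
    cpu_bind_mode: CpuBindMode = CpuBindMode.MID_CPU
    float16_priority: bool = False
    device_type: DeviceType = DeviceType.DT_CPU


defaults = ContextDefaults()
print(defaults.thread_num, defaults.cpu_bind_mode.name, defaults.float16_priority)
```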
## Version
```
std::string Version()
```
Global method to get a version string.
- Returns
The version string of MindSpore Lite.
\ No newline at end of file
# mindspore::session
## LiteSession
LiteSession defines session in MindSpore Lite for compiling Model and forwarding model.
**Constructors & Destructors**
```
LiteSession()
```
Constructor of MindSpore Lite LiteSession using default value for parameters.
```
~LiteSession()
```
Destructor of MindSpore Lite LiteSession.
**Public Member Functions**
```
virtual void BindThread(bool if_bind)
```
Attempt to bind or unbind threads in the thread pool to or from the specified cpu core.
- Parameters
- `if_bind`: Define whether to bind or unbind threads.
```
virtual int CompileGraph(lite::Model *model)
```
Compile MindSpore Lite model.
> Note: CompileGraph should be called before RunGraph.
- Parameters
- `model`: Define the model to be compiled.
- Returns
STATUS as an error code of compiling graph, STATUS is defined in errorcode.h.
```
virtual std::vector <tensor::MSTensor *> GetInputs() const
```
Get input MindSpore Lite MSTensors of model.
- Returns
The vector of MindSpore Lite MSTensor.
```
std::vector <tensor::MSTensor *> GetInputsByName(const std::string &node_name) const
```
Get input MindSpore Lite MSTensors of model by node name.
- Parameters
- `node_name`: Define node name.
- Returns
The vector of MindSpore Lite MSTensor.
```
virtual int RunGraph(const KernelCallBack &before = nullptr, const KernelCallBack &after = nullptr)
```
Run session with callback.
> Note: RunGraph should be called after CompileGraph.
- Parameters
- `before`: Define a call_back_function to be called before running each node.
- `after`: Define a call_back_function called after running each node.
- Returns
STATUS as an error code of running graph, STATUS is defined in errorcode.h.
```
virtual std::unordered_map<std::string, std::vector<mindspore::tensor::MSTensor *>> GetOutputMapByNode() const
```
Get output MindSpore Lite MSTensors of model mapped by node name.
- Returns
The map of output node name and MindSpore Lite MSTensor.
```
virtual std::vector <tensor::MSTensor *> GetOutputsByNodeName(const std::string &node_name) const
```
Get output MindSpore Lite MSTensors of model by node name.
- Parameters
- `node_name`: Define node name.
- Returns
The vector of MindSpore Lite MSTensor.
```
virtual std::unordered_map <std::string, mindspore::tensor::MSTensor *> GetOutputMapByTensor() const
```
Get output MindSpore Lite MSTensors of model mapped by tensor name.
- Returns
The map of output tensor name and MindSpore Lite MSTensor.
```
virtual std::vector <std::string> GetOutputTensorNames() const
```
Get name of output tensors of model compiled by this session.
- Returns
The vector of string as output tensor names in order.
```
virtual mindspore::tensor::MSTensor *GetOutputByTensorName(const std::string &tensor_name) const
```
Get output MindSpore Lite MSTensors of model by tensor name.
- Parameters
- `tensor_name`: Define tensor name.
- Returns
Pointer of MindSpore Lite MSTensor.
```
virtual int Resize(const std::vector <tensor::MSTensor *> &inputs)
```
Resize inputs shape.
- Parameters
- `inputs`: Define the new inputs shape.
- Returns
STATUS as an error code of resize inputs, STATUS is defined in errorcode.h.
**Static Public Member Functions**
```
static LiteSession *CreateSession(lite::Context *context)
```
Static method to create a LiteSession pointer.
- Parameters
- `context`: Define the context of session to be created.
- Returns
Pointer of MindSpore Lite LiteSession.
## CallBackParam
CallBackParam defines input arguments for callBack function.
**Attributes**
```
name_callback_param
```
A **string** variable. Node name argument.
```
type_callback_param
```
A **string** variable. Node type argument.
\ No newline at end of file
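RunGraph accepts a `before` and an `after` callback that run around each node, and CallBackParam carries the node name and type. The Python sketch below is a toy model of that calling pattern only. It does not use the MindSpore Lite API: `run_graph`, the node tuples, the single-argument callback signature, and the `-1` failure code are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CallBackParam:
    """Mirrors the two documented attributes."""
    name_callback_param: str  # node name
    type_callback_param: str  # node type


# Simplified callback type (assumption): receives only the node info.
KernelCallBack = Callable[[CallBackParam], bool]


def run_graph(nodes: List[Tuple[str, str]],
              before: Optional[KernelCallBack] = None,
              after: Optional[KernelCallBack] = None) -> int:
    """Toy stand-in for RunGraph: visit nodes in order, firing the callbacks.

    Returns 0 on success and -1 if a callback returns False (assumed convention).
    """
    for name, op_type in nodes:
        param = CallBackParam(name, op_type)
        if before is not None and not before(param):
            return -1
        # A real session would execute the node's kernel here.
        if after is not None and not after(param):
            return -1
    return 0


trace: List[str] = []
status = run_graph(
    [("conv1", "Conv2D"), ("relu1", "Activation")],
    # list.append returns None, so "is None" makes each lambda return True.
    before=lambda p: trace.append("before " + p.name_callback_param) is None,
    after=lambda p: trace.append("after " + p.name_callback_param) is None,
)
print(status, trace)
```

Callbacks of this kind are typically used for per-node timing or for dumping intermediate tensors while debugging.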
...@@ -11,5 +11,6 @@ MindSpore Lite Documentation
   :maxdepth: 1

   architecture
   apicc/apicc
   operator_list
   glossary
# MobileNetV2 Incremental Learning
`CPU` `Ascend` `GPU` `Model Development` `Intermediate` `Expert`
<!-- TOC -->
- [Incremental Learning](#增量学习)
- [Overview](#概述)
- [Task Description and Preparation](#任务描述及准备)
- [Environment Configuration](#环境配置)
- [Downloading the Code](#下载代码)
- [Preparing the Pre-trained Model](#准备预训练模型)
- [Preparing the Data](#准备数据)
- [Pre-trained Model Loading Code Explained](#预训练模型加载代码详解)
- [Parameter Description](#参数简介)
- [Running the Python Files](#运行python文件)
- [Running the Shell Scripts](#运行shell脚本)
- [Loading and Training with Incremental Learning](#加载增量学习训练)
- [Training on CPU](#cpu加载训练)
- [Training on GPU](#gpu加载训练)
- [Training on Ascend](#ascend加载训练)
- [Incremental Learning Training Results](#增量学习训练结果)
- [Validating the Incrementally Trained Model](#验证增量学习训练模型)
- [Validating the Model](#验证模型)
- [Validation Results](#验证结果)
<!-- /TOC -->
<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/advanced_use/mobilenetv2_incremental_learn.md" target="_blank"><img src="../_static/logo_source.png"></a>&nbsp;&nbsp;
## Overview
In computer vision tasks, training a network from scratch takes a great deal of time and computing power. Pre-trained models are usually trained on large public datasets such as OpenImage, ImageNet, VOC, and COCO, which contain hundreds of thousands to more than a million images. Most tasks involve large amounts of data, so training a network from scratch without a pre-trained model consumes a lot of time and compute, and the model is prone to getting stuck in local minima and to overfitting. For this reason, most tasks start from a pre-trained model and perform incremental learning on top of it.
MindSpore is a versatile machine learning framework. It can run on devices such as mobile phones and PCs as well as on server clusters in the cloud. MobileNetV2 currently supports incremental learning on a single CPU core on Windows, and on one or more Ascend AI processors or GPUs on EulerOS and Ubuntu. This tutorial describes how to train and validate with incremental learning in MindSpore on different systems and processors.
Currently, only CPU is supported on Windows, while CPU, GPU, and Ascend AI processors are supported on Ubuntu and EulerOS.
> You can find the complete, runnable sample code here: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/mobilenetv2
## Task Description and Preparation
### Environment Configuration
If you run in a local environment, install the MindSpore framework and configure the CPU, GPU, or Ascend AI processor. If you run on HUAWEI CLOUD, you do not need to install MindSpore or configure the Ascend AI processor, CPU, or GPU, and you can skip this section.
1. Install the MindSpore framework
On EulerOS, Ubuntu, or Windows, [install the MindSpore version](https://www.mindspore.cn/install) that matches your system and processor architecture.
2. Configure the CPU environment
When using the CPU, set the context as follows in the code before training or testing starts:
```Python
if config.platform == "CPU":
    context.set_context(mode=context.GRAPH_MODE, device_target=config.platform, \
                        save_graphs=False)
```
3. Configure the GPU environment
When using the GPU, set the context as follows in the code before training or testing starts:
```Python
elif config.platform == "GPU":
    context.set_context(mode=context.GRAPH_MODE, device_target=config.platform, \
                        save_graphs=False)
    init("nccl")
    context.set_auto_parallel_context(device_num=get_group_size(),
                                      parallel_mode=ParallelMode.DATA_PARALLEL,
                                      mirror_mean=True)
```
4. Configure the Ascend environment
Taking the Ascend 910 AI processor as an example, a sample JSON configuration file `hccl_config.json` for an environment with 8 processors is shown below. For single-processor or multi-processor environments, adjust `"server_count"` and `device` based on this sample:
```json
{
    "version": "1.0",
    "server_count": "1",
    "server_list": [
        {
            "server_id": "10.155.111.140",
            "device": [
                {"device_id": "0","device_ip": "192.1.27.6","rank_id": "0"},
                {"device_id": "1","device_ip": "192.2.27.6","rank_id": "1"},
                {"device_id": "2","device_ip": "192.3.27.6","rank_id": "2"},
                {"device_id": "3","device_ip": "192.4.27.6","rank_id": "3"},
                {"device_id": "4","device_ip": "192.1.27.7","rank_id": "4"},
                {"device_id": "5","device_ip": "192.2.27.7","rank_id": "5"},
                {"device_id": "6","device_ip": "192.3.27.7","rank_id": "6"},
                {"device_id": "7","device_ip": "192.4.27.7","rank_id": "7"}],
            "host_nic_ip": "reserve"
        }
    ],
    "status": "completed"
}
```
When using the Ascend AI processor, set the context as follows in the code before training or testing starts:
```Python
elif config.platform == "Ascend":
    context.set_context(mode=context.GRAPH_MODE, device_target=config.platform, \
                        device_id=config.device_id, save_graphs=False)
    if config.run_distribute:
        context.set_auto_parallel_context(device_num=config.rank_size,
                                          parallel_mode=ParallelMode.DATA_PARALLEL,
                                          parameter_broadcast=True, mirror_mean=True)
        auto_parallel_context().set_all_reduce_fusion_split_indices([140])
        init()
...
```
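The `device` list in `hccl_config.json` has one entry per processor, so a smaller or larger single-server configuration can be derived from the 8-processor sample. The following is a minimal sketch that assumes the same field layout as that sample. The helper name `build_hccl_config` is hypothetical, and the IP addresses are the sample's placeholder values.

```python
import json
from typing import Dict, List


def build_hccl_config(server_id: str, device_ips: List[str]) -> Dict:
    """Build a single-server config with one device entry per IP address."""
    devices = [
        # device_id and rank_id are both the position in the list, as in the sample.
        {"device_id": str(i), "device_ip": ip, "rank_id": str(i)}
        for i, ip in enumerate(device_ips)
    ]
    return {
        "version": "1.0",
        "server_count": "1",
        "server_list": [
            {"server_id": server_id, "device": devices, "host_nic_ip": "reserve"}
        ],
        "status": "completed",
    }


# Two-processor variant of the sample configuration.
config = build_hccl_config("10.155.111.140", ["192.1.27.6", "192.2.27.6"])
print(json.dumps(config, indent=4))
```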
### Downloading the Code
Clone the [MindSpore open-source repository](https://gitee.com/mindspore/mindspore.git) from Gitee and go to `./model_zoo/official/cv/mobilenetv2/`.
```bash
git clone https://gitee.com/mindspore/mindspore.git
cd ./mindspore/model_zoo/official/cv/mobilenetv2
```
The code structure is as follows:
```
├─MobileNetV2
    ├─README.md             # descriptions about MobileNetV2
    ├─scripts
    │   run_train.sh        # Shell script for train with Ascend or GPU
    │   run_eval.sh         # Shell script for evaluation with Ascend or GPU
    ├─src
    │   config.py           # parameter configuration
    │   dataset.py          # creating dataset
    │   launch.py           # start Python script
    │   lr_generator.py     # learning rate config
    │   mobilenetV2.py      # MobileNetV2 architecture
    │   models.py           # net utils to load ckpt_file, define_net...
    │   utils.py            # net utils to switch precision, set_context and so on
    ├─train.py              # training script
    └─eval.py               # evaluation script
```
To run incremental learning training and testing, you can use the Python files `train.py` and `eval.py` on Windows, Ubuntu, and EulerOS. On Ubuntu and EulerOS you can also use the shell scripts `run_train.sh` and `run_eval.sh`.
When you use `run_train.sh`, the script runs `launch.py` and passes its arguments to `launch.py`. Based on the number of CPUs, GPUs, or Ascend AI processors allocated, `launch.py` starts one or more processes that run `train.py`, and each process is assigned one processor.
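The one-process-per-processor launch pattern can be sketched as follows. This is not the actual `launch.py`: the helper `plan_processes` is hypothetical, the environment variable names `DEVICE_ID` and `RANK_ID` are assumptions, and the sketch only builds the commands without starting any process.

```python
import os
import sys
from typing import Dict, List, Tuple


def plan_processes(device_ids: List[int],
                   train_args: List[str]) -> List[Tuple[List[str], Dict[str, str]]]:
    """Return one (command, environment) pair per device, without starting anything."""
    plans = []
    for rank, device_id in enumerate(device_ids):
        env = dict(os.environ)
        env["DEVICE_ID"] = str(device_id)  # assumed variable name
        env["RANK_ID"] = str(rank)         # assumed variable name
        cmd = [sys.executable, "train.py", *train_args]
        plans.append((cmd, env))
    return plans


plans = plan_processes([0, 1], ["--platform", "GPU"])
for cmd, env in plans:
    # subprocess.Popen(cmd, env=env) would start the corresponding process.
    print(env["DEVICE_ID"], env["RANK_ID"], " ".join(cmd[1:]))
```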
### Preparing the Pre-trained Model
[Download the pre-trained model](https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite/mobilenetV2.ckpt) to the following directory:
`./pretrain_checkpoint/[pretrain_checkpoint_file]`
```bash
mkdir pretrain_checkpoint
wget -P ./pretrain_checkpoint https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite/mobilenetV2.ckpt
```
### Preparing the Data
Prepare a dataset organized in the ImageFolder format. Pass the `[dataset_path]` argument when running `run_train.sh`, and pass `--dataset_path [dataset_path]` when running `train.py`.
The dataset structure is as follows:
```
└─ImageFolder
    ├─train
    │   class1Folder
    │   class2Folder
    │   ......
    └─eval
        class1Folder
        class2Folder
        ......
```
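Before training, it can help to confirm that a dataset directory follows this layout. The sketch below is an optional sanity check, not part of the MobileNetV2 scripts: the helper names are hypothetical, and requiring identical class folders under `train` and `eval` is an assumption of this example.

```python
import os
import tempfile
from typing import List


def list_classes(split_dir: str) -> List[str]:
    """Return the sorted class folder names under one split directory."""
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


def check_image_folder(dataset_path: str) -> List[str]:
    """Check that train/ and eval/ exist and contain the same class folders."""
    train_classes = list_classes(os.path.join(dataset_path, "train"))
    eval_classes = list_classes(os.path.join(dataset_path, "eval"))
    if train_classes != eval_classes:
        raise ValueError("train and eval class folders differ")
    return train_classes


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for split in ("train", "eval"):
        for cls in ("class1Folder", "class2Folder"):
            os.makedirs(os.path.join(root, split, cls))
    classes = check_image_folder(root)
    print(classes)
```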
## 预训练模型加载代码详解 ## 预训练模型加载代码详解
在增量学习时,需要加载预训练模型。不同数据集和任务中特征提取层(卷积层)分布趋于一致,但是特征向量的组合(全连接层)不相同,分类数量(全连接层output_size)通常也不一致。在增量学习时,只加载与训练特征提取层参数,不加载与训练全连接层参数;在微调与初始训练时,加载与训练特征提取层参数与全连接层参数。 在增量学习时,需要加载预训练模型。不同数据集和任务中特征提取层(卷积层)分布趋于一致,但是特征向量的组合(全连接层)不相同,分类数量(全连接层output_size)通常也不一致。在增量学习时,只加载与训练特征提取层参数,不加载与训练全连接层参数;在微调与初始训练时,加载与训练特征提取层参数与全连接层参数。
在训练与测试之前,首先按照代码第1行,构建MobileNetV2的backbone网络,head网络,并且构建包含这两个子网络的MobileNetV2网络。代码第4-11行展示了如何在`fine_tune`训练模式下,将预训练模型加载入`net`(MobileNetV2);在`incremental_learn`训练模式下,将预训练模型分别加载入backbone_net子网络,并且冻结backbone_net中的参数,不参与训练。代码第22-24行展示了如何冻结网络参数。 在训练与测试之前,首先按照代码第1行,构建MobileNetV2的backbone网络,head网络,并且构建包含这两个子网络的MobileNetV2网络。代码第4-11行展示了如何在`fine_tune`训练模式下,将预训练模型加载入`net`(MobileNetV2);在`incremental_learn`训练模式下,将预训练模型分别加载入backbone_net子网络,并且冻结backbone_net中的参数,不参与训练。代码第22-24行展示了如何冻结网络参数。
```Python
 1: backbone_net, head_net, net = define_net(args_opt, config)
 2: ...
 3: def define_net(args, config):
 4:     backbone_net = MobileNetV2Backbone(platform=args.platform)
 5:     head_net = MobileNetV2Head(input_channel=backbone_net.out_channels, num_classes=config.num_classes)
 6:     net = mobilenet_v2(backbone_net, head_net)
 7:     if args.pretrain_ckpt:
 8:         if args.train_method == "fine_tune":
 9:             load_ckpt(net, args.pretrain_ckpt)
10:         elif args.train_method == "incremental_learn":
11:             load_ckpt(backbone_net, args.pretrain_ckpt, trainable=False)
12:         elif args.train_method == "train":
13:             pass
14:         else:
15:             raise ValueError("must input the usage of pretrain_ckpt when the pretrain_ckpt isn't None")
16:     return backbone_net, head_net, net
17: ...
18: def load_ckpt(network, pretrain_ckpt_path, trainable=True):
19:     """load the pretrain checkpoint and with the param trainable or not"""
20:     param_dict = load_checkpoint(pretrain_ckpt_path)
21:     load_param_into_net(network, param_dict)
22:     if not trainable:
23:         for param in network.get_parameters():
24:             param.requires_grad = False
```
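The effect of `trainable=False` can be illustrated without MindSpore. The sketch below uses plain-Python stand-ins: `Param`, `TinyNet`, `load_params`, and `trainable_names` are hypothetical names, not MindSpore classes or functions. It only mimics the load-then-freeze pattern of `load_ckpt`, under the assumption that a checkpoint behaves like a mapping from parameter names to values.

```python
class Param:
    """Stand-in for a network parameter (not the MindSpore Parameter class)."""

    def __init__(self, name, value=0.0):
        self.name = name
        self.value = value
        self.requires_grad = True


class TinyNet:
    """Stand-in for a network that exposes get_parameters()."""

    def __init__(self, names):
        self._params = [Param(name) for name in names]

    def get_parameters(self):
        return list(self._params)


def load_params(network, param_dict, trainable=True):
    """Copy matching checkpoint values into the network, then optionally freeze it."""
    for param in network.get_parameters():
        if param.name in param_dict:
            param.value = param_dict[param.name]
    if not trainable:
        for param in network.get_parameters():
            param.requires_grad = False


def trainable_names(*networks):
    """Names of all parameters that would still receive gradient updates."""
    return [p.name for net in networks
            for p in net.get_parameters() if p.requires_grad]


# A toy "checkpoint" that holds both backbone and head weights.
checkpoint = {"conv1.weight": 0.5, "conv2.weight": -0.25, "fc.weight": 0.1}
backbone = TinyNet(["conv1.weight", "conv2.weight"])
head = TinyNet(["fc.weight"])

# incremental_learn: load into the backbone only, and freeze it.
load_params(backbone, checkpoint, trainable=False)
print(trainable_names(backbone, head))  # ['fc.weight']
```

Only the head parameter remains trainable, and its value is untouched by the checkpoint, which matches the incremental-learning behavior described above.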
## Parameter Description
### Running the Python Files
To train on Windows or Linux, run `train.py` with the four arguments `dataset_path`, `platform`, `train_method`, and `pretrain_ckpt`. To validate, run `eval.py` with the four arguments `dataset_path`, `platform`, `pretrain_ckpt`, and `head_ckpt`.
```Shell
# Windows/Linux train with Python file
python train.py --dataset_path [dataset_path] --platform [platform] --pretrain_ckpt [pretrain_checkpoint_path] --train_method [("train", "fine_tune", "incremental_learn")]
# Windows/Linux eval with Python file
python eval.py --dataset_path [dataset_path] --platform [platform] --pretrain_ckpt [pretrain_checkpoint_path] --head_ckpt [head_ckpt_path]
```
- `--dataset_path`: path to the training or validation dataset. There is no default value; it must be provided for both training and validation.
- `--platform`: processor type. Defaults to "Ascend"; can be set to "CPU" or "GPU".
- `--train_method`: training method. Must be one of "train", "fine_tune", and "incremental_learn".
- `--pretrain_ckpt`: for incremental training or fine-tuning, the path to the pretrain_checkpoint file from which the pretrained model weights are loaded.
- `--head_ckpt`: when validating an incrementally trained model, the path to the trained head_net checkpoint from which its weights are loaded.
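The training flags listed above can be mirrored with `argparse`. This is only a sketch under stated assumptions: the actual `train.py` may define its arguments differently, and `build_train_parser` and `parse_train_args` are hypothetical names. The check that `--pretrain_ckpt` is present for `fine_tune` and `incremental_learn` follows the descriptions above rather than the script itself.

```python
import argparse

TRAIN_METHODS = ("train", "fine_tune", "incremental_learn")


def build_train_parser():
    """Hypothetical parser that mirrors the documented train.py flags."""
    parser = argparse.ArgumentParser(description="MobileNetV2 training (sketch)")
    parser.add_argument("--dataset_path", required=True,
                        help="path to the ImageFolder dataset (no default)")
    parser.add_argument("--platform", default="Ascend",
                        choices=("Ascend", "GPU", "CPU"),
                        help="processor type")
    parser.add_argument("--train_method", required=True, choices=TRAIN_METHODS,
                        help="training method")
    parser.add_argument("--pretrain_ckpt", default=None,
                        help="pretrained checkpoint path")
    return parser


def parse_train_args(argv):
    """Parse argv and enforce that checkpoint-based methods get a checkpoint."""
    parser = build_train_parser()
    args = parser.parse_args(argv)
    if args.train_method in ("fine_tune", "incremental_learn") and not args.pretrain_ckpt:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--pretrain_ckpt is required for fine_tune and incremental_learn")
    return args


args = parse_train_args([
    "--dataset_path", "/store/dataset/OpenImage/train/",
    "--train_method", "incremental_learn",
    "--pretrain_ckpt", "./pretrain_checkpoint/mobilenetV2.ckpt",
])
print(args.platform, args.train_method)  # Ascend incremental_learn
```

Using `choices` makes an unsupported `--train_method` or `--platform` fail at parse time instead of surfacing later as a `ValueError` inside the network definition.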
### Running the Shell Scripts
On Linux, you can instead run the shell scripts `./scripts/run_train.sh` and `./scripts/run_eval.sh`, passing the arguments on the command line when you invoke them.
```Shell
# Windows doesn't support Shell
# Linux train with Shell script
sh run_train.sh [PLATFORM] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [TRAIN_METHOD] [CKPT_PATH]
# Linux eval with Shell script for incremental learn
sh run_eval.sh [PLATFORM] [DATASET_PATH] [PRETRAIN_CKPT_PATH] [HEAD_CKPT_PATH]
```
- `[PLATFORM]`: processor type. Defaults to "Ascend"; can be set to "GPU" or, as in the CPU example below, "CPU".
- `[DEVICE_NUM]`: number of processes per node (one server or PC counts as one node). Setting it to the number of Ascend AI processors or GPUs on the machine is recommended.
- `[VISIABLE_DEVICES(0,1,2,3,4,5,6,7)]`: device IDs as a string. Training binds each process to the device with the corresponding ID. Separate multiple IDs with `,`; the number of IDs should match the number of processes.
- `[RANK_TABLE_FILE]`: when the platform is Ascend, the JSON configuration file for Ascend.
- `[DATASET_PATH]`: path to the training or validation dataset. There is no default value; it must be provided for both training and validation.
- `[CKPT_PATH]`: for incremental training or fine-tuning, the path to the checkpoint file from which the pretrained model weights are loaded.
- `[TRAIN_METHOD]`: training method. Must be one of `train`, `fine_tune`, and `incremental_learn`.
- `[PRETRAIN_CKPT_PATH]`: when validating an incrementally trained model, the path to the saved backbone network checkpoint.
- `[HEAD_CKPT_PATH]`: when validating an incrementally trained model, the path to the saved fully connected layer checkpoint.
## Running Incremental Learning Training
On Windows, incremental learning training with MobileNetV2 can only be run through `train.py`. On Linux, you can also run `run_train.sh`, passing the [arguments](#参数简介) to the shell script.
On Windows, output is printed to the interactive command line. On Linux, when running `run_train.sh`, append `&> [log_file_path]` to the command to write standard output and standard error to a log file. Once incremental learning training has started successfully, the training time, loss, and other information for each epoch are written continuously to `./train/device*/log*.log`. If training fails to start, the error message is written to the same log files.
### Training on CPU
- Set the number of nodes
    Currently, `train.py` supports only a single processor, so the number of processors does not need to be adjusted. When running `run_train.sh`, the `CPU` device defaults to a single processor, and changing the number of CPUs is not yet supported.
- Start incremental training
    Example 1: use one CPU processor through the Python file.
    ```Shell
    # Windows or Linux with Python
    python train.py --platform CPU --dataset_path /store/dataset/OpenImage/train/ --train_method incremental_learn --pretrain_ckpt ./pretrain_checkpoint/mobilenetV2.ckpt
    ```
    Example 2: use one CPU processor through the shell script.
    ```Shell
    # Linux with Shell
    sh run_train.sh CPU /store/dataset/OpenImage/train/ incremental_learn ../pretrain_checkpoint/mobilenetV2.ckpt
    ```
### Training on GPU
- Set the number of nodes
    Currently, `train.py` supports only a single processor, so the number of nodes does not need to be adjusted. When running `run_train.sh`, set `[nproc_per_node]` (the `[DEVICE_NUM]` position in the usage above) to the number of GPUs and `[visible_devices]` to the IDs of the GPUs to use. You can select one or more device IDs, separated by `,`.
- Start incremental training
    - Example 1: use one GPU processor through the Python file.
        ```Shell
        # Windows or Linux with Python
        python train.py --platform GPU --dataset_path /store/dataset/OpenImage/train/ --pretrain_ckpt ./pretrain_checkpoint/mobilenetV2.ckpt --train_method incremental_learn
        ```
    - Example 2: use one GPU processor through the shell script, with device ID `"0"`.
        ```Shell
        # Linux with Shell
        sh run_train.sh GPU 1 0 /store/dataset/OpenImage/train/ incremental_learn ../pretrain_checkpoint/mobilenetV2.ckpt
        ```
    - Example 3: use eight GPU processors through the shell script, with device IDs `"0,1,2,3,4,5,6,7"`.
        ```Shell
        # Linux with Shell
        sh run_train.sh GPU 8 0,1,2,3,4,5,6,7 /store/dataset/OpenImage/train/ incremental_learn ../pretrain_checkpoint/mobilenetV2.ckpt
        ```
### Training on Ascend
- Set the number of nodes
    Currently, `train.py` supports only a single processor, so the number of nodes does not need to be adjusted. When running `run_train.sh`, set `[nproc_per_node]` (the `[DEVICE_NUM]` position in the usage above) to the number of Ascend AI processors and `[visible_devices]` to the IDs of the Ascend AI processors to use. On an 8-device server you can select one or more device IDs from 0-7, separated by `,`. The number of Ascend processors per node can currently only be set to 1 or 8.
- Start incremental training
    - Example 1: use one Ascend processor through the Python file.
        ```Shell
        # Windows or Linux with Python
        python train.py --platform Ascend --dataset_path /store/dataset/OpenImage/train/ --train_method incremental_learn --pretrain_ckpt ./pretrain_checkpoint/mobilenetV2.ckpt
        ```
    - Example 2: use one Ascend AI processor through the shell script, with device ID "0".
        ```Shell
        # Linux with Shell
        sh run_train.sh Ascend 1 0 ~/rank_table.json /store/dataset/OpenImage/train/ incremental_learn ../pretrain_checkpoint/mobilenetV2.ckpt
        ```
    - Example 3: use eight Ascend AI processors through the shell script, with device IDs "0,1,2,3,4,5,6,7".
        ```Shell
        # Linux with Shell
        sh run_train.sh Ascend 8 0,1,2,3,4,5,6,7 ~/rank_table.json /store/dataset/OpenImage/train/ incremental_learn ../pretrain_checkpoint/mobilenetV2.ckpt
        ```
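The shell examples for the three platforms differ only in which positional arguments appear before the dataset path. The sketch below assembles the `run_train.sh` command line in that order. It is not part of the scripts: `run_train_command` is a hypothetical helper, and taking the process count from the number of device IDs is an assumption based on the recommendation that the two match.

```python
def run_train_command(platform, dataset_path, train_method, ckpt_path,
                      device_ids=None, rank_table_file=None):
    """Return the argv list for run_train.sh, following the examples above.

    CPU:    PLATFORM DATASET METHOD CKPT
    GPU:    PLATFORM DEVICE_NUM DEVICE_IDS DATASET METHOD CKPT
    Ascend: PLATFORM DEVICE_NUM DEVICE_IDS RANK_TABLE DATASET METHOD CKPT
    """
    argv = ["sh", "run_train.sh", platform]
    if platform in ("GPU", "Ascend"):
        if not device_ids:
            raise ValueError(f"{platform} needs at least one device ID")
        if platform == "Ascend" and len(device_ids) not in (1, 8):
            raise ValueError("Ascend supports only 1 or 8 processors per node")
        # Assumption: one process per listed device ID.
        argv += [str(len(device_ids)), ",".join(str(i) for i in device_ids)]
        if platform == "Ascend":
            if not rank_table_file:
                raise ValueError("Ascend needs a rank table JSON file")
            argv.append(rank_table_file)
    elif platform != "CPU":
        raise ValueError(f"unknown platform: {platform}")
    argv += [dataset_path, train_method, ckpt_path]
    return argv


cmd = run_train_command(
    "GPU", "/store/dataset/OpenImage/train/", "incremental_learn",
    "../pretrain_checkpoint/mobilenetV2.ckpt", device_ids=[0],
)
print(" ".join(cmd))
# sh run_train.sh GPU 1 0 /store/dataset/OpenImage/train/ incremental_learn ../pretrain_checkpoint/mobilenetV2.ckpt
```

The returned list can be passed to `subprocess.run` from the scripts directory; building it as a list rather than a single string avoids quoting problems with paths that contain spaces.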
### Incremental Learning Training Results
- View the run output.
    - When running the Python file, the printed information appears in the interactive command line. On `Linux`, after running the shell script, use `cat ./train/device0/log0.log` to view it. The output looks like this:
        ```Shell
        train args: Namespace(dataset_path='.\\dataset\\train', platform='CPU', \
        pretrain_ckpt='.\\pretrain_checkpoint\\mobilenetV2.ckpt', train_method='incremental_learn')
        cfg: {'num_classes': 26, 'image_height': 224, 'image_width': 224, 'batch_size': 150, \
        'epoch_size': 15, 'warmup_epochs': 0, 'lr_max': 0.03, 'lr_end': 0.03, 'momentum': 0.9, \
        'weight_decay': 4e-05, 'label_smooth': 0.1, 'loss_scale': 1024, 'save_checkpoint': True, \
        'save_checkpoint_epochs': 1, 'keep_checkpoint_max': 20, 'save_checkpoint_path': './checkpoint', \
        'platform': 'CPU'}
        Processing batch: 16: 100%|████████████████████████████████████████████████████████████████| 16/16 [00:00<?, ?it/s]
        epoch[15], iter[16] cost: 256.030, per step time: 256.030, avg loss: 1.775
        total cost 7.2574 s
        ```
- View the saved checkpoint files.
    - On Windows, use `dir checkpoint` to list the saved model files:
        ```Shell
        dir checkpoint
        2020/08/14  11:20           267,727 mobilenetv2_head_1.ckpt
        2020/08/14  11:21           267,727 mobilenetv2_head_10.ckpt
        2020/08/14  11:21           267,727 mobilenetv2_head_11.ckpt
        ...
        2020/08/14  11:21           267,727 mobilenetv2_head_7.ckpt
        2020/08/14  11:21           267,727 mobilenetv2_head_8.ckpt
        2020/08/14  11:21           267,727 mobilenetv2_head_9.ckpt
        ```
    - On Linux, use `ls ./checkpoint` to list the saved model files:
        ```Shell
        ls ./checkpoint/
        mobilenetv2_head_1.ckpt  mobilenetv2_head_2.ckpt
        mobilenetv2_head_3.ckpt  mobilenetv2_head_4.ckpt
        ...
        ```
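The directory listing sorts file names as text, so `mobilenetv2_head_10.ckpt` appears before `mobilenetv2_head_2.ckpt`, and the last name in the listing is not the last epoch. A minimal sketch for picking the newest head checkpoint by its epoch number is shown below. `latest_head_ckpt` is a hypothetical helper, and the file-name pattern is taken from the listings above.

```python
import re

# File-name pattern seen in the checkpoint listings above.
HEAD_CKPT = re.compile(r"^mobilenetv2_head_(\d+)\.ckpt$")


def latest_head_ckpt(filenames):
    """Return the head checkpoint with the highest epoch number, or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = HEAD_CKPT.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name


names = [f"mobilenetv2_head_{i}.ckpt" for i in range(1, 16)]
print(sorted(names)[-1])        # mobilenetv2_head_9.ckpt (text order)
print(latest_head_ckpt(names))  # mobilenetv2_head_15.ckpt (numeric order)
```

In practice the list of names would come from `os.listdir("./checkpoint")`, and the result would be passed as the head checkpoint path when validating the model.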
## Validating the Incrementally Trained Model
### Validate the Model
Use the validation set to test model performance. The required [arguments](#参数简介) must be provided; `--platform` defaults to "Ascend" and can be set to "CPU" or "GPU". Standard output and standard error are shown in the interactive command line or written to the `infer.log` file.
```Shell
# Windows/Linux with Python
python eval.py --dataset_path \store\dataset\openimage\val\ --platform CPU --pretrain_ckpt .\pretrain_checkpoint\mobilenetV2.ckpt --head_ckpt .\checkpoint\mobilenetv2_head_15.ckpt
# Linux with Shell
sh run_eval.sh CPU /store/dataset/openimage/val/ ../pretrain_checkpoint/mobilenetV2.ckpt ../checkpoint/mobilenetv2_head_15.ckpt
```
### Validation Results
- When running the Python file, the validation results are printed in the interactive command line. The shell script writes them to `./infer.log`, which you can view with `cat ./infer.log`. Taking a Windows run as an example, the results are as follows:
```Shell
result:{'acc': 0.9466666666666666666667}
pretrain_ckpt = .\pretrain_checkpoint\mobilenetV2.ckpt
head_ckpt = .\checkpoint\mobilenetv2_head_15.ckpt
```
@@ -36,6 +36,7 @@ MindSpore Tutorials
   advanced_use/synchronization_training_and_evaluation
   advanced_use/bert_poetry
   advanced_use/optimize_the_performance_of_data_preparation
   advanced_use/mobilenetv2_incremental_learning
.. toctree::
   :glob:
......