Commit 1132268f, author: weishengyu, committer: Tingquan Gao

update format and add function_intro_en

Parent c285b016
......@@ -2,53 +2,65 @@
## Contents
- [Overview of Code and Content](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#1)
- [Training Module](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2)
- [2.1 Data](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2.1)
- [2.2 Model Structure](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2.2)
- [2.3 Loss Function](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2.3)
- [2.4 Optimizer, Learning Rate Decay, and Weight Decay](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2.4)
- [2.5 Evaluation During Training](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2.5)
- [2.6 Model Saving](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2.6)
- [2.7 Model Pruning and Quantification](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#2.7)
- [Codes and Methods for Inference and Deployment](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/code_overview.md#3)
- [Overview of Code and Content](#1)
- [Training Module](#2)
- [2.1 Data](#2.1)
- [2.2 Model Structure](#2.2)
- [2.3 Loss Function](#2.3)
- [2.4 Optimizer, Learning Rate Decay, and Weight Decay](#2.4)
- [2.5 Evaluation During Training](#2.5)
- [2.6 Model Saving](#2.6)
- [2.7 Model Pruning and Quantification](#2.7)
- [Codes and Methods for Inference and Deployment](#3)
<a name="1"></a>
## 1 Overview of Code and Content
The main code and content structure of PaddleClas are as follows:
- benchmark: The folder stores shell scripts to test the speed metrics of different models in PaddleClas, such as single-card training speed metrics, multi-card training speed metrics, etc.
- dataset: The folder stores datasets and the scripts used to process datasets. The scripts are responsible for processing the dataset into a suitable format for Dataloader.
- deploy: Deploy the core code, the folder stores the deployment tools, which support python/cpp inference, Hub Serveing, Paddle Lite, Slim offline quantification and other deployment methods.
- ppcls: Train the core code, the folder holds the main body of the PaddleClas framework. It also has configuration files, and specific code of model training, evaluation, inference, dynamic to static export, etc.
- tools: The file contains the entry functions and scripts for training, evaluation, inference, and dynamic to static export.
- benchmark: shell scripts to test the speed metrics of different models in PaddleClas, such as single-card training speed metrics, multi-card training speed metrics, etc.
- dataset: datasets and the scripts used to process datasets. The scripts are responsible for processing the dataset into a suitable format for Dataloader.
- deploy: code for deployment, including deployment tools that support Python/C++ inference, Hub Serving, Paddle Lite, Slim offline quantification, and other deployment methods.
- ppcls: code for training and evaluation which is the main body of the PaddleClas framework. It also contains configuration files, and specific code of model training, evaluation, inference, dynamic to static export, etc.
- tools: entry functions and scripts for training, evaluation, inference, and dynamic to static export.
- The requirements.txt file is adopted to install the dependencies for PaddleClas. Use pip for upgrading, installation, and application.
- tests: Full-link tests of PaddleClas models from training to prediction to verify that whether each function works properly.
- test_tipc: TIPC tests of PaddleClas models from training to prediction to verify whether each function works properly.
<a name="2"></a>
## 2 Training Module
The training of deep learning model mainly contains data, model structure, loss function, strategies such as optimizer, learning rate decay, and weight decay strategy, etc., which are explained below.
Training a deep learning model mainly involves data, model structure, loss function,
and strategies such as the optimizer, learning rate decay, and weight decay, which are explained below.
<a name="2.1"></a>
## 2.1 Data
For supervised tasks, the training data generally contains the original data and its annotation. In a single-label-based image classification task, the raw data refers to the image data, while the annotation is the class to which the image data belongs. In PaddleClas, a label file, in the following format, is required for training, with each row containing one training sample and separated by a separator (space by default), representing the image path and the class label respectively.
For supervised tasks, the training data generally contains the raw data and its annotation.
In a single-label-based image classification task, the raw data refers to the image data,
while the annotation is the class to which the image data belongs. In PaddleClas, a label file,
in the following format, is required for training,
with each row containing one training sample and separated by a separator (space by default),
representing the image path and the class label respectively.
```
train/n01440764/n01440764_10026.JPEG 0
train/n01440764/n01440764_10027.JPEG 0
```
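The label file format above can be read with a few lines of Python. This is a minimal sketch, not the PaddleClas loader: the function name `parse_label_lines` and its `delimiter` parameter are hypothetical, and it only assumes the one-sample-per-row, space-separated layout described above.

```python
from typing import List, Tuple


def parse_label_lines(lines: List[str], delimiter: str = " ") -> List[Tuple[str, int]]:
    """Split each non-empty row into (image_path, class_label)."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        # Split on the last delimiter so a path containing the delimiter still parses.
        path, label = line.rsplit(delimiter, 1)
        samples.append((path, int(label)))
    return samples


rows = [
    "train/n01440764/n01440764_10026.JPEG 0",
    "train/n01440764/n01440764_10027.JPEG 0",
]
samples = parse_label_lines(rows)
print(samples[0])
```

Splitting from the right is a design choice of this sketch; it keeps the class label as the final field even when an image path contains the separator.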
The code `ppcls/data/dataloader/common_dataset.py` contains the `CommonDataset` class inherited from `paddle.io.Dataset`, which is a dataset class that can index and fetch a given sample by a key value. Dataset classes such as `ImageNetDataset`, `LogoDataset`, `CommonDataset`, etc. are all inherited from this class.
`ppcls/data/dataloader/common_dataset.py` contains the `CommonDataset` class inherited from `paddle.io.Dataset`,
which is a dataset class that can index and fetch a given sample by a key value.
Dataset classes such as `ImageNetDataset`, `LogoDataset`, `CommonDataset`, etc. are all inherited from this class.
For the read-in data, the raw image needs to be transformed by data conversion. The standard data preprocessing during training contains `DecodeImage`, `RandCropImage`, `RandFlipImage`, `NormalizeImage`, and `ToCHWImage`. The data preprocessing is mainly in the `transforms` field, which is presented in a list, and then converts the data in order, as reflected in the configuration file below.
The raw image needs to be preprocessed before training.
The standard data preprocessing during training contains
`DecodeImage`, `RandCropImage`, `RandFlipImage`, `NormalizeImage`, and `ToCHWImage`.
The data preprocessing is mainly in the `transforms` field, which is presented in a list,
and then converts the data in order, as reflected in the configuration file below.
```
```yaml
DataLoader:
Train:
dataset:
......@@ -70,19 +82,25 @@ DataLoader:
order: ''
```
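The idea that the `transforms` list is applied in order can be sketched as follows. This is an illustration only: the `Scale` and `Shift` operators, the `OPS` registry, and the single-key `{OpName: {param: value}}` entry layout are assumptions made for the example, not the operators or schema shipped in `ppcls/data/preprocess/ops/`.

```python
from typing import Callable, Dict, List, Optional

Transform = Callable[[List[float]], List[float]]


def scale(factor: float) -> Transform:
    """Hypothetical operator: multiply every value by `factor`."""
    return lambda data: [v * factor for v in data]


def shift(offset: float) -> Transform:
    """Hypothetical operator: add `offset` to every value."""
    return lambda data: [v + offset for v in data]


# Hypothetical registry mapping a config name to an operator factory.
OPS: Dict[str, Callable[..., Transform]] = {"Scale": scale, "Shift": shift}


def build_transforms(op_configs: List[Dict[str, Optional[dict]]]) -> List[Transform]:
    """Instantiate one operator per list entry, preserving the list order."""
    ops = []
    for op_config in op_configs:
        # Each entry is assumed to be a single-key mapping.
        (name, params), = op_config.items()
        ops.append(OPS[name](**(params or {})))
    return ops


def apply_transforms(data: List[float], ops: List[Transform]) -> List[float]:
    """Feed the output of each operator into the next one."""
    for op in ops:
        data = op(data)
    return data


config = [{"Scale": {"factor": 2.0}}, {"Shift": {"offset": 1.0}}]
result = apply_transforms([1.0, 2.0], build_transforms(config))
print(result)  # [3.0, 5.0]
```

Reversing the list changes the result (shift first, then scale), which is why the order of entries in the configuration matters.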
PaddleClas also contains `AutoAugment`, `RandAugment`, and other data augmentation methods, which can also be configured in the configuration file and thus added to the data preprocessing of the training. Each data conversion method is implemented as a class for easy migration and reuse. For more specific implementation of data processing, please refer to the code under `ppcls/data/preprocess/ops/`.
PaddleClas also contains `AutoAugment`, `RandAugment`, and other data augmentation methods,
which can also be configured in the configuration file and thus added to the data preprocessing of the training.
Each data augmentation and process method is implemented as a class for easy migration and reuse.
For more specific implementation of data processing, please refer to the code under `ppcls/data/preprocess/ops/`.
You can also use methods such as mixup or cutmix to augment the data that make up a batch. PaddleClas integrates `MixupOperator`, `CutmixOperator`, `FmixOperator`, and other batch-based data augmentation methods, which can be configured by deploying the mix parameter in the configuration file. For more specific implementation, please refer to `ppcls/data/preprocess /batch_ops/batch_operators.py`.
You can also use methods such as mixup or cutmix to augment the data that make up a batch.
PaddleClas integrates `MixupOperator`, `CutmixOperator`, `FmixOperator`, and other batch-based data augmentation methods,
which can be configured by setting the mix parameter in the configuration file.
For the code implementation, please refer to `ppcls/data/preprocess/batch_ops/batch_operators.py`.
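As a rough illustration of batch-based augmentation, the following sketch implements the general mixup idea with the standard library only. It is not the `MixupOperator` code: the function name `mixup_batch`, the flat-list image representation, and the default `alpha` are assumptions chosen for the example.

```python
import random
from typing import List, Optional, Tuple


def mixup_batch(
    images: List[List[float]],
    labels: List[int],
    num_classes: int,
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[List[float]], float]:
    """Blend each sample with a randomly chosen partner from the same batch."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    perm = list(range(len(images)))
    rng.shuffle(perm)  # partner index for every sample

    mixed_images, mixed_labels = [], []
    for i, j in enumerate(perm):
        # Pixel-wise convex combination of the two images.
        mixed_images.append(
            [lam * a + (1.0 - lam) * b for a, b in zip(images[i], images[j])]
        )
        # The label becomes a soft target with the same mixing weights.
        target = [0.0] * num_classes
        target[labels[i]] += lam
        target[labels[j]] += 1.0 - lam
        mixed_labels.append(target)
    return mixed_images, mixed_labels, lam


batch_images = [[0.0, 0.0], [1.0, 1.0]]
batch_labels = [0, 1]
mixed_x, mixed_y, lam = mixup_batch(
    batch_images, batch_labels, num_classes=2, rng=random.Random(0)
)
print(lam, mixed_y)
```

Because the labels become soft targets, the loss function used with such batches has to accept a distribution rather than a single class index.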
In image classification, the data post-processing is mainly `argmax` operation, which is not elaborated here.
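For reference, the `argmax` post-processing amounts to picking the index of the highest score. The helper names below are hypothetical and the scores are made up for the example; this is a sketch, not the PaddleClas post-processing code.

```python
from typing import List


def argmax(scores: List[float]) -> int:
    """Return the index of the largest score (the predicted class id)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def topk(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


scores = [0.1, 0.7, 0.2]
print(argmax(scores))   # 1
print(topk(scores, 2))  # [1, 2]
```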
<a name="2.2"></a>
## 2.2 Model Structure
The model in the configuration file is structured as follows:
```
```yaml
Arch:
name: ResNet50
class_num: 1000
......@@ -90,11 +108,17 @@ Arch:
use_ssld: False
```
`Arch.name` indicates the name of the model, `Arch.pretrained` whether to add a pre-trained model, and `use_ssld` whether to use a pre-trained model based on `SSLD` knowledge distillation. All model names are defined in `ppcls/arch/backbone/__init__.py`.
`Arch.name`: the name of the model
`Arch.pretrained`: whether to add a pre-trained model
`Arch.use_ssld`: whether to use a pre-trained model based on `SSLD` knowledge distillation.
All model names are defined in `ppcls/arch/backbone/__init__.py`.
Correspondingly, the model object is created in `ppcls/arch/__init__.py` with the `build_model` method.
```
```python
def build_model(config):
config = copy.deepcopy(config)
model_type = config.pop("name")
......@@ -104,21 +128,24 @@ def build_model(config):
```
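The name-based construction pattern used by `build_model` can be sketched in isolation. The class `ResNet50Sketch` and the dictionary `MODEL_REGISTRY` are hypothetical stand-ins; the part of the real function that is elided above is not reproduced here.

```python
import copy


class ResNet50Sketch:
    """Hypothetical placeholder for a backbone class; it only stores its options."""

    def __init__(self, class_num=1000, pretrained=False, use_ssld=False):
        self.class_num = class_num
        self.pretrained = pretrained
        self.use_ssld = use_ssld


# Hypothetical registry: maps the value of Arch.name to a class.
MODEL_REGISTRY = {"ResNet50": ResNet50Sketch}


def build_model_sketch(config):
    """Pop `name` from a copy of the config and pass the rest as keyword arguments."""
    config = copy.deepcopy(config)  # leave the caller's dictionary untouched
    model_type = config.pop("name")
    return MODEL_REGISTRY[model_type](**config)


arch = {"name": "ResNet50", "class_num": 1000, "pretrained": False, "use_ssld": False}
model = build_model_sketch(arch)
print(type(model).__name__, model.class_num)
```

Copying the configuration before calling `pop` means the same `Arch` dictionary can be reused, for example when the model is rebuilt for evaluation.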
<a name="2.3"></a>
## 2.3 Loss Function
PaddleClas contains `CELoss` , `JSDivLoss`, `TripletLoss`, `CenterLoss` and other loss functions, all defined in `ppcls/loss`.
PaddleClas implements `CELoss`, `JSDivLoss`, `TripletLoss`, `CenterLoss`, and other loss functions, all defined in `ppcls/loss`.
In the `ppcls/loss/__init__.py` file, `CombinedLoss` is used to construct and combine loss functions. The loss functions and calculation methods required in different training strategies are disparate, and the following factors are considered by PaddleClas in the construction of the loss function.
In the `ppcls/loss/__init__.py` file, `CombinedLoss` is used to construct and combine loss functions.
The loss functions and calculation methods required in different training strategies are disparate,
and the following factors are considered by PaddleClas in the construction of the loss function.
1. whether to use label smooth
2. whether to use mixup or cutmix
3. whether to use distillation method for training
4. whether to train metric learning
The user can specify the type and weight of the loss function in the configuration file, such as adding TripletLossV2 to the training, the configuration file is as follows:
Users can specify the type and weight of each loss function in the configuration file.
For example, to add TripletLossV2 to the training, the configuration file is as follows:
```
```yaml
Loss:
Train:
- CELoss:
......@@ -129,18 +156,22 @@ Loss:
```
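The way a list of weighted loss entries can be combined into one training loss is sketched below. This is not the `CombinedLoss` implementation: the two toy loss functions, the `LOSSES` registry, and the class name `CombinedLossSketch` are assumptions for illustration, and only the "type plus weight per entry" idea is taken from the text.

```python
from typing import Callable, Dict, List


def squared_error(pred: List[float], target: List[float]) -> float:
    """Toy loss: mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def absolute_error(pred: List[float], target: List[float]) -> float:
    """Toy loss: mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical registry mapping a config name to a loss function.
LOSSES: Dict[str, Callable[[List[float], List[float]], float]] = {
    "SquaredError": squared_error,
    "AbsoluteError": absolute_error,
}


class CombinedLossSketch:
    """Build loss terms from a config list and return their weighted sum."""

    def __init__(self, config_list: List[Dict[str, dict]]):
        self.terms = []
        for entry in config_list:
            # Each entry is assumed to be a single-key mapping: {LossName: {weight: w}}.
            (name, params), = entry.items()
            self.terms.append((name, LOSSES[name], params.get("weight", 1.0)))

    def __call__(self, pred: List[float], target: List[float]) -> Dict[str, float]:
        # Keep each weighted term so it can be logged separately.
        result = {name: weight * fn(pred, target) for name, fn, weight in self.terms}
        result["loss"] = sum(result.values())
        return result


loss_config = [{"SquaredError": {"weight": 1.0}}, {"AbsoluteError": {"weight": 0.5}}]
loss_fn = CombinedLossSketch(loss_config)
out = loss_fn([1.0, 2.0], [0.0, 2.0])
print(out["loss"])  # 0.75
```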
<a name="2.4"></a>
## 2.4 Optimizer, Learning Rate Decay, and Weight Decay
In image classification tasks, `Momentum` is a commonly used optimizer, and several optimizer strategies such as `Momentum`, `RMSProp`, `Adam`, and `AdamW` are provided in PaddleClas.
In image classification tasks, `Momentum` is a commonly used optimizer,
and several optimizer strategies such as `Momentum`, `RMSProp`, `Adam`, and `AdamW` are provided in PaddleClas.
The weight decay strategy is a common regularization method, mainly adopted to prevent model overfitting. Two weight decay strategies, `L1Decay` and `L2Decay`, are provided in PaddleClas.
The weight decay strategy is a common regularization method, mainly adopted to prevent model overfitting.
Two weight decay strategies, `L1Decay` and `L2Decay`, are provided in PaddleClas.
Learning rate decay is an essential training method for accuracy improvement in image classification tasks. PaddleClas currently supports `Cosine`, `Piecewise`, `Linear`, and other learning rate decay strategies.
Learning rate decay is an essential training method for accuracy improvement in image classification tasks.
PaddleClas currently supports `Cosine`, `Piecewise`, `Linear`, and other learning rate decay strategies.
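To make the decay strategies concrete, here is a sketch of two common schedules: the textbook cosine annealing formula and a step-wise (piecewise) decay. The function names and the `gamma` parameterization are assumptions; the PaddleClas classes in `ppcls/optimizer/learning_rate.py` may differ, for example by adding warmup or by taking explicit values per interval.

```python
import math
from typing import List


def cosine_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Cosine annealing from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def piecewise_lr(base_lr: float, epoch: int, boundaries: List[int], gamma: float = 0.1) -> float:
    """Multiply the rate by gamma each time an epoch boundary has been reached."""
    passed = sum(1 for boundary in boundaries if epoch >= boundary)
    return base_lr * gamma ** passed


for step in (0, 50, 100):
    print(step, cosine_lr(0.1, step, 100))

for epoch in (0, 45, 95):
    print(epoch, piecewise_lr(0.1, epoch, [30, 60, 90]))
```

The cosine schedule changes the rate smoothly on every step, while the piecewise schedule keeps it constant between boundaries and drops it abruptly at each one.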
In the configuration file, the optimizer, weight decay, and learning rate decay strategies can be configured with the following fields.
In the configuration file, the optimizer, weight decay,
and learning rate decay strategies can be configured with the following fields.
```
```yaml
Optimizer:
name: Momentum
momentum: 0.9
......@@ -156,7 +187,7 @@ Optimizer:
Employ `build_optimizer` in `ppcls/optimizer/__init__.py` to create the optimizer and learning rate objects.
```
```python
def build_optimizer(config, epochs, step_each_epoch, parameters):
config = copy.deepcopy(config)
# step1 build lr
......@@ -185,13 +216,17 @@ def build_optimizer(config, epochs, step_each_epoch, parameters):
return optim, lr
```
Different optimizers and weight decay strategies are implemented as classes, which can be found in the file `ppcls/optimizer/optimizer.py`; different learning rate decay strategies can be found in the file `ppcls/optimizer/learning_rate.py`.
Different optimizers and weight decay strategies are implemented as classes,
which can be found in the file `ppcls/optimizer/optimizer.py`.
Different learning rate decay strategies can be found in the file `ppcls/optimizer/learning_rate.py`.
<a name="2.5"></a>
## 2.5 Evaluation During Training
When training the model, you can set the interval of model saving, or you can evaluate the validation set every several epochs so that the model with the best accuracy can be saved. Follow the fields below to configure.
When training the model, you can set the interval of model saving,
or you can evaluate the validation set every several epochs so that the model with the best accuracy can be saved.
Follow the examples below to configure.
```
Global:
......@@ -201,50 +236,68 @@ Global:
```
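The interplay between periodic saving and keeping the best model can be sketched as a plain loop. The parameter names `eval_interval` and `save_interval` and the `evaluate` callback are assumptions made for the example (the configuration fields are elided above), and the training step itself is omitted.

```python
from typing import Callable, List, Optional, Tuple


def train_with_eval(
    num_epochs: int,
    eval_interval: int,
    save_interval: int,
    evaluate: Callable[[int], float],
) -> Tuple[Optional[int], float, List[str]]:
    """Return the best epoch, its metric, and the names of checkpoints written."""
    best_metric, best_epoch = float("-inf"), None
    saved: List[str] = []
    for epoch in range(1, num_epochs + 1):
        # ... one epoch of training would run here ...
        if epoch % eval_interval == 0:
            metric = evaluate(epoch)
            if metric > best_metric:
                # Overwrite the best checkpoint whenever validation accuracy improves.
                best_metric, best_epoch = metric, epoch
                saved.append("best_model")
        if epoch % save_interval == 0:
            saved.append(f"epoch_{epoch}")
    return best_epoch, best_metric, saved


# Made-up validation accuracies, used only to drive the example.
fake_accuracy = {1: 0.50, 2: 0.70, 3: 0.65, 4: 0.72}
best_epoch, best_metric, saved = train_with_eval(
    num_epochs=4, eval_interval=1, save_interval=2, evaluate=fake_accuracy.get
)
print(best_epoch, best_metric, saved)
```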
<a name="2.6"></a>
## 2.6 Model Saving
The model is saved through the `paddle.save()` function of the Paddle framework. The dynamic graph version of the model is saved in the form of a dictionary to facilitate further training. The specific implementation is as follows:
```
def save_model(program, model_path, epoch_id, prefix='ppcls'): model_path = os.path.join(model_path, str(epoch_id)) _mkdir_if_not_exist(model_path) model_prefix = os.path.join(model_path, prefix) paddle.static.save(program, model_prefix) logger.info( logger.coloring("Already save model in {}".format(model_path), "HEADER"))
The model is saved through the `paddle.save()` function of the Paddle framework.
The dynamic graph version of the model is saved in the form of a dictionary to facilitate further training.
The specific implementation is as follows:
```python
def save_model(program, model_path, epoch_id, prefix='ppcls'):
model_path = os.path.join(model_path, str(epoch_id))
_mkdir_if_not_exist(model_path)
model_prefix = os.path.join(model_path, prefix)
paddle.static.save(program, model_prefix)
logger.info(
logger.coloring("Already save model in {}".format(model_path), "HEADER"))
```
When saving, there are two things to keep in mind:
1. Only save the model on node 0, otherwise, if all nodes save models to the same path, a file conflict may occur during multi-card training when multiple nodes write files, preventing the final saved model from being loaded correctly.
1. Only save the model on node 0, otherwise, if all nodes save models to the same path,
a file conflict may occur during multi-card training when multiple nodes write files,
preventing the final saved model from being loaded correctly.
2. Optimizer parameters also need to be saved to facilitate subsequent loading of breakpoints for training.
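The two points above can be illustrated with a small, framework-free sketch. Here `pickle` stands in for the Paddle saving call, and the function name `save_checkpoint`, the `rank` argument, and the `.ckpt` suffix are all hypothetical; in real multi-card training the rank would come from the distributed runtime.

```python
import os
import pickle
import tempfile
from typing import Optional


def save_checkpoint(
    model_state: dict,
    optimizer_state: dict,
    output_dir: str,
    epoch_id: int,
    rank: int,
    prefix: str = "ppcls",
) -> Optional[str]:
    """Write model and optimizer state together, but only on rank 0."""
    if rank != 0:
        return None  # other ranks skip writing to avoid conflicting files
    epoch_dir = os.path.join(output_dir, str(epoch_id))
    os.makedirs(epoch_dir, exist_ok=True)
    path = os.path.join(epoch_dir, prefix + ".ckpt")
    with open(path, "wb") as f:
        # Saving the optimizer state as well allows training to resume later.
        pickle.dump(
            {"model": model_state, "optimizer": optimizer_state, "epoch": epoch_id}, f
        )
    return path


with tempfile.TemporaryDirectory() as tmp:
    skipped = save_checkpoint({"w": [1.0]}, {"lr": 0.1}, tmp, epoch_id=3, rank=1)
    path = save_checkpoint({"w": [1.0]}, {"lr": 0.1}, tmp, epoch_id=3, rank=0)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(skipped, restored["epoch"], restored["optimizer"])
```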
- Model pruning and quantification training
If you want to conduct compression training, please configure with the following fields.
<a name="2.7"></a>
## 2.7 Model Pruning and Quantification
If you want to conduct compression training, please configure with the following fields.
1. Model pruning:
```
Slim: prune: name: fpgm pruned_ratio: 0.3
```yaml
Slim:
prune:
name: fpgm
pruned_ratio: 0.3
```
2. Model quantification:
```yaml
Slim:
quant:
name: pact
```
Slim: quant: name: pact
```
For details of the training method, see [Pruning and Quantification Application](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/model_prune_ quantization.md), and the algorithm is described in [Pruning and Quantification algorithms](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/algorithm_introduction/ model_prune_quantization.md).
For details of the training method, see [Pruning and Quantification Application](model_prune_quantization_en.md),
and the algorithm is described in [Pruning and Quantification algorithms](model_prune_quantization_en.md).
<a name="3"></a>
## 3 Codes and Methods for Inference and Deployment
- If you wish to quantify the classification model offline, please refer to the [Model Pruning and Quantification Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/model_) for offline quantification.
- If you wish to use python for server-side deployment, please refer to [Python Inference Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/inference_ deployment/python_deploy.md).
- If you wish to use cpp for server-side deployment, please refer to [Cpp Inference Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/inference_ deployment/cpp_deploy.md).
- If you wish to deploy the classification model as a service, please refer to the [Hub Serving Inference Deployment Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/inference_deployment/ paddle_hub_serving_deploy.md).
- If you wish to use classification models for inference on mobile, please refer to [PaddleLite Inference Deployment Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/inference_ deployment/paddle_lite_deploy.md)
- If you wish to use the whl package for inference of classification models, please refer to [whl Package Inference](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/inference_deployment/whl_ deploy.md) .
- If you wish to quantify the classification model offline, please refer to
[Model Pruning and Quantification Tutorial](model_prune_quantization_en.md) for offline quantification.
- If you wish to use python for server-side deployment,
please refer to [Python Inference Tutorial](../inference_deployment/python_deploy_en.md).
- If you wish to use cpp for server-side deployment,
please refer to [Cpp Inference Tutorial](../inference_deployment/cpp_deploy_en.md).
- If you wish to deploy the classification model as a service,
please refer to the [Hub Serving Inference Deployment Tutorial](../inference_deployment/paddle_hub_serving_deploy_en.md).
- If you wish to use classification models for inference on mobile,
please refer to [PaddleLite Inference Deployment Tutorial](../inference_deployment/paddle_lite_deploy_en.md).
- If you wish to use the whl package for inference of classification models,
please refer to [whl Package Inference](../inference_deployment/whl_deploy_en.md).
......@@ -4,22 +4,22 @@
## Contents
- [1. How to Contribute Code](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1)
- [1.1 Branches of PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.1)
- [1.2 Commit Code to PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2)
- [1.2.1 Codes of Fork and Clone](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.1)
- [1.2.2 Connect to the Remote Repository](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.2)
- [1.2.3 Create the Local Branch](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.3)
- [1.2.4 Employ Pre-commit Hook](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.4)
- [1.2.5 Modify and Commit Code](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.5)
- [1.2.6 Keep the Local Repository Updated](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.6)
- [1.2.7 Push to Remote Repository](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.7)
- [1.2.8 Commit Pull Request](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.8)
- [1.2.9 CLA and Unit Test](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.9)
- [1.2.10 Delete Branch](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.10)
- [1.2.11 Conventions](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#1.2.11)
- [2. Summary](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#2)
- [3. Inferences](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/advanced_tutorials/how_to_contribute.md#3)
- [1. How to Contribute Code](#1)
- [1.1 Branches of PaddleClas](#1.1)
- [1.2 Commit Code to PaddleClas](#1.2)
- [1.2.1 Codes of Fork and Clone](#1.2.1)
- [1.2.2 Connect to the Remote Repository](#1.2.2)
- [1.2.3 Create the Local Branch](#1.2.3)
- [1.2.4 Employ Pre-commit Hook](#1.2.4)
- [1.2.5 Modify and Commit Code](#1.2.5)
- [1.2.6 Keep the Local Repository Updated](#1.2.6)
- [1.2.7 Push to Remote Repository](#1.2.7)
- [1.2.8 Commit Pull Request](#1.2.8)
- [1.2.9 CLA and Unit Test](#1.2.9)
- [1.2.10 Delete Branch](#1.2.10)
- [1.2.11 Conventions](#1.2.11)
- [2. Summary](#2)
- [3. Inferences](#3)
......@@ -29,15 +29,28 @@
### 1.1 Branches of PaddleClas
PaddleClas will maintain the following two branches:
PaddleClas maintains the following two branches:
- release/x.x series: A stable release branch, which will be tagged with the release version of Paddle in due course. The latest branch and the default one is the release/2.3, which is compatible with Paddle v2.1.0. The branch of release/x.x series will continue to grow with future iteration, and the latest release will be maintained by default, while the former one will fix bugs with no other branches covered.
- develop : A development branch, which is adapted to the develop version of Paddle and is mainly used for developing new functions. A good choice for secondary development. To ensure that the develop branch can pull out the release/x.x when needed, only the API that is valid in Paddle's latest release branch can be adopted for its code. In other words, if a new API has been developed in this branch but not yet in the release, please do not use it in PaddleClas. Apart from that, features that do not involve the performance optimizations, parameter adjustments, and policy updates of the API can be developed normally.
- release/x.x series: Stable release branches, which are tagged with the release version of Paddle in due course.
The latest and default branch is release/2.3, which is compatible with Paddle v2.1.0.
New release/x.x branches will be added with future iterations.
By default only the latest release is maintained, the previous one receives bug fixes only, and older branches are not covered.
- develop: The development branch, which is adapted to the develop version of Paddle and is mainly used for
developing new functions. It is a good choice for secondary development.
To ensure that a release/x.x branch can be cut from develop when needed,
its code may only use APIs that are valid in Paddle's latest release branch.
In other words, if a new API exists in Paddle's develop branch but not yet in a release,
please do not use it in PaddleClas. Apart from that, features that do not involve API performance optimizations,
parameter adjustments, or policy updates can be developed normally.
The historical branches of PaddleClas will not be maintained, but will be retained for existing users.
- release/static: This branch was used for static graph development and testing, and is currently compatible with >=1.7 versions of Paddle. It is still practicable for the special need of adapting an old version of Paddle, but the code will not be updated except for bug fixing.
- dygraph-dev: This branch will no longer be maintained and accept no new code. Please transfer to the develop branch as soon as possible.
- release/static: This branch was used for static graph development and testing,
and is currently compatible with >=1.7 versions of Paddle.
It is still practicable for the special need of adapting an old version of Paddle,
but the code will not be updated except for bug fixing.
- dygraph-dev: This branch will no longer be maintained and accept no new code.
Please transfer to the develop branch as soon as possible.
PaddleClas welcomes code contributions to the repo, and the basic process is detailed in the next part.
......@@ -49,13 +62,14 @@ PaddleClas welcomes code contributions to the repo, and the basic process is det
#### 1.2.1 Codes of Fork and Clone
- Skip to the home page of [PaddleClas GitHub](https://github.com/PaddlePaddle/PaddleClas) and click the Fork button to generate a repository in your own directory, such as `https://github.com/USERNAME/ PaddleClas`.
- Go to the home page of [PaddleClas GitHub](https://github.com/PaddlePaddle/PaddleClas) and click the
Fork button to create a copy of the repository under your own account, such as `https://github.com/USERNAME/PaddleClas`.
[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/quick_start/community/001_fork.png)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/quick_start/community/001_fork.png)
![img](../../images/quick_start/community/001_fork.png)
- Clone the remote repository to local
```
```shell
# Pull the code of the develop branch
git clone https://github.com/USERNAME/PaddleClas.git -b develop
cd PaddleClas
......@@ -63,7 +77,7 @@ cd PaddleClas
Obtain the address below
[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/quick_start/community/002_clone.png)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/quick_start/community/002_clone.png)
![img](../../images/quick_start/community/002_clone.png)
......@@ -71,20 +85,22 @@ Obtain the address below
First check the current information of the remote repository with `git remote -v`.
```
```shell
origin https://github.com/USERNAME/PaddleClas.git (fetch)
origin https://github.com/USERNAME/PaddleClas.git (push)
```
The above information only contains the cloned remote repository, which is the PaddleClas under your username. Then we create a remote host of the original PaddleClas repository named upstream.
The above information only contains the cloned remote repository,
which is the PaddleClas under your username. Then we create a remote host of the original PaddleClas repository named upstream.
```
```shell
git remote add upstream https://github.com/PaddlePaddle/PaddleClas.git
```
Adopt `git remote -v` to view the current information of the remote repository, and 2 remote repositories including origin and upstream can be found, as shown below.
Adopt `git remote -v` to view the current information of the remote repository,
and 2 remote repositories including origin and upstream can be found, as shown below.
```
```shell
origin https://github.com/USERNAME/PaddleClas.git (fetch)
origin https://github.com/USERNAME/PaddleClas.git (push)
upstream https://github.com/PaddlePaddle/PaddleClas.git (fetch)
......@@ -99,21 +115,22 @@ This is mainly to keep the local repository updated when committing a pull reque
Run the following command to create a new local branch based on the current one.
```
```shell
git checkout -b new_branch
```
Or you can create new ones based on remote or upstream branches.
```
```shell
# Create new_branch based on the develop branch of origin (the user's remote repository)
git checkout -b new_branch origin/develop
# Create new_branch based on the develop branch of upstream
# If you need to create a new branch from upstream, please first employ git fetch upstream to fetch the upstream code
# If you need to create a new branch from upstream,
# please first employ git fetch upstream to fetch the upstream code
git checkout -b new_branch upstream/develop
```
Then it is shown that it has switched to the new branch with the following output:
The following output shows that it has switched to the new branch:
```
Branch new_branch set up to track remote branch develop from upstream.
......@@ -124,9 +141,13 @@ Switched to a new branch 'new_branch'
#### 1.2.4 Employ Pre-commit Hook
Paddle developers adopt the pre-commit tool to manage Git pre-commit hooks. It helps us format the source code (C++, Python) and automatically check basic issues before committing (e.g., one EOL per file, no large files added to Git, etc.).
Paddle developers adopt the pre-commit tool to manage Git pre-commit hooks.
It helps us format the source code (C++, Python) and automatically check basic issues before committing
(e.g., one EOL per file, no large files added to Git, etc.).
The pre-commit test is part of the unit tests in Travis-CI, and PRs that do not satisfy the hook cannot be committed to PaddleClas. Please install it first and run it in the current directory:
The pre-commit test is part of the unit tests in Travis-CI,
and PRs that do not satisfy the hook cannot be committed to PaddleClas.
Please install it first and run it in the current directory:
```
pip install pre-commit
......@@ -135,8 +156,9 @@ pre-commit install
- **Note**
3. Paddle uses clang-format to format C/C++ source code, please make sure `clang-format` has a version of 3.8 or higher.
4. `yapf` installed by `pip install pre-commit` and `conda install -c conda-forge pre-commit` is slightly different, and the former one is chosen by PaddleClas developers.
1. Paddle uses clang-format to format C/C++ source code, please make sure `clang-format` has a version of 3.8 or higher.
2. `yapf` installed by `pip install pre-commit` and `conda install -c conda-forge pre-commit` is slightly different,
and the former one is chosen by PaddleClas developers.
......@@ -145,12 +167,13 @@ pre-commit install
You can check the changed files via `git status`. Follow the steps below to commit the `README.md` of PaddleClas after modification:
```
git add README.mdpre-commit
git add README.md
pre-commit
```
Repeat the above steps until the pre-commit format check does not report an error, as shown below.
[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/quick_start/community/003_precommit_pass.png)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/quick_start/community/003_precommit_pass.png)
![img](../../images/quick_start/community/003_precommit_pass.png)
Run the following command to commit.
......@@ -162,10 +185,13 @@ git commit -m "your commit info"
#### 1.2.6 Keep the Local Repository Updated
Get the latest code for upstream and update the current branch. The upstream here is from the `Connecting to a remote repository` part in section 1.2.
Get the latest code for upstream and update the current branch.
The upstream here is from the `Connecting to a remote repository` part in section 1.2.
```
git fetch upstream# If you want to commit to another branch, please pull the code from another branch of upstream, in this case it is developgit pull upstream develop
git fetch upstream
# If you want to commit to another branch, please pull the code from another branch of upstream, in this case it is develop
git pull upstream develop
```
......@@ -180,18 +206,25 @@ git push origin new_branch
#### 1.2.8 Commit Pull Request
Click new pull request and select the local branch and the target branch, as shown in the following figure. In the description of the PR, fill out what the PR accomplishes. Next, wait for the review, and if any changes are required, update the corresponding branch in origin by referring to the above steps.
Click new pull request and select the local branch and the target branch,
as shown in the following figure. In the description of the PR, fill out what the PR accomplishes.
Next, wait for the review, and if any changes are required,
update the corresponding branch in origin by referring to the above steps.
[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/quick_start/community/004_create_pr.png)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/quick_start/community/004_create_pr.png)
![img](../../images/quick_start/community/004_create_pr.png)
#### 1.2.9 CLA and Unit Test
- When you first commit a Pull Request to PaddlePaddle, you will be required to sign a CLA (Contributor License Agreement) to ensure that your code can be merged, please follow the step below to sign CLA:
- When you first commit a Pull Request to PaddlePaddle,
you are required to sign a CLA (Contributor License Agreement) to ensure that your code can be merged,
please follow the step below to sign CLA:
1. Please examine the Check section of your PR, find license/cla, and click the detail on the right side to enter the CLA website
2. Click `Sign in with GitHub to agree` on the CLA website, and you will be redirected back to your Pull Request page when you are done.
1. Please examine the Check section of your PR, find license/cla,
and click the detail on the right side to enter the CLA website
2. Click `Sign in with GitHub to agree` on the CLA website,
and you will be redirected back to your Pull Request page when you are done.
......@@ -207,40 +240,52 @@ You can also delete the remote branch using `git push origin :branch name`, e.g.
git push origin :new_branch
```
- Delete local branch
- Delete local branch
```
# Switch to the develop branch, otherwise the current branch cannot be deletedgit checkout develop# Delete new_branchgit branch -D new_branch
# Switch to the develop branch, otherwise the current branch cannot be deleted
git checkout develop
# Delete new_branch
git branch -D new_branch
```
#### 1.2.11 Conventions
To help official maintainers focus on the code itself when reviewing it, please adhere to the following conventions each time you commit code:
1)Please pass the unit test in Travis-CI first. Otherwise, the submitted code may have problems and usually receive no official review.
2)Before committing a Pull Request:
To help official maintainers focus on the code itself when reviewing it,
please adhere to the following conventions each time you commit code:
1. Please pass the unit test in Travis-CI first.
Otherwise, the submitted code may have problems and usually receive no official review.
2. Before committing a Pull Request:
3.
Note the number of commits.
Reason: If only one file is modified but more than a dozen commits are committed with a few changes for each, this may overwhelm the reviewer for they need to check each and every commit for specific changes, including the case that the changes between commits overwrite each other.
Reason: If only one file is modified but more than a dozen commits are committed with a few changes for each,
this may overwhelm the reviewer for they need to check each and every commit for specific changes,
including the case that the changes between commits overwrite each other.
Recommendation: Minimize the number of commits each time, and add the last commit with `git commit --amend`. For multiple commits that have been pushed to a remote repository, please refer to [squash commits after push](https://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after- they-have-been-pushed).
Recommendation: Minimize the number of commits each time, and add the last commit with `git commit --amend`.
For multiple commits that have been pushed to a remote repository, please refer to
[squash commits after push](https://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after-they-have-been-pushed).
Please pay attention to the name of each commit: it should reflect the content of the current commit without being too casual.
Please pay attention to the name of each commit:
it should reflect the content of the current commit without being too casual.
3)If an issue is resolved, please add `fix #issue_number` to the first comment box of the Pull Request, so that the corresponding issue will be closed automatically when the Pull Request is merged. Please choose the appropriate term with keywords such as close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved, please choose the appropriate term. See details in [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
3. If an issue is resolved, please add `fix #issue_number` to the first comment box of the Pull Request,
so that the corresponding issue will be closed automatically when the Pull Request is merged. Please choose the appropriate keyword from close, closes, closed, fix, fixes, fixed, resolve, resolves, and resolved. See details in [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
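The keyword convention above can be checked locally before a Pull Request is opened. The following is a minimal sketch, not GitHub's actual parser: the function name `find_closing_references` and the regular expression are assumptions made for illustration, and GitHub's own matching rules may differ in edge cases.

```python
import re
from typing import List

# Hypothetical pattern covering the nine keywords listed above, followed by "#<number>".
CLOSING_PATTERN = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def find_closing_references(description: str) -> List[int]:
    """Return the issue numbers that a PR description asks to close."""
    return [int(number) for number in CLOSING_PATTERN.findall(description)]


print(find_closing_references("Update docs format. fix #123"))
```

An empty result suggests the description does not yet contain a closing keyword in the `keyword #issue_number` form.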
In addition, please stick to the following convention to respond to reviewers' comments:
1)Every review comment from the official maintainer is expected to be answered, which will better enhance the contribution of the open source community.
1. Every review comment from the official maintainer is expected to be answered,
which will better enhance the contribution of the open source community.
- If you agree with the review and finish the corresponding modification, please simply return Done;
- If you disagree with the review, please give your reasons.
2If there are plenty of review comments,
2. If there are plenty of review comments,
- Please present the revision in general.
- Please reply through `start a review` rather than replying to each comment directly, because every direct reply sends a separate email, which can be overwhelming.
......@@ -249,11 +294,12 @@ In addition, please stick to the following convention to respond to reviewers' c
## 2. Summary
- The open source community relies on the contributions and feedback of developers and users. We highly appreciate that and look forward to your valuable comments and Pull Requests to PaddleClas in the hope that together we can build a leading practical and comprehensive code repository for image recognition!
- The open source community relies on the contributions and feedback of developers and users.
We highly appreciate that and look forward to your valuable comments and Pull Requests to PaddleClas in the hope that together we can build a leading practical and comprehensive code repository for image recognition!
## 3. References
1. [Guide to PaddlePaddle Local Development](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/08_contribution/index_cn.html)
2. [Committing PR to Open Source Framework](
\ No newline at end of file
1. [Guide to PaddlePaddle Local Development](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/08_contribution/index_en.html)
2. [Committing PR to Open Source Framework](https://blog.csdn.net/vim_wj/article/details/78300239)
## Features of PaddleClas
PaddleClas is an image recognition toolset for industry and academia,
helping users train better computer vision models and apply them in real scenarios.
Specifically, it contains the following core features.
- Practical image recognition system: Integrate detection, feature learning,
and retrieval modules to be applicable to all types of image recognition tasks. Four sample solutions are provided,
including product recognition, vehicle recognition, logo recognition, and animation character recognition.
- Rich library of pre-trained models: Provide a total of 175 ImageNet pre-trained models of 36 series,
among which 7 selected series of models support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be
combined and switched at will through configuration files.
- SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by
more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the ImageNet-1k dataset
and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
- Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc.
with the detailed introduction, code replication, and evaluation of effectiveness in a unified experimental environment.
![img](../../images/recognition.gif)
For more information about the quick start of image recognition, algorithm details, model training and evaluation,
and prediction and deployment methods, please refer to the [README Tutorial](../../../README_en.md) on the home page.
......@@ -29,21 +29,26 @@ The main code and directory structure of PaddleClas are as follows
<a name="2"></a>
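## 2. Training Module
Deep learning model training mainly involves the data, model structure, loss function, optimizer, learning rate decay, and weight decay strategy, each of which is explained below.
The training module of a deep learning model mainly consists of the data, model structure, loss function, optimizer, learning rate decay, and weight decay strategy, each of which is explained below.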
<a name="2.1"></a>
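### 2.1 Data
For supervised tasks, the training data generally consists of the raw data and its annotations. In single-label image classification, the raw data is the image data and the annotation is the category the image belongs to. In PaddleClas, a label file must be provided for training in the following format: each line holds one training sample, giving the image path and the category label separated by a delimiter (a space by default).
For supervised tasks, the training data generally consists of the raw data and its annotations.
In single-label image classification, the raw data is the image data and the annotation is the category the image belongs to.
In PaddleClas, a label file must be provided for training in the following format: each line holds one training sample, giving the image path and the category label separated by a delimiter (a space by default).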
```
train/n01440764/n01440764_10026.JPEG 0
train/n01440764/n01440764_10027.JPEG 0
```
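The label format above can be read with a few lines of Python. This is a minimal sketch for illustration only, not the loader used in `ppcls/data/dataloader/`; the function name `parse_label_lines` and its `delimiter` parameter are assumptions.

```python
from typing import List, Tuple


def parse_label_lines(lines: List[str], delimiter: str = " ") -> List[Tuple[str, int]]:
    """Split each non-empty line into (image_path, class_id)."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last delimiter so that a path containing the delimiter still parses.
        path, label = line.rsplit(delimiter, 1)
        samples.append((path, int(label)))
    return samples


samples = parse_label_lines([
    "train/n01440764/n01440764_10026.JPEG 0",
    "train/n01440764/n01440764_10027.JPEG 0",
])
print(samples)
```

Splitting from the right is a design choice of this sketch: the class label is always the last field, so the path may safely contain the delimiter.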
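The code in `ppcls/data/dataloader/common_dataset.py` contains the `CommonDataset` class, which inherits from `paddle.io.Dataset`; this dataset class can be indexed by a key to fetch a given sample. Dataset classes such as `ImageNetDataset`, `LogoDataset`, and `CommonDataset` all inherit from this class.
The code in `ppcls/data/dataloader/common_dataset.py` contains the `CommonDataset` class, which inherits from `paddle.io.Dataset`.
This dataset class can be indexed by a key to fetch a given sample. Dataset classes such as `ImageNetDataset`, `LogoDataset`, and `CommonDataset` all inherit from this class.
The data that is read in must be converted through data transforms. During training, the standard preprocessing consists of `DecodeImage`, `RandCropImage`, `RandFlipImage`, `NormalizeImage`, and `ToCHWImage`. This appears in the configuration file as follows: the preprocessing steps are listed under the `transforms` field and are applied to the data in order.
The data that is read in must be converted through data transforms. During training, the standard preprocessing consists of `DecodeImage`, `RandCropImage`,
`RandFlipImage`, `NormalizeImage`, and `ToCHWImage`.
This appears in the configuration file as follows: the preprocessing steps are listed under the `transforms` field and are applied to the data in order.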
```yaml
DataLoader:
......@@ -67,9 +72,12 @@ DataLoader:
order: ''
```
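Because the entries under `transforms` are applied in list order, the pipeline behaves like a simple function chain. The sketch below illustrates only that ordering idea. `apply_transforms`, `flip_horizontal`, and `scale_to_unit` are hypothetical stand-ins operating on nested lists, not the operators in `ppcls/data/preprocess/ops/`.

```python
from typing import Callable, Sequence


def apply_transforms(sample, transforms: Sequence[Callable]):
    """Apply each transform to the sample in the order given."""
    for op in transforms:
        sample = op(sample)
    return sample


def flip_horizontal(image):
    # Hypothetical stand-in for a flip operator: reverse every row.
    return [row[::-1] for row in image]


def scale_to_unit(image):
    # Hypothetical stand-in for normalization: map 0..255 to 0..1.
    return [[pixel / 255.0 for pixel in row] for row in image]


image = [[0, 255], [51, 102]]
result = apply_transforms(image, [flip_horizontal, scale_to_unit])
print(result)
```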
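PaddleClas also includes data augmentation methods such as `AutoAugment` and `RandAugment`, which can likewise be added to the training preprocessing through the configuration file. Each data transform is implemented as a class for easy migration and reuse; for more implementation details, see the code under `ppcls/data/preprocess/ops/`.
PaddleClas also includes data augmentation methods such as `AutoAugment` and `RandAugment`, which can likewise be added to the training preprocessing through the configuration file.
Each data transform is implemented as a class for easy migration and reuse; for more implementation details, see the code under `ppcls/data/preprocess/ops/`.
For the data that makes up a batch, augmentation methods such as mixup or cutmix can also be used. PaddleClas integrates batch-level augmentation methods such as `MixupOperator`, `CutmixOperator`, and `FmixOperator`, which can be enabled by setting the mix parameter in the configuration file; for the implementation, see `ppcls/data/preprocess/batch_ops/batch_operators.py`.
For the data that makes up a batch, augmentation methods such as mixup or cutmix can also be used.
PaddleClas integrates batch-level augmentation methods such as `MixupOperator`, `CutmixOperator`, and `FmixOperator`,
which can be enabled by setting the mix parameter in the configuration file; for the implementation, see `ppcls/data/preprocess/batch_ops/batch_operators.py`.
In image classification, data post-processing is mainly the `argmax` operation, which is not elaborated here.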
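For readers unfamiliar with the `argmax` post-processing step, the following is a minimal sketch in plain Python. The function name `argmax_postprocess` is an assumption for illustration; this is not the post-processing code shipped with PaddleClas.

```python
from typing import List, Sequence


def argmax_postprocess(batch_scores: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each sample, the index of the class with the highest score."""
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in batch_scores
    ]


predictions = argmax_postprocess([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]])
print(predictions)
```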
......@@ -86,7 +94,8 @@ Arch:
use_ssld: False
```
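`Arch.name` is the model name, `Arch.pretrained` indicates whether to load a pre-trained model, and `use_ssld` indicates whether to use the pre-trained model obtained through `SSLD` knowledge distillation. All model names are defined in `ppcls/arch/backbone/__init__.py`.
`Arch.name` is the model name, `Arch.pretrained` indicates whether to load a pre-trained model, and `Arch.use_ssld` indicates whether to use the pre-trained model obtained through `SSLD` knowledge distillation.
All model names are defined in `ppcls/arch/backbone/__init__.py`.
Correspondingly, the model object is created by the `build_model` method in `ppcls/arch/__init__.py`.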
......@@ -180,12 +189,14 @@ def build_optimizer(config, epochs, step_each_epoch, parameters):
return optim, lr
```
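The optimizers and weight decay strategies are implemented as classes; see `ppcls/optimizer/optimizer.py` for the implementation. For the learning rate decay strategies, see `ppcls/optimizer/learning_rate.py`.
The optimizers and weight decay strategies are implemented as classes; see `ppcls/optimizer/optimizer.py` for the implementation.
For the learning rate decay strategies, see `ppcls/optimizer/learning_rate.py`.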
<a name="2.5"></a>
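### 2.5 Evaluation During Training
During training, you can set the interval at which the model is saved, and you can also evaluate on the validation set every few epochs so that the model with the best validation accuracy is kept. This can be configured through the following fields in the configuration file.
During training, you can set the interval at which the model is saved, and you can also evaluate on the validation set every few epochs
so that the model with the best validation accuracy is kept. This can be configured through the following fields in the configuration file.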
```yaml
Global:
......@@ -209,16 +220,14 @@ def save_model(program, model_path, epoch_id, prefix='ppcls'):
```
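Two points need attention when saving:
1. Save the model only on node 0. Otherwise, in multi-GPU training, if all nodes save the model to the same path, write conflicts may occur when multiple nodes write files, so that the saved model cannot be loaded correctly.
2. The optimizer parameters also need to be saved, so that training can later be resumed from a checkpoint.
1. Save the model only on node 0. Otherwise, in multi-GPU training, if all nodes save the model to the same path,
   write conflicts may occur when multiple nodes write files, so that the saved model cannot be loaded correctly.
2. The optimizer parameters also need to be saved, so that training can later be resumed from a checkpoint.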
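The two saving rules above can be sketched as follows. This is an illustration under stated assumptions, not the `save_model` function from PaddleClas: `save_checkpoint` and its `rank` argument are hypothetical, and `pickle` stands in for the framework's own serialization. The `.pdparams` and `.pdopt` suffixes follow Paddle's naming convention.

```python
import os
import pickle
import tempfile


def save_checkpoint(model_state, optimizer_state, path_prefix, rank):
    """Write the checkpoint only on rank 0; other ranks return without touching disk."""
    if rank != 0:
        return False
    os.makedirs(os.path.dirname(path_prefix) or ".", exist_ok=True)
    # Model parameters and optimizer state are stored side by side so that
    # training can be resumed from this checkpoint.
    with open(path_prefix + ".pdparams", "wb") as f:
        pickle.dump(model_state, f)
    with open(path_prefix + ".pdopt", "wb") as f:
        pickle.dump(optimizer_state, f)
    return True


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "epoch_1", "ppcls")
    wrote_rank1 = save_checkpoint({"w": [1.0]}, {"lr": 0.1}, prefix, rank=1)
    wrote_rank0 = save_checkpoint({"w": [1.0]}, {"lr": 0.1}, prefix, rank=0)
    saved_files = sorted(os.listdir(os.path.dirname(prefix)))

print(wrote_rank1, wrote_rank0, saved_files)
```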
* Model pruning and quantization training
If you want to train a compressed model, configure it through the following fields.
<a name="2.7"></a>
### 2.7 Model Pruning and Quantization
If you want to train a compressed model, configure it through the following fields.
1. Model pruning:
```yaml
......@@ -236,7 +245,8 @@ Slim:
name: pact
```
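For the training method, see [Introduction to Using Model Pruning and Quantization](../advanced_tutorials/model_prune_quantization.md); for the algorithms, see [Introduction to Pruning and Quantization Algorithms](../algorithm_introduction/model_prune_quantization.md).
For the training method, see [Introduction to Using Model Pruning and Quantization](../advanced_tutorials/model_prune_quantization.md).
For the algorithms, see [Introduction to Pruning and Quantization Algorithms](../algorithm_introduction/model_prune_quantization.md).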
<a name="3"></a>
## 3. Codes and Methods for Inference and Deployment
......