# Code Overview

## Contents

- [Overview of Code and Content](#1)
- [Training Module](#2)
  - [2.1 Data](#2.1)
  - [2.2 Model Structure](#2.2)
  - [2.3 Loss Function](#2.3)
  - [2.4 Optimizer, Learning Rate Decay, and Weight Decay](#2.4)
  - [2.5 Evaluation During Training](#2.5)
  - [2.6 Model Saving](#2.6)
  - [2.7 Model Pruning and Quantization](#2.7)
- [Codes and Methods for Inference and Deployment](#3)


<a name="1"></a>
## 1 Overview of Code and Content

The main code and content structure of PaddleClas are as follows:

- benchmark: shell scripts to test the speed metrics of different models in PaddleClas, such as single-card training speed metrics, multi-card training speed metrics, etc.
- dataset: datasets and the scripts used to process datasets. The scripts are responsible for processing the dataset into a suitable format for Dataloader.
- deploy: code for deployment, including deployment tools that support Python/C++ inference, Hub Serving, Paddle Lite, Slim offline quantization, and other deployment methods.
- ppcls: code for training and evaluation, which is the main body of the PaddleClas framework. It also contains configuration files and the specific code for model training, evaluation, inference, dynamic-to-static export, etc.
- tools: entry functions and scripts for training, evaluation, inference, and dynamic to static export.
- requirements.txt: the file listing the dependencies of PaddleClas; use pip to install or upgrade them.
- test_tipc: TIPC tests of PaddleClas models, covering the whole chain from training to prediction, to verify that each function works properly.


<a name="2"></a>
## 2 Training Module

Training a deep learning model mainly involves the data, the model structure, the loss function,
and strategies such as the optimizer, learning rate decay, and weight decay, which are explained below.


<a name="2.1"></a>
## 2.1 Data

For supervised tasks, the training data generally contains the raw data and its annotation.
In a single-label image classification task, the raw data refers to the image data,
while the annotation is the class to which the image belongs.
In PaddleClas, a label file in the following format is required for training:
each row contains one training sample, with the image path and the class label
separated by a separator (a space by default).

```
train/n01440764/n01440764_10026.JPEG 0
train/n01440764/n01440764_10027.JPEG 0
```

`ppcls/data/dataloader/common_dataset.py` contains the `CommonDataset` class, inherited from `paddle.io.Dataset`,
which is a dataset class that can index and fetch a given sample by a key value.
Dataset classes such as `ImageNetDataset` and `LogoDataset` inherit from `CommonDataset`.
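
As an illustration only (not the actual PaddleClas code), a minimal dataset class backed by such a label file could look like the sketch below; the class name and helper logic are hypothetical:

```python
import paddle
from PIL import Image


class SimpleLabelFileDataset(paddle.io.Dataset):
    """Minimal sketch of a label-file-backed dataset (illustrative only)."""

    def __init__(self, image_root, cls_label_path, transform=None, delimiter=" "):
        super().__init__()
        self.image_root = image_root
        self.transform = transform
        # Each line of the label file: "<image path><delimiter><class label>"
        self.samples = []
        with open(cls_label_path) as f:
            for line in f:
                path, label = line.strip().split(delimiter)
                self.samples.append((path, int(label)))

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(f"{self.image_root}/{path}").convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label

    def __len__(self):
        return len(self.samples)
```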

The raw images need to be preprocessed before training.
The standard data preprocessing during training includes
`DecodeImage`, `RandCropImage`, `RandFlipImage`, `NormalizeImage`, and `ToCHWImage`.
The preprocessing is configured in the `transform_ops` field as a list of operators,
which are applied to the data in order, as reflected in the configuration file below.
```yaml
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
```
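
Conceptually, each entry of `transform_ops` names one operator class and its parameters. The following is a hedged sketch of how such a list can be instantiated and applied in order; the real logic lives under `ppcls/data/`, and `op_registry` below is a hypothetical name-to-class mapping:

```python
def create_operators(op_config_list, op_registry):
    """Instantiate each {op_name: params} config entry as a callable (sketch)."""
    ops = []
    for op_cfg in op_config_list:
        # Every list entry holds exactly one operator name and its parameters.
        op_name, params = list(op_cfg.items())[0]
        ops.append(op_registry[op_name](**(params or {})))
    return ops


def apply_transforms(sample, ops):
    """Apply the instantiated operators to one sample, in configuration order."""
    for op in ops:
        sample = op(sample)
    return sample
```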

PaddleClas also contains `AutoAugment`, `RandAugment`, and other data augmentation methods,
which can likewise be configured in the configuration file and thus added to the data preprocessing of the training.
Each data augmentation and processing method is implemented as a class, which makes migration and reuse easy.
For the specific implementation of data processing, please refer to the code under `ppcls/data/preprocess/ops/`.

You can also use methods such as mixup or cutmix to augment the data that makes up a batch.
PaddleClas integrates `MixupOperator`, `CutmixOperator`, `FmixOperator`, and other batch-based data augmentation methods,
which can be configured via the `mix` field in the configuration file.
For the code implementation, please refer to `ppcls/data/preprocess/batch_ops/batch_operators.py`.
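
To illustrate what such a batch-level operator does, here is a minimal mixup sketch (an assumed interface, not the PaddleClas implementation): it draws a mixing coefficient from a Beta distribution and blends each image in the batch with a randomly chosen partner.

```python
import numpy as np


class SimpleMixupOperator:
    """Minimal mixup sketch: blends a batch with a shuffled copy of itself."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # Beta distribution parameter

    def __call__(self, batch):
        imgs = np.array([img for img, _ in batch], dtype="float32")
        labels = np.array([label for _, label in batch])
        lam = np.random.beta(self.alpha, self.alpha)
        idx = np.random.permutation(len(batch))
        mixed = lam * imgs + (1.0 - lam) * imgs[idx]
        # Keep both labels plus lam so the loss can be computed as
        # lam * loss(label_a) + (1 - lam) * loss(label_b).
        return list(zip(mixed, labels, labels[idx], [lam] * len(batch)))
```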

In image classification, the data post-processing is mainly the `argmax` operation, which is not elaborated here.


<a name="2.2"></a>
## 2.2 Model Structure

The model in the configuration file is structured as follows:

```yaml
Arch:
  name: ResNet50
  class_num: 1000
  pretrained: False
  use_ssld: False
```

`Arch.name`: the name of the model.

`Arch.class_num`: the number of classes.

`Arch.pretrained`: whether to load pre-trained weights.

`Arch.use_ssld`: whether to use a pre-trained model based on `SSLD` knowledge distillation.

All model names are defined in `ppcls/arch/backbone/__init__.py`.

Correspondingly, the model object is created in `ppcls/arch/__init__.py` with the `build_model` method.

```python
def build_model(config):
    config = copy.deepcopy(config)
    model_type = config.pop("name")
    mod = importlib.import_module(__name__)
    arch = getattr(mod, model_type)(**config)
    return arch
```
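
For example, with the `Arch` section shown earlier, `build_model` would be called roughly as follows (a usage sketch; after `name` is popped, the remaining keys are forwarded to the model constructor):

```python
from ppcls.arch import build_model

arch_config = {"name": "ResNet50", "class_num": 1000}
# "name" is popped and resolved to the ResNet50 class; the rest become kwargs.
model = build_model(arch_config)
```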


<a name="2.3"></a>
## 2.3 Loss Function

PaddleClas implements `CELoss`, `JSDivLoss`, `TripletLoss`, `CenterLoss`, and other loss functions, all defined in `ppcls/loss`.

In the `ppcls/loss/__init__.py` file, `CombinedLoss` is used to construct and combine loss functions.
The loss functions and calculation methods required by different training strategies differ,
so PaddleClas considers the following factors when constructing the loss function:

1. whether to use label smoothing
2. whether to use mixup or cutmix
3. whether to use the distillation method for training
4. whether to train with metric learning

Users can specify the type and weight of each loss function in the configuration file.
For example, to add `TripletLossV2` to the training, configure as follows:

```yaml
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - TripletLossV2:
        weight: 1.0
        margin: 0.5
```
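
Conceptually, `CombinedLoss` evaluates each configured loss and sums the results with their weights; a minimal sketch of that pattern (illustrative, not the exact PaddleClas code):

```python
import paddle.nn as nn


class SimpleCombinedLoss(nn.Layer):
    """Weighted sum of several loss functions (sketch)."""

    def __init__(self, loss_list, weights):
        super().__init__()
        self.loss_list = loss_list  # e.g. [CELoss(), TripletLossV2(margin=0.5)]
        self.weights = weights      # e.g. [1.0, 1.0], as in the config above

    def forward(self, outputs, targets):
        total = 0.0
        for loss_fn, weight in zip(self.loss_list, self.weights):
            total = total + weight * loss_fn(outputs, targets)
        return total
```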


<a name="2.4"></a>
## 2.4 Optimizer, Learning Rate Decay, and Weight Decay

In image classification tasks, `Momentum` is a commonly used optimizer,
and PaddleClas provides several optimizer strategies such as `Momentum`, `RMSProp`, `Adam`, and `AdamW`.

The weight decay strategy is a common regularization method, mainly adopted to prevent model overfitting.
Two weight decay strategies, `L1Decay` and `L2Decay`, are provided in PaddleClas.

Learning rate decay is an essential training method for accuracy improvement in image classification tasks.
PaddleClas currently supports `Cosine`, `Piecewise`, `Linear`, and other learning rate decay strategies.

In the configuration file, the optimizer, weight decay,
and learning rate decay strategies can be configured with the following fields.

```yaml
Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Piecewise
    learning_rate: 0.1
    decay_epochs: [30, 60, 90]
    values: [0.1, 0.01, 0.001, 0.0001]
  regularizer:
    name: 'L2'
    coeff: 0.0001
```
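
For instance, the `Piecewise` schedule above holds each learning rate value until the next boundary epoch is reached; its mapping from epoch to learning rate can be sketched as:

```python
import bisect


def piecewise_lr(epoch, decay_epochs, values):
    """Piecewise schedule sketch: with decay_epochs=[30, 60, 90] and
    values=[0.1, 0.01, 0.001, 0.0001], epochs 0-29 use 0.1, 30-59 use 0.01,
    60-89 use 0.001, and 90 onwards use 0.0001."""
    return values[bisect.bisect_right(decay_epochs, epoch)]
```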

Employ `build_optimizer` in `ppcls/optimizer/__init__.py` to create the optimizer and learning rate objects.

```python
def build_optimizer(config, epochs, step_each_epoch, parameters):
    config = copy.deepcopy(config)
    # step1 build lr
    lr = build_lr_scheduler(config.pop('lr'), epochs, step_each_epoch)
    logger.debug("build lr ({}) success..".format(lr))
    # step2 build regularization
    if 'regularizer' in config and config['regularizer'] is not None:
        reg_config = config.pop('regularizer')
        reg_name = reg_config.pop('name') + 'Decay'
        reg = getattr(paddle.regularizer, reg_name)(**reg_config)
    else:
        reg = None
    logger.debug("build regularizer ({}) success..".format(reg))
    # step3 build optimizer
    optim_name = config.pop('name')
    if 'clip_norm' in config:
        clip_norm = config.pop('clip_norm')
        grad_clip = paddle.nn.ClipGradByNorm(clip_norm=clip_norm)
    else:
        grad_clip = None
    optim = getattr(optimizer, optim_name)(learning_rate=lr,
                                           weight_decay=reg,
                                           grad_clip=grad_clip,
                                           **config)(parameters=parameters)
    logger.debug("build optimizer ({}) success..".format(optim))
    return optim, lr
```
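
A hedged usage sketch, assuming `config` is the parsed configuration file, `model` is the network from section 2.2, and `train_dataloader` is the dataloader from section 2.1:

```python
# The learning rate schedule needs the epoch count and the number of steps per
# epoch to compute decay boundaries; the optimizer needs the parameters to update.
optim, lr_scheduler = build_optimizer(
    config["Optimizer"],                    # the parsed Optimizer section above
    epochs=120,                             # total number of training epochs
    step_each_epoch=len(train_dataloader),
    parameters=model.parameters())
```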

Different optimizers and weight decay strategies are implemented as classes,
which can be found in the file `ppcls/optimizer/optimizer.py`.
Different learning rate decay strategies can be found in the file `ppcls/optimizer/learning_rate.py`.


<a name="2.5"></a>
## 2.5 Evaluation During Training

When training the model, you can set the epoch interval for model saving,
and you can also evaluate the model on the validation set every few epochs so that the model with the best accuracy is saved.
Configure it as in the example below.

```yaml
Global:
  save_interval: 1 # epoch interval of model saving
  eval_during_train: True # whether evaluate during training
  eval_interval: 1 # epoch interval of evaluation
```
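
These fields translate into training-loop logic along the following lines (a simplified sketch; `train_one_epoch`, `evaluate`, and `save_model` are placeholder names):

```python
best_acc = 0.0
for epoch in range(1, epochs + 1):
    train_one_epoch(model, train_loader)         # placeholder
    if eval_during_train and epoch % eval_interval == 0:
        acc = evaluate(model, eval_loader)       # placeholder
        if acc > best_acc:                       # keep the best model so far
            best_acc = acc
            save_model(model, optimizer, "best_model")
    if epoch % save_interval == 0:               # periodic checkpoint
        save_model(model, optimizer, f"epoch_{epoch}")
```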


<a name="2.6"></a>
## 2.6 Model Saving

The model is saved through the `paddle.save()` function of the Paddle framework.
The dynamic graph version of the model is saved in the form of a state dictionary to facilitate further training.
A simplified sketch of the implementation is as follows:

```python
def save_model(net, optimizer, model_path, epoch_id, prefix='ppcls'):
    model_path = os.path.join(model_path, str(epoch_id))
    _mkdir_if_not_exist(model_path)
    model_prefix = os.path.join(model_path, prefix)
    # Save both the model weights and the optimizer state as dictionaries.
    paddle.save(net.state_dict(), model_prefix + ".pdparams")
    paddle.save(optimizer.state_dict(), model_prefix + ".pdopt")
    logger.info("Already save model in {}".format(model_path))
```

When saving, there are two things to keep in mind:

1. Only save the model on node 0; otherwise, if all nodes save the model to the same path,
a file conflict may occur when multiple nodes write files during multi-card training,
preventing the final saved model from being loaded correctly.
2. The optimizer parameters also need to be saved, so that training can later be resumed from the checkpoint.
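
Both points can be handled as in the sketch below (illustrative; `save_model` refers to the sketch above): only the process with distributed rank 0 writes the checkpoint, and the optimizer state is saved alongside the weights.

```python
import paddle.distributed as dist


def save_checkpoint_on_master(net, optimizer, model_path, epoch_id):
    """Only rank 0 writes checkpoints, avoiding concurrent-write conflicts."""
    if dist.get_rank() == 0:
        save_model(net, optimizer, model_path, epoch_id)
```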


<a name="2.7"></a>
## 2.7 Model Pruning and Quantization

If you want to conduct compression training, please configure it with the following fields.

1. Model pruning:

```yaml
Slim:
  prune:
    name: fpgm
    pruned_ratio: 0.3
```

2. Model quantization:

```yaml
Slim:
  quant:
    name: pact
```
For details of the training method, see [Pruning and Quantization Application](model_prune_quantization_en.md);
the algorithms are described in [Pruning and Quantization Algorithms](model_prune_quantization_en.md).


<a name="3"></a>
## 3 Codes and Methods for Inference and Deployment

- If you wish to quantize the classification model offline, please refer to the
[Model Pruning and Quantization Tutorial](model_prune_quantization_en.md) for offline quantization.
- If you wish to use Python for server-side deployment,
please refer to the [Python Inference Tutorial](../inference_deployment/python_deploy_en.md).
- If you wish to use C++ for server-side deployment,
please refer to the [C++ Inference Tutorial](../inference_deployment/cpp_deploy_en.md).
- If you wish to deploy the classification model as a service,
please refer to the [Hub Serving Inference Deployment Tutorial](../inference_deployment/paddle_hub_serving_deploy_en.md).
- If you wish to run classification models on mobile,
please refer to the [Paddle Lite Inference Deployment Tutorial](../inference_deployment/paddle_lite_deploy_en.md).
- If you wish to use the whl package for inference with classification models,
please refer to [whl Package Inference](../inference_deployment/whl_deploy_en.md).