add_new_algorithm_en.md 9.7 KB
Newer Older
W
WenmuZhou 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
# Add new algorithm

PaddleOCR decomposes an algorithm into the following parts, and modularizes each part to make it more convenient to develop new algorithms.

* Data loading and processing
* Network
* Post-processing
* Loss
* Metric
* Optimizer

The following will introduce each part separately, and introduce how to add the modules required for the new algorithm.


## Data loading and processing

Data loading and processing are composed of different modules, which complete the image reading, data augment and label production. This part is under [ppocr/data](../../ppocr/data). The explanation of each file and folder are as follows:

```bash
ppocr/data/
├── imaug             # Scripts for image reading, data augment and label production
│   ├── label_ops.py  # Modules that transform the label
│   ├── operators.py  # Modules that transform the image
│   ├──.....
├── __init__.py
├── lmdb_dataset.py   # The dataset that reads the lmdb
└── simple_dataset.py # Read the dataset saved in the form of `image_path\tgt`
```

PaddleOCR has a large number of built-in image operation related modules. For modules that are not built-in, you can add them through the following steps:

1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
2. Add code in the my_module.py file, the sample code is as follows:

```python
class MyModule:
    def __init__(self, *args, **kwargs):
        # your init code
        pass

    def __call__(self, data):
        img = data['image']
        label = data['label']
        # your process code

        data['image'] = img
        data['label'] = label
        return data
```

3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.

All different modules of data processing are executed by sequence, combined and executed in the form of a list in the config file. Such as:

```yaml
# angle class data process
transforms:
  - DecodeImage: # load image
      img_mode: BGR
      channel_first: False
  - MyModule:
      args1: args1
      args2: args2
  - KeepKeys:
      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
```

## Network

The network part completes the construction of the network, and PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). The data entering the network will pass through these four parts in sequence(transforms->backbones->
necks->heads).

```bash
├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module
```

PaddleOCR has built-in commonly used modules related to algorithms such as DB, EAST, SAST, CRNN and Attention. For modules that do not have built-in, you can add them through the following steps, the four parts are added in the same steps, take backbones as an example:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code in the my_backbone.py file, the sample code is as follows:

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code
        self.conv = nn.xxxx

    def forward(self, inputs):
        # your necwork forward
        y = self.conv(inputs)
        return y
```

3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.

After adding the four-part modules of the network, you only need to configure them in the configuration file to use, such as:

```yaml
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
    name: MyTransform
    args1: args1
    args2: args2
  Backbone:
    name: MyBackbone
    args1: args1
  Neck:
    name: MyNeck
    args1: args1
  Head:
    name: MyHead
    args1: args1
```

## Post-processing

W
WenmuZhou 已提交
129
Post-processing realizes decoding network output to obtain text box or recognized text. This part is under [ppocr/postprocess](../../ppocr/postprocess).
W
WenmuZhou 已提交
130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304
PaddleOCR has built-in post-processing modules related to algorithms such as DB, EAST, SAST, CRNN and Attention. For components that are not built-in, they can be added through the following steps:

1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
2. Add code in the my_postprocess.py file, the sample code is as follows:

```python
import paddle


class MyPostProcess:
    def __init__(self, *args, **kwargs):
        # your init code
        pass

    def __call__(self, preds, label=None, *args, **kwargs):
        if isinstance(preds, paddle.Tensor):
            preds = preds.numpy()
        # you preds decode code
        preds = self.decode_preds(preds)
        if label is None:
            return preds
        # you label decode code
        label = self.decode_label(label)
        return preds, label

    def decode_preds(self, preds):
        # you preds decode code
        pass

    def decode_label(self, preds):
        # you label decode code
        pass
```

3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.

After the post-processing module is added, you only need to configure it in the configuration file to use, such as:

```yaml
PostProcess:
  name: MyPostProcess
  args1: args1
  args2: args2
```

## Loss

The loss function is used to calculate the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
PaddleOCR has built-in loss function modules related to algorithms such as DB, EAST, SAST, CRNN and Attention. For modules that do not have built-in modules, you can add them through the following steps:

1. Create a new file in the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
2. Add code in the my_loss.py file, the sample code is as follows:

```python
import paddle
from paddle import nn


class MyLoss(nn.Layer):
    def __init__(self, **kwargs):
        super(MyLoss, self).__init__()
        # you init code
        pass

    def __call__(self, predicts, batch):
        label = batch[1]
        # your loss code
        loss = self.loss(input=predicts, label=label)
        return {'loss': loss}
```

3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.

After the loss function module is added, you only need to configure it in the configuration file to use it, such as:

```yaml
Loss:
  name: MyLoss
  args1: args1
  args2: args2
```

## Metric

Metric is used to calculate the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in evaluation modules related to algorithms such as detection, classification and recognition. For modules that do not have built-in modules, you can add them through the following steps:

1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
2. Add code in the my_metric.py file, the sample code is as follows:

```python

class MyMetric(object):
    def __init__(self, main_indicator='acc', **kwargs):
        # main_indicator is used for select best model
        self.main_indicator = main_indicator
        self.reset()

    def __call__(self, preds, batch, *args, **kwargs):
        # preds is out of postprocess
        # batch is out of dataloader
        labels = batch[1]
        cur_correct_num = 0
        cur_all_num = 0
        # you metric code
        self.correct_num += cur_correct_num
        self.all_num += cur_all_num
        return {'acc': cur_correct_num / cur_all_num, }

    def get_metric(self):
        """
        return metircs {
                 'acc': 0,
                 'norm_edit_dis': 0,
            }
        """
        acc = self.correct_num / self.all_num
        self.reset()
        return {'acc': acc}

    def reset(self):
        # reset metric
        self.correct_num = 0
        self.all_num = 0

```

3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.

After the metric module is added, you only need to configure it in the configuration file to use it, such as:

```yaml
Metric:
  name: MyMetric
  main_indicator: acc
```

## 优化器

The optimizer is used to train the network. The optimizer also contains network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in
Commonly used optimizer modules such as `Momentum`, `Adam` and `RMSProp`, common regularization modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common learning rate decay modules such as `L1Decay` and `L2Decay`.
Modules without built-in can be added through the following steps, take `optimizer` as an example:

1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file, the sample code is as follows:

```python
from paddle import optimizer as optim


class MyOptim(object):
    def __init__(self, learning_rate=0.001, *args, **kwargs):
        self.learning_rate = learning_rate

    def __call__(self, parameters):
        # It is recommended to wrap the built-in optimizer of paddle
        opt = optim.XXX(
            learning_rate=self.learning_rate,
            parameters=parameters)
        return opt

```

After the optimizer module is added, you only need to configure it in the configuration file to use, such as:

```yaml
Optimizer:
  name: MyOptim
  args1: args1
  args2: args2
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0
```