diff --git a/doc/doc_ch/add_new_algorithm.md b/doc/doc_ch/add_new_algorithm.md
new file mode 100644
index 0000000000000000000000000000000000000000..7cb0ffe52618990b57f729e61a15658ca9e16c61
--- /dev/null
+++ b/doc/doc_ch/add_new_algorithm.md
@@ -0,0 +1,303 @@
+# Add new algorithm
+
+PaddleOCR decomposes an algorithm into the following parts and modularizes each of them, so that new algorithms can be assembled quickly:
+
+* Data loading and processing
+* Network
+* Post-processing
+* Loss
+* Metric
+* Optimizer
+
+Each part is introduced below, together with how to add the modules that a new algorithm needs.
+
+## Data loading and processing
+
+Data loading and processing is composed of different modules that handle image reading, data augmentation and label generation. This part is under [ppocr/data](../../ppocr/data). The files and folders are organized as follows:
+
+```bash
+ppocr/data/
+├── imaug             # modules for image reading, data augmentation and label generation
+│   ├── label_ops.py  # modules that transform the label
+│   ├── operators.py  # modules that transform the image
+│   ├── .....
+├── __init__.py
+├── lmdb_dataset.py   # dataset that reads lmdb data
+└── simple_dataset.py # dataset that reads data saved in the `image_path\tgt` format
+```
+
+PaddleOCR has a large number of built-in image operation modules. Modules that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
+2. Add the relevant code to my_module.py. Sample code:
+
+```python
+class MyModule:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, data):
+        img = data['image']
+        label = data['label']
+        # your process code
+
+        data['image'] = img
+        data['label'] = label
+        return data
+```
+
+3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.
+
+All data-processing steps are executed in sequence by different modules; they are combined and executed as a list in the config file. For example:
+
+```yaml
+# angle class data process
+transforms:
+  - DecodeImage: # load image
+      img_mode: BGR
+      channel_first: False
+  - MyModule:
+      args1: args1
+      args2: args2
+  - KeepKeys:
+      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
+```
+
+## Network
+
+The network part builds the model. PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in order (transforms->backbones->necks->heads).
+
+```bash
+├── architectures # code that assembles the network
+├── transforms    # image transformation modules
+├── backbones     # feature extraction modules
+├── necks         # feature enhancement modules
+└── heads         # output modules
+```
+
+PaddleOCR has built-in modules commonly used by algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps; the steps are the same for all four parts, taking backbones as an example:
+
+1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
+2. Add the relevant code to my_backbone.py. Sample code:
+
+```python
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, *args, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code
+        self.conv = nn.xxxx
+
+    def forward(self, inputs):
+        # your network forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
+
+After adding the four kinds of network modules, you only need to configure them in the configuration file to use them, for example:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: CRNN
+  Transform:
+    name: MyTransform
+    args1: args1
+    args2: args2
+  Backbone:
+    name: MyBackbone
+    args1: args1
+  Neck:
+    name: MyNeck
+    args1: args1
+  Head:
+    name: MyHead
+    args1: args1
+```
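+
+For illustration, here is a minimal, self-contained backbone sketch that fills the template above with real layers. It is not one of PaddleOCR's built-in backbones; the class name `SimpleConvBackbone`, the channel sizes and the input shape in the usage line are made up for the example. It follows the convention of exposing an `out_channels` attribute so that downstream modules can read the output channel count.
+
+```python
+import paddle
+import paddle.nn as nn
+
+
+class SimpleConvBackbone(nn.Layer):
+    """Illustrative backbone: two conv-bn-relu blocks, each downsampling by 2."""
+
+    def __init__(self, in_channels=3, out_channels=64, **kwargs):
+        super(SimpleConvBackbone, self).__init__()
+        # downstream modules typically read this attribute to know the feature depth
+        self.out_channels = out_channels
+        self.block1 = nn.Sequential(
+            nn.Conv2D(in_channels, 32, kernel_size=3, stride=2, padding=1),
+            nn.BatchNorm2D(32),
+            nn.ReLU())
+        self.block2 = nn.Sequential(
+            nn.Conv2D(32, out_channels, kernel_size=3, stride=2, padding=1),
+            nn.BatchNorm2D(out_channels),
+            nn.ReLU())
+
+    def forward(self, inputs):
+        x = self.block1(inputs)
+        return self.block2(x)
+
+
+# usage sketch
+feat = SimpleConvBackbone()(paddle.randn([1, 3, 32, 100]))
+print(feat.shape)  # [1, 64, 8, 25]
+```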
+
+## Post-processing
+
+Post-processing transforms the network output into human-friendly results. This part is under [ppocr/postprocess](../../ppocr/postprocess).
+PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Components that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
+2. Add the relevant code to my_postprocess.py. Sample code:
+
+```python
+import paddle
+
+
+class MyPostProcess:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        # your preds decode code
+        preds = self.decode_preds(preds)
+        if label is None:
+            return preds
+        # your label decode code
+        label = self.decode_label(label)
+        return preds, label
+
+    def decode_preds(self, preds):
+        # your preds decode code
+        pass
+
+    def decode_label(self, preds):
+        # your label decode code
+        pass
+```
+
+3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.
+
+After the post-processing module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+PostProcess:
+  name: MyPostProcess
+  args1: args1
+  args2: args2
+```
+
+## Loss
+
+The loss function measures the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
+PaddleOCR has built-in loss modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
+2. Add the relevant code to my_loss.py. Sample code:
+
+```python
+import paddle
+from paddle import nn
+
+
+class MyLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(MyLoss, self).__init__()
+        # your init code
+        pass
+
+    def __call__(self, predicts, batch):
+        label = batch[1]
+        # your loss code
+        loss = self.loss(input=predicts, label=label)
+        return {'loss': loss}
+```
+
+3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.
+
+After the loss module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+Loss:
+  name: MyLoss
+  args1: args1
+  args2: args2
+```
+
+## Metric
+
+Metric evaluates the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in metric modules for detection, classification and recognition. Modules that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
+2. Add the relevant code to my_metric.py. Sample code:
+
+```python
+
+class MyMetric(object):
+    def __init__(self, main_indicator='acc', **kwargs):
+        # main_indicator is used to select the best model
+        self.main_indicator = main_indicator
+        self.reset()
+
+    def __call__(self, preds, batch, *args, **kwargs):
+        # preds is the output of the post-process
+        # batch is the output of the dataloader
+        labels = batch[1]
+        cur_correct_num = 0
+        cur_all_num = 0
+        # your metric code
+        self.correct_num += cur_correct_num
+        self.all_num += cur_all_num
+        return {'acc': cur_correct_num / cur_all_num, }
+
+    def get_metric(self):
+        """
+        return metrics {
+                 'acc': 0,
+                 'norm_edit_dis': 0,
+            }
+        """
+        acc = self.correct_num / self.all_num
+        self.reset()
+        return {'acc': acc}
+
+    def reset(self):
+        # reset metric
+        self.correct_num = 0
+        self.all_num = 0
+
+```
+
+3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.
+
+After the metric module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+Metric:
+  name: MyMetric
+  main_indicator: acc
+```
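+
+Before moving on to the optimizer, here is a concrete instance of the loss template from the Loss section above: a minimal, self-contained sketch that wraps Paddle's built-in `CrossEntropyLoss` for a classification-style head. The class name and the assumed batch layout (integer class ids at `batch[1]`) are illustrative; this is not an existing PaddleOCR loss.
+
+```python
+import paddle
+from paddle import nn
+
+
+class MyCrossEntropyLoss(nn.Layer):
+    """Illustrative loss: softmax cross-entropy over (N, num_classes) logits."""
+
+    def __init__(self, **kwargs):
+        super(MyCrossEntropyLoss, self).__init__()
+        self.loss_func = nn.CrossEntropyLoss(reduction='mean')
+
+    def forward(self, predicts, batch):
+        # batch[1] is assumed to hold (N,) int64 class ids, as in the template above
+        label = batch[1]
+        loss = self.loss_func(predicts, label)
+        return {'loss': loss}
+
+
+# usage sketch
+logits = paddle.randn([4, 10])
+labels = paddle.randint(0, 10, [4])
+print(MyCrossEntropyLoss()(logits, [None, labels]))  # {'loss': <scalar Tensor>}
+```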
+
+## Optimizer
+
+The optimizer is used to train the network, and it also contains the network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in optimizer modules such as `Momentum`, `Adam`
+and `RMSProp`, learning rate decay modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and regularization modules such as `L1Decay` and `L2Decay`.
+Modules that are not built in can be added through the following steps, taking `optimizer` as an example:
+
+1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file. Sample code:
+
+```python
+from paddle import optimizer as optim
+
+
+class MyOptim(object):
+    def __init__(self, learning_rate=0.001, *args, **kwargs):
+        self.learning_rate = learning_rate
+
+    def __call__(self, parameters):
+        # It is recommended to wrap the built-in optimizers of paddle
+        opt = optim.XXX(
+            learning_rate=self.learning_rate,
+            parameters=parameters)
+        return opt
+
+```
+
+After the optimizer module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+Optimizer:
+  name: MyOptim
+  args1: args1
+  args2: args2
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
\ No newline at end of file
diff --git a/doc/doc_en/add_new_algorithm_en.md b/doc/doc_en/add_new_algorithm_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..a0a8cee322bbec6202c74ad1e6fd5ff894aebbae
--- /dev/null
+++ b/doc/doc_en/add_new_algorithm_en.md
@@ -0,0 +1,304 @@
+# Add new algorithm
+
+PaddleOCR decomposes an algorithm into the following parts and modularizes each of them, so that new algorithms can be developed more conveniently:
+
+* Data loading and processing
+* Network
+* Post-processing
+* Loss
+* Metric
+* Optimizer
+
+Each part is introduced below, together with how to add the modules that a new algorithm needs.
+
+
+## Data loading and processing
+
+Data loading and processing is composed of different modules that handle image reading, data augmentation and label generation. This part is under [ppocr/data](../../ppocr/data). The files and folders are organized as follows:
+
+```bash
+ppocr/data/
+├── imaug             # modules for image reading, data augmentation and label generation
+│   ├── label_ops.py  # modules that transform the label
+│   ├── operators.py  # modules that transform the image
+│   ├── .....
+├── __init__.py
+├── lmdb_dataset.py   # dataset that reads lmdb data
+└── simple_dataset.py # dataset that reads data saved in the `image_path\tgt` format
+```
+
+PaddleOCR has a large number of built-in image operation modules. Modules that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
+2. Add the relevant code to my_module.py. Sample code:
+
+```python
+class MyModule:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, data):
+        img = data['image']
+        label = data['label']
+        # your process code
+
+        data['image'] = img
+        data['label'] = label
+        return data
+```
+
+3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.
+
+All data-processing steps are executed in sequence by different modules; they are combined and executed as a list in the config file. For example:
+
+```yaml
+# angle class data process
+transforms:
+  - DecodeImage: # load image
+      img_mode: BGR
+      channel_first: False
+  - MyModule:
+      args1: args1
+      args2: args2
+  - KeepKeys:
+      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
+```
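+
+To make the template concrete, here is a minimal, self-contained data module sketch that adds pixel-wise Gaussian noise to the image. It is not a built-in PaddleOCR operator; the class name and default parameters are made up, and it assumes the image is an HWC uint8 numpy array, as produced by `DecodeImage` above. Once imported in `ppocr/data/imaug/__init__.py`, it would be listed under `transforms:` just like `MyModule`.
+
+```python
+import numpy as np
+
+
+class AddGaussianNoise:
+    """Illustrative data module: adds pixel-wise Gaussian noise to the image."""
+
+    def __init__(self, mean=0.0, std=5.0, **kwargs):
+        self.mean = mean
+        self.std = std
+
+    def __call__(self, data):
+        # image is assumed to be an HWC uint8 numpy array
+        img = data['image'].astype(np.float32)
+        noise = np.random.normal(self.mean, self.std, img.shape).astype(np.float32)
+        data['image'] = np.clip(img + noise, 0, 255).astype(np.uint8)
+        return data
+
+
+# usage sketch
+sample = {'image': np.zeros((32, 100, 3), dtype=np.uint8), 'label': 'text'}
+print(AddGaussianNoise(std=10.0)(sample)['image'].shape)  # (32, 100, 3)
+```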
+
+## Network
+
+The network part builds the model. PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in order (transforms->backbones->necks->heads).
+
+```bash
+├── architectures # code that assembles the network
+├── transforms    # image transformation modules
+├── backbones     # feature extraction modules
+├── necks         # feature enhancement modules
+└── heads         # output modules
+```
+
+PaddleOCR has built-in modules commonly used by algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps; the steps are the same for all four parts, taking backbones as an example:
+
+1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
+2. Add the relevant code to my_backbone.py. Sample code:
+
+```python
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, *args, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code
+        self.conv = nn.xxxx
+
+    def forward(self, inputs):
+        # your network forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
+
+After adding the four kinds of network modules, you only need to configure them in the configuration file to use them, for example:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: CRNN
+  Transform:
+    name: MyTransform
+    args1: args1
+    args2: args2
+  Backbone:
+    name: MyBackbone
+    args1: args1
+  Neck:
+    name: MyNeck
+    args1: args1
+  Head:
+    name: MyHead
+    args1: args1
+```
+
+## Post-processing
+
+Post-processing transforms the network output into human-friendly results. This part is under [ppocr/postprocess](../../ppocr/postprocess).
+PaddleOCR has built-in post-processing modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Components that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
+2. Add the relevant code to my_postprocess.py. Sample code:
+
+```python
+import paddle
+
+
+class MyPostProcess:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        # your preds decode code
+        preds = self.decode_preds(preds)
+        if label is None:
+            return preds
+        # your label decode code
+        label = self.decode_label(label)
+        return preds, label
+
+    def decode_preds(self, preds):
+        # your preds decode code
+        pass
+
+    def decode_label(self, preds):
+        # your label decode code
+        pass
+```
+
+3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.
+
+After the post-processing module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+PostProcess:
+  name: MyPostProcess
+  args1: args1
+  args2: args2
+```
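+
+As a concrete illustration of the template above, here is a minimal, self-contained post-process sketch for a classification-style head that outputs per-class probabilities of shape `(N, num_classes)`. It is not a built-in PaddleOCR post-process; the class name, the default `label_list` and the assumed shapes are illustrative.
+
+```python
+import paddle
+
+
+class MyClsPostProcess:
+    """Illustrative post-process: maps (N, num_classes) probabilities to
+    (label_name, confidence) pairs. The default label_list is a placeholder."""
+
+    def __init__(self, label_list=('0', '180'), **kwargs):
+        self.label_list = list(label_list)
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        pred_idxs = preds.argmax(axis=1)
+        decoded = [(self.label_list[idx], float(preds[n, idx]))
+                   for n, idx in enumerate(pred_idxs)]
+        if label is None:
+            return decoded
+        label = [(self.label_list[int(idx)], 1.0) for idx in label]
+        return decoded, label
+
+
+# usage sketch
+probs = paddle.to_tensor([[0.9, 0.1], [0.2, 0.8]])
+print(MyClsPostProcess()(probs))  # [('0', 0.9...), ('180', 0.8...)]
+```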
+
+## Loss
+
+The loss function is used to calculate the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
+PaddleOCR has built-in loss modules for algorithms such as DB, EAST, SAST, CRNN and Attention. Modules that are not built in can be added through the following steps:
+
+1. Create a new file in the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
+2. Add the relevant code to my_loss.py. Sample code:
+
+```python
+import paddle
+from paddle import nn
+
+
+class MyLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(MyLoss, self).__init__()
+        # your init code
+        pass
+
+    def __call__(self, predicts, batch):
+        label = batch[1]
+        # your loss code
+        loss = self.loss(input=predicts, label=label)
+        return {'loss': loss}
+```
+
+3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.
+
+After the loss module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+Loss:
+  name: MyLoss
+  args1: args1
+  args2: args2
+```
+
+## Metric
+
+Metric is used to calculate the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in metric modules for detection, classification and recognition. Modules that are not built in can be added through the following steps:
+
+1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
+2. Add the relevant code to my_metric.py. Sample code:
+
+```python
+
+class MyMetric(object):
+    def __init__(self, main_indicator='acc', **kwargs):
+        # main_indicator is used to select the best model
+        self.main_indicator = main_indicator
+        self.reset()
+
+    def __call__(self, preds, batch, *args, **kwargs):
+        # preds is the output of the post-process
+        # batch is the output of the dataloader
+        labels = batch[1]
+        cur_correct_num = 0
+        cur_all_num = 0
+        # your metric code
+        self.correct_num += cur_correct_num
+        self.all_num += cur_all_num
+        return {'acc': cur_correct_num / cur_all_num, }
+
+    def get_metric(self):
+        """
+        return metrics {
+                 'acc': 0,
+                 'norm_edit_dis': 0,
+            }
+        """
+        acc = self.correct_num / self.all_num
+        self.reset()
+        return {'acc': acc}
+
+    def reset(self):
+        # reset metric
+        self.correct_num = 0
+        self.all_num = 0
+
+```
+
+3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.
+
+After the metric module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+Metric:
+  name: MyMetric
+  main_indicator: acc
+```
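+
+As a concrete illustration of the metric template above, here is a minimal, self-contained sketch that accumulates a normalized edit-distance score (the `norm_edit_dis` mentioned in the template's docstring). It assumes, purely for the example, that `preds` is a list of `(text, confidence)` pairs from the post-process and that `batch[1]` holds the ground-truth strings; it is not PaddleOCR's built-in recognition metric, and the normalization used here (one minus the length-normalized Levenshtein distance, averaged over samples) is just one reasonable choice.
+
+```python
+class NormEditDisMetric(object):
+    """Illustrative metric: mean of 1 - normalized Levenshtein distance."""
+
+    def __init__(self, main_indicator='norm_edit_dis', **kwargs):
+        self.main_indicator = main_indicator
+        self.reset()
+
+    def _edit_distance(self, a, b):
+        # plain dynamic-programming Levenshtein distance
+        prev = list(range(len(b) + 1))
+        for i, ca in enumerate(a, 1):
+            cur = [i]
+            for j, cb in enumerate(b, 1):
+                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
+            prev = cur
+        return prev[-1]
+
+    def __call__(self, preds, batch, *args, **kwargs):
+        labels = batch[1]
+        cur_dis_sum, cur_num = 0.0, 0
+        for (pred_text, _), gt_text in zip(preds, labels):
+            dist = self._edit_distance(pred_text, gt_text)
+            cur_dis_sum += 1 - dist / max(len(pred_text), len(gt_text), 1)
+            cur_num += 1
+        self.dis_sum += cur_dis_sum
+        self.all_num += cur_num
+        return {'norm_edit_dis': cur_dis_sum / max(cur_num, 1)}
+
+    def get_metric(self):
+        norm_edit_dis = self.dis_sum / max(self.all_num, 1)
+        self.reset()
+        return {'norm_edit_dis': norm_edit_dis}
+
+    def reset(self):
+        self.dis_sum = 0.0
+        self.all_num = 0
+
+
+# usage sketch
+metric = NormEditDisMetric()
+metric([('hello', 0.9), ('word', 0.8)], [None, ['hello', 'world']])
+print(metric.get_metric())  # mean of 1.0 and 0.8 -> 0.9
+```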
+
+## Optimizer
+
+The optimizer is used to train the network, and it also contains the network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in
+optimizer modules such as `Momentum`, `Adam` and `RMSProp`, learning rate decay modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and regularization modules such as `L1Decay` and `L2Decay`.
+Modules that are not built in can be added through the following steps, taking `optimizer` as an example:
+
+1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file. Sample code:
+
+```python
+from paddle import optimizer as optim
+
+
+class MyOptim(object):
+    def __init__(self, learning_rate=0.001, *args, **kwargs):
+        self.learning_rate = learning_rate
+
+    def __call__(self, parameters):
+        # It is recommended to wrap the built-in optimizers of paddle
+        opt = optim.XXX(
+            learning_rate=self.learning_rate,
+            parameters=parameters)
+        return opt
+
+```
+
+After the optimizer module is added, you only need to configure it in the configuration file to use it, for example:
+
+```yaml
+Optimizer:
+  name: MyOptim
+  args1: args1
+  args2: args2
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
\ No newline at end of file