add doc of how to add new algorithm

3aae17e0 · WenmuZhou · 28b2d43e · 3aae17e0 · 3aae17e0
隐藏空白更改
内联并排

Showing with 607 addition and 0 deletion

doc/doc_ch/add_new_algorithm.md doc/doc_ch/add_new_algorithm.md +303 -0

doc/doc_en/add_new_algorithm_en.md doc/doc_en/add_new_algorithm_en.md +304 -0

未找到文件。
--- a/doc/doc_ch/add_new_algorithm.md
+++ b/doc/doc_ch/add_new_algorithm.md
+# 添加新算法
+
+PaddleOCR将一个算法分解为以下几个部分，并对各部分进行模块化处理，方便快速组合出新的算法。
+
+* 数据加载和处理
+* 网络
+* 后处理
+* 损失函数
+* 指标评估
+* 优化器
+
+下面将分别对每个部分进行介绍，并介绍如何在该部分里添加新算法所需模块。
+
+## 数据加载和处理
+
+数据加载和处理由不同的模块(module)组成,其完成了图片的读取、数据增强和label的制作。这一部分在[ppocr/data](../../ppocr/data)下。 各个文件及文件夹作用说明如下:
+
+```bash
+ppocr/data/
+├── imaug             # 图片的读取、数据增强和label制作相关的文件
+│   ├── label_ops.py  # 对label进行变换的modules
+│   ├── operators.py  # 对image进行变换的modules
+│   ├──.....
+├── __init__.py
+├── lmdb_dataset.py   # 读取lmdb的数据集的dataset
+└── simple_dataset.py # 读取以`image_path\tgt`形式保存的数据集的dataset
+```
+
+PaddleOCR内置了大量图像操作相关模块，对于没有没有内置的模块可通过如下步骤添加:
+
+1. 在 [ppocr/data/imaug](../../ppocr/data/imaug) 文件夹下新建文件，如my_module.py。
+2. 在 my_module.py 文件内添加相关代码，示例代码如下:
+
+```python
+class MyModule:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, data):
+        img = data['image']
+        label = data['label']
+        # your process code
+
+        data['image'] = img
+        data['label'] = label
+        return data
+```
+
+3. 在 [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) 文件内导入添加的模块。
+
+数据处理的所有处理步骤由不同的模块顺序执行而成，在config文件中按照列表的形式组合并执行。如:
+
+```yaml
+# angle class data process
+transforms:
+  - DecodeImage: # load image
+      img_mode: BGR
+      channel_first: False
+  - MyModule:
+      args1: args1
+      args2: args2
+  - KeepKeys:
+      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
+```
+
+## 网络
+
+网络部分完成了网络的组网操作，PaddleOCR将网络划分为四部分，这一部分在[ppocr/modeling](../../ppocr/modeling)下。 进入网络的数据将按照顺序(transforms->backbones->
+necks->heads)依次通过这四个部分。
+
+```bash
+├── architectures # 网络的组网代码
+├── transforms    # 网络的图像变换模块
+├── backbones     # 网络的特征提取模块
+├── necks         # 网络的特征增强模块
+└── heads         # 网络的输出模块
+```
+
+PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的常用模块，对于没有内置的模块可通过如下步骤添加，四个部分添加步骤一致，以backbones为例:
+
+1. 在 [ppocr/modeling/backbones](../../ppocr/modeling/backbones) 文件夹下新建文件，如my_backbone.py。
+2. 在 my_backbone.py 文件内添加相关代码，示例代码如下:
+
+```python
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, *args, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code
+        self.conv = nn.xxxx
+
+    def forward(self, inputs):
+        # your necwork forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. 在 [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py)文件内导入添加的模块。
+
+在完成网络的四部分模块添加之后，只需要配置文件中进行配置即可使用，如:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: CRNN
+  Transform:
+    name: MyTransform
+    args1: args1
+    args2: args2
+  Backbone:
+    name: MyBackbone
+    args1: args1
+  Neck:
+    name: MyNeck
+    args1: args1
+  Head:
+    name: MyHead
+    args1: args1
+```
+
+## 后处理
+
+后处理主要完成从网络输出到人类友好结果的变换。这一部分在[ppocr/postprocess](../../ppocr/postprocess)下。
+PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的后处理模块，对于没有内置的组件可通过如下步骤添加:
+
+1. 在 [ppocr/postprocess](../../ppocr/postprocess) 文件夹下新建文件，如 my_postprocess.py。
+2. 在 my_postprocess.py 文件内添加相关代码，示例代码如下:
+
+```python
+import paddle
+
+
+class MyPostProcess:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        # you preds decode code
+        preds = self.decode_preds(preds)
+        if label is None:
+            return preds
+        # you label decode code
+        label = self.decode_label(label)
+        return preds, label
+
+    def decode_preds(self, preds):
+        # you preds decode code
+        pass
+
+    def decode_label(self, preds):
+        # you label decode code
+        pass
+```
+
+3. 在 [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py)文件内导入添加的模块。
+
+在后处理模块添加之后，只需要配置文件中进行配置即可使用，如:
+
+```yaml
+PostProcess:
+  name: MyPostProcess
+  args1: args1
+  args2: args2
+```
+
+## 损失函数
+
+损失函数用于计算网络输出和label之间的距离。这一部分在[ppocr/losses](../../ppocr/losses)下。
+PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的损失函数模块，对于没有内置的模块可通过如下步骤添加:
+
+1. 在 [ppocr/losses](../../ppocr/losses) 文件夹下新建文件，如 my_loss.py。
+2. 在 my_loss.py 文件内添加相关代码，示例代码如下:
+
+```python
+import paddle
+from paddle import nn
+
+
+class MyLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(MyLoss, self).__init__()
+        # you init code
+        pass
+
+    def __call__(self, predicts, batch):
+        label = batch[1]
+        # your loss code
+        loss = self.loss(input=predicts, label=label)
+        return {'loss': loss}
+```
+
+3. 在 [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py)文件内导入添加的模块。
+
+在损失函数添加之后，只需要配置文件中进行配置即可使用，如:
+
+```yaml
+Loss:
+  name: MyLoss
+  args1: args1
+  args2: args2
+```
+
+## 指标评估
+
+指标评估用于计算网络在当前batch上的性能。这一部分在[ppocr/metrics](../../ppocr/metrics)下。 PaddleOCR内置了检测，分类和识别等算法相关的指标评估模块，对于没有内置的模块可通过如下步骤添加:
+
+1. 在 [ppocr/metrics](../../ppocr/metrics) 文件夹下新建文件，如my_metric.py。
+2. 在 my_metric.py 文件内添加相关代码，示例代码如下:
+
+```python
+
+class MyMetric(object):
+    def __init__(self, main_indicator='acc', **kwargs):
+        # main_indicator is used for select best model
+        self.main_indicator = main_indicator
+        self.reset()
+
+    def __call__(self, preds, batch, *args, **kwargs):
+        # preds is out of postprocess
+        # batch is out of dataloader
+        labels = batch[1]
+        cur_correct_num = 0
+        cur_all_num = 0
+        # you metric code
+        self.correct_num += cur_correct_num
+        self.all_num += cur_all_num
+        return {'acc': cur_correct_num / cur_all_num, }
+
+    def get_metric(self):
+        """
+        return metircs {
+                 'acc': 0,
+                 'norm_edit_dis': 0,
+            }
+        """
+        acc = self.correct_num / self.all_num
+        self.reset()
+        return {'acc': acc}
+
+    def reset(self):
+        # reset metric
+        self.correct_num = 0
+        self.all_num = 0
+
+```
+
+3. 在 [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py)文件内导入添加的模块。
+
+在指标评估模块添加之后，只需要配置文件中进行配置即可使用，如:
+
+```yaml
+Metric:
+  name: MyMetric
+  main_indicator: acc
+```
+
+## 优化器
+
+优化器用于训练网络。优化器内部还包含了网络正则化和学习率衰减模块。 这一部分在[ppocr/optimizer](../../ppocr/optimizer)下。 PaddleOCR内置了`Momentum`,`Adam`
+和`RMSProp`等常用的优化器模块，`Linear`,`Cosine`,`Step`和`Piecewise`等常用的正则化模块与`L1Decay`和`L2Decay`等常用的学习率衰减模块。
+对于没有内置的模块可通过如下步骤添加，以`optimizer`为例:
+
+1. 在 [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) 文件内创建自己的优化器，示例代码如下:
+
+```python
+from paddle import optimizer as optim
+
+
+class MyOptim(object):
+    def __init__(self, learning_rate=0.001, *args, **kwargs):
+        self.learning_rate = learning_rate
+
+    def __call__(self, parameters):
+        # It is recommended to wrap the built-in optimizer of paddle
+        opt = optim.XXX(
+            learning_rate=self.learning_rate,
+            parameters=parameters)
+        return opt
+
+```
+
+在优化器模块添加之后，只需要配置文件中进行配置即可使用，如:
+
+```yaml
+Optimizer:
+  name: MyOptim
+  args1: args1
+  args2: args2
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
\ No newline at end of file
--- a/doc/doc_en/add_new_algorithm_en.md
+++ b/doc/doc_en/add_new_algorithm_en.md
+# Add new algorithm
+
+PaddleOCR decomposes an algorithm into the following parts, and modularizes each part to make it more convenient to develop new algorithms.
+
+* Data loading and processing
+* Network
+* Post-processing
+* Loss
+* Metric
+* Optimizer
+
+The following will introduce each part separately, and introduce how to add the modules required for the new algorithm.
+
+
+## Data loading and processing
+
+Data loading and processing are composed of different modules, which complete the image reading, data augment and label production. This part is under [ppocr/data](../../ppocr/data). The explanation of each file and folder are as follows:
+
+```bash
+ppocr/data/
+├── imaug             # Scripts for image reading, data augment and label production
+│   ├── label_ops.py  # Modules that transform the label
+│   ├── operators.py  # Modules that transform the image
+│   ├──.....
+├── __init__.py
+├── lmdb_dataset.py   # The dataset that reads the lmdb
+└── simple_dataset.py # Read the dataset saved in the form of `image_path\tgt`
+```
+
+PaddleOCR has a large number of built-in image operation related modules. For modules that are not built-in, you can add them through the following steps:
+
+1. Create a new file under the [ppocr/data/imaug](../../ppocr/data/imaug) folder, such as my_module.py.
+2. Add code in the my_module.py file, the sample code is as follows:
+
+```python
+class MyModule:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, data):
+        img = data['image']
+        label = data['label']
+        # your process code
+
+        data['image'] = img
+        data['label'] = label
+        return data
+```
+
+3. Import the added module in the [ppocr/data/imaug/\__init\__.py](../../ppocr/data/imaug/__init__.py) file.
+
+All different modules of data processing are executed by sequence, combined and executed in the form of a list in the config file. Such as:
+
+```yaml
+# angle class data process
+transforms:
+  - DecodeImage: # load image
+      img_mode: BGR
+      channel_first: False
+  - MyModule:
+      args1: args1
+      args2: args2
+  - KeepKeys:
+      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
+```
+
+## Network
+
+The network part completes the construction of the network, and PaddleOCR divides the network into four parts, which are under [ppocr/modeling](../../ppocr/modeling). The data entering the network will pass through these four parts in sequence(transforms->backbones->
+necks->heads).
+
+```bash
+├── architectures # Code for building network
+├── transforms    # Image Transformation Module
+├── backbones     # Feature extraction module
+├── necks         # Feature enhancement module
+└── heads         # Output module
+```
+
+PaddleOCR has built-in commonly used modules related to algorithms such as DB, EAST, SAST, CRNN and Attention. For modules that do not have built-in, you can add them through the following steps, the four parts are added in the same steps, take backbones as an example:
+
+1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
+2. Add code in the my_backbone.py file, the sample code is as follows:
+
+```python
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, *args, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code
+        self.conv = nn.xxxx
+
+    def forward(self, inputs):
+        # your necwork forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
+
+After adding the four-part modules of the network, you only need to configure them in the configuration file to use, such as:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: CRNN
+  Transform:
+    name: MyTransform
+    args1: args1
+    args2: args2
+  Backbone:
+    name: MyBackbone
+    args1: args1
+  Neck:
+    name: MyNeck
+    args1: args1
+  Head:
+    name: MyHead
+    args1: args1
+```
+
+## Post-processing
+
+Post-processing mainly completes the transformation from network output to human-friendly results. This part is under [ppocr/postprocess](../../ppocr/postprocess).
+PaddleOCR has built-in post-processing modules related to algorithms such as DB, EAST, SAST, CRNN and Attention. For components that are not built-in, they can be added through the following steps:
+
+1. Create a new file under the [ppocr/postprocess](../../ppocr/postprocess) folder, such as my_postprocess.py.
+2. Add code in the my_postprocess.py file, the sample code is as follows:
+
+```python
+import paddle
+
+
+class MyPostProcess:
+    def __init__(self, *args, **kwargs):
+        # your init code
+        pass
+
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        # you preds decode code
+        preds = self.decode_preds(preds)
+        if label is None:
+            return preds
+        # you label decode code
+        label = self.decode_label(label)
+        return preds, label
+
+    def decode_preds(self, preds):
+        # you preds decode code
+        pass
+
+    def decode_label(self, preds):
+        # you label decode code
+        pass
+```
+
+3. Import the added module in the [ppocr/postprocess/\__init\__.py](../../ppocr/postprocess/__init__.py) file.
+
+After the post-processing module is added, you only need to configure it in the configuration file to use, such as:
+
+```yaml
+PostProcess:
+  name: MyPostProcess
+  args1: args1
+  args2: args2
+```
+
+## Loss
+
+The loss function is used to calculate the distance between the network output and the label. This part is under [ppocr/losses](../../ppocr/losses).
+PaddleOCR has built-in loss function modules related to algorithms such as DB, EAST, SAST, CRNN and Attention. For modules that do not have built-in modules, you can add them through the following steps:
+
+1. Create a new file in the [ppocr/losses](../../ppocr/losses) folder, such as my_loss.py.
+2. Add code in the my_loss.py file, the sample code is as follows:
+
+```python
+import paddle
+from paddle import nn
+
+
+class MyLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(MyLoss, self).__init__()
+        # you init code
+        pass
+
+    def __call__(self, predicts, batch):
+        label = batch[1]
+        # your loss code
+        loss = self.loss(input=predicts, label=label)
+        return {'loss': loss}
+```
+
+3. Import the added module in the [ppocr/losses/\__init\__.py](../../ppocr/losses/__init__.py) file.
+
+After the loss function module is added, you only need to configure it in the configuration file to use it, such as:
+
+```yaml
+Loss:
+  name: MyLoss
+  args1: args1
+  args2: args2
+```
+
+## Metric
+
+Metric is used to calculate the performance of the network on the current batch. This part is under [ppocr/metrics](../../ppocr/metrics). PaddleOCR has built-in evaluation modules related to algorithms such as detection, classification and recognition. For modules that do not have built-in modules, you can add them through the following steps:
+
+1. Create a new file under the [ppocr/metrics](../../ppocr/metrics) folder, such as my_metric.py.
+2. Add code in the my_metric.py file, the sample code is as follows:
+
+```python
+
+class MyMetric(object):
+    def __init__(self, main_indicator='acc', **kwargs):
+        # main_indicator is used for select best model
+        self.main_indicator = main_indicator
+        self.reset()
+
+    def __call__(self, preds, batch, *args, **kwargs):
+        # preds is out of postprocess
+        # batch is out of dataloader
+        labels = batch[1]
+        cur_correct_num = 0
+        cur_all_num = 0
+        # you metric code
+        self.correct_num += cur_correct_num
+        self.all_num += cur_all_num
+        return {'acc': cur_correct_num / cur_all_num, }
+
+    def get_metric(self):
+        """
+        return metircs {
+                 'acc': 0,
+                 'norm_edit_dis': 0,
+            }
+        """
+        acc = self.correct_num / self.all_num
+        self.reset()
+        return {'acc': acc}
+
+    def reset(self):
+        # reset metric
+        self.correct_num = 0
+        self.all_num = 0
+
+```
+
+3. Import the added module in the [ppocr/metrics/\__init\__.py](../../ppocr/metrics/__init__.py) file.
+
+After the metric module is added, you only need to configure it in the configuration file to use it, such as:
+
+```yaml
+Metric:
+  name: MyMetric
+  main_indicator: acc
+```
+
+## 优化器
+
+The optimizer is used to train the network. The optimizer also contains network regularization and learning rate decay modules. This part is under [ppocr/optimizer](../../ppocr/optimizer). PaddleOCR has built-in
+Commonly used optimizer modules such as `Momentum`, `Adam` and `RMSProp`, common regularization modules such as `Linear`, `Cosine`, `Step` and `Piecewise`, and common learning rate decay modules such as `L1Decay` and `L2Decay`.
+Modules without built-in can be added through the following steps, take `optimizer` as an example:
+
+1. Create your own optimizer in the [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py) file, the sample code is as follows:
+
+```python
+from paddle import optimizer as optim
+
+
+class MyOptim(object):
+    def __init__(self, learning_rate=0.001, *args, **kwargs):
+        self.learning_rate = learning_rate
+
+    def __call__(self, parameters):
+        # It is recommended to wrap the built-in optimizer of paddle
+        opt = optim.XXX(
+            learning_rate=self.learning_rate,
+            parameters=parameters)
+        return opt
+
+```
+
+After the optimizer module is added, you only need to configure it in the configuration file to use, such as:
+
+```yaml
+Optimizer:
+  name: MyOptim
+  args1: args1
+  args2: args2
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
\ No newline at end of file