# Text Detection

This section uses the ICDAR2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

- [1. Data and Weights Preparation](#1-data-and-weights-preparation)
  - [1.1 Data Preparation](#11-data-preparation)
  - [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
- [2. Training](#2-training)
  * [2.1 Start Training](#21-start-training)
  * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
  * [2.3 Training with New Backbone](#23-training-with-new-backbone)
  * [2.4 Mixed Precision Training](#24-mixed-precision-training)
  * [2.5 Distributed Training](#25-distributed-training)
  * [2.6 Training with knowledge distillation](#26-training-with-knowledge-distillation)
  * [2.7 Training on other platform(Windows/macOS/Linux DCU)](#27-training-on-other-platformwindowsmacoslinux-dcu)
- [3. Evaluation and Test](#3-evaluation-and-test)
  - [3.1 Evaluation](#31-evaluation)
  - [3.2 Test](#32-test)
- [4. Inference](#4-inference)
- [5. FAQ](#5-faq)

## 1. Data and Weights Preparation

### 1.1 Data Preparation

To prepare datasets, refer to [ocr_datasets](./dataset/ocr_datasets_en.md).

### 1.2 Download Pre-trained Model

First, download the pre-trained model. The detection model of PaddleOCR currently supports three backbones: MobileNetV3, ResNet18_vd, and ResNet50_vd. You can replace the backbone with a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) according to your needs.
The download links for the corresponding backbone pre-trained weights can be found in the [PaddleClas README](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).

```shell
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet18_vd_pretrained.pdparams
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams
```

## 2. Training

### 2.1 Start Training

*If the CPU version is installed, set the parameter `use_gpu` to `false` in the configuration.*
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
         -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```

In the above command, `-c` selects the `configs/det/det_mv3_db.yml` configuration file for training.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001:
```shell
# single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o \
         Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
         Optimizer.base_lr=0.0001

# multi-GPU training
# Set the GPU IDs with the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

# multi-node, multi-GPU training
# Set the IPs of your nodes with the '--ips' parameter. Set the GPU IDs with the '--gpus' parameter.
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
**Note:** For multi-node multi-GPU training, replace the `ips` value in the preceding command with the IP addresses of your machines. The machines must be able to ping each other, and the training command has to be launched separately on each machine. Use `ifconfig` to view the IP address of a machine.
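
The `-o` overrides used in the commands above take the form `Section.key=value`. As a rough illustration of how such dotted keys could be merged into a nested configuration, here is a minimal Python sketch. The helper names `apply_overrides` and `parse_value` are hypothetical, and this is not PaddleOCR's actual implementation; it only assumes that the yml file has been loaded into nested dictionaries.

```python
def parse_value(raw):
    """Convert a command-line string to int, float, or bool when possible."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw  # keep paths and other text as plain strings


def apply_overrides(config, overrides):
    """Merge 'A.B=value' strings into a nested dict, in place."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        keys = dotted_key.split(".")
        node = config
        # Walk (and create, if missing) the intermediate sections.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_value(raw_value)
    return config


# Hypothetical stand-in for the loaded yml file.
config = {"Global": {"pretrained_model": None}, "Optimizer": {"base_lr": 0.001}}
apply_overrides(config, [
    "Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained",
    "Optimizer.base_lr=0.0001",
])
print(config["Optimizer"]["base_lr"])
```

The point of the sketch is that values passed with `-o` replace only the named keys, so everything else in the yml file stays as configured.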

If you want to further speed up the training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_en.html). For single-card training, the command is as follows:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```

### 2.2 Load Trained Model and Continue Training
To load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to be loaded.

For example:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` takes priority over `Global.pretrained_model`. When both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrained_model` is loaded instead.
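
The priority rule in the note can be sketched as a small selection function. This is only an illustration of the described behaviour, not the loading code PaddleOCR uses. The function name `choose_weights` is hypothetical, and the check for a `.pdparams` file next to the given path prefix is an assumption based on the weight file names shown earlier.

```python
import os


def choose_weights(checkpoints=None, pretrained_model=None, exists=os.path.exists):
    """Return (mode, path) following the priority described in the note."""
    # Assumption: a path prefix is valid if '<prefix>.pdparams' exists.
    if checkpoints and exists(checkpoints + ".pdparams"):
        return "resume", checkpoints
    # Fall back to the pre-trained weights when the checkpoint is missing or wrong.
    if pretrained_model:
        return "pretrained", pretrained_model
    return "scratch", None


# Example with a fake filesystem in which only the pre-trained weights exist.
available = {"./pretrain_models/MobileNetV3_large_x0_5_pretrained.pdparams"}
mode, path = choose_weights(
    checkpoints="./your/trained/model",
    pretrained_model="./pretrain_models/MobileNetV3_large_x0_5_pretrained",
    exists=lambda p: p in available,
)
print(mode, path)
```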


### 2.3 Training with New Backbone

The network-building code is located under [ppocr/modeling](../../ppocr/modeling), where PaddleOCR divides the network into four parts. Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).

```bash
├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module
```

If the Backbone to be replaced has a corresponding implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` part of the configuration yml file.

If you want to use a new Backbone instead, follow these steps:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add code to the my_backbone.py file. The sample code is as follows:

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code
        self.conv = nn.xxxx

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y
```

3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.

After adding a module to any of the four parts of the network, you only need to reference it in the configuration file to use it, for example:

```yaml
  Backbone:
    name: MyBackbone
    args1: args1
```
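
To make the link between the `name` field and the imported class more concrete, the following Python sketch shows one way a name-based lookup could work. It is an assumption for illustration only: the registry `BACKBONES`, the function `build_backbone`, and the plain-Python stand-in class are hypothetical and do not reproduce PaddleOCR's builder code.

```python
import copy


class MyBackbone:
    """Plain-Python stand-in for the nn.Layer subclass shown above."""

    def __init__(self, args1=None):
        self.args1 = args1


# Hypothetical registry; importing the class in __init__.py plays a similar role.
BACKBONES = {"MyBackbone": MyBackbone}


def build_backbone(config):
    """Instantiate the class named by config['name'] with the remaining keys."""
    config = copy.deepcopy(config)  # leave the caller's config untouched
    name = config.pop("name")
    if name not in BACKBONES:
        raise ValueError(f"unknown backbone: {name}")
    return BACKBONES[name](**config)


backbone = build_backbone({"name": "MyBackbone", "args1": "args1"})
print(type(backbone).__name__, backbone.args1)
```

Under this reading, every key other than `name` in the `Backbone` section is passed to the constructor, which is why the sample class accepts `*args, **kwargs`.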

**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).

### 2.4 Mixed Precision Training

If you want to speed up your training further, you can use [Auto Mixed Precision Training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking a single machine with a single GPU as an example, the command is as follows:

```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```

### 2.5 Distributed Training

For multi-machine multi-GPU training, use the `--ips` parameter to set the IP addresses of the machines and the `--gpus` parameter to set the GPU IDs:

```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```

**Note:** For multi-machine multi-GPU training, replace the `ips` value in the above command with the IP addresses of your machines. The machines need to be able to ping each other, and training needs to be launched separately on each machine. The command to view the IP address of a machine is `ifconfig`.

### 2.6 Training with knowledge distillation

PaddleOCR supports knowledge distillation when training text detection models. For more details, please refer to [doc](./knowledge_distillation_en.md).

### 2.7 Training on other platform(Windows/macOS/Linux DCU)

- Windows GPU/CPU
  The Windows platform differs slightly from the Linux platform:
  - Windows only supports single-GPU training and inference. Specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
  - On Windows, DataLoader only supports single-process mode, so you need to set `num_workers` to 0.

- macOS
  GPU mode is not supported, so you need to set `use_gpu` to False in the configuration file. The rest of the training, evaluation, and prediction commands are exactly the same as on Linux GPU.

- Linux DCU
  Running on a DCU device requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`. The rest of the training, evaluation, and prediction commands are exactly the same as on Linux GPU.

## 3. Evaluation and Test

### 3.1 Evaluation

PaddleOCR calculates three indicators to evaluate the performance of the OCR detection task: Precision, Recall, and Hmean (F-Score).
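
For reference, Hmean is the harmonic mean of Precision and Recall. The Python sketch below computes the three indicators from raw counts. The function name `detection_metrics` and the counts are hypothetical, and how a detected box is matched to a ground-truth box (for example by an IoU threshold) is outside the scope of this sketch.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Return (precision, recall, hmean) computed from box counts."""
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


# Hypothetical counts: 80 correct boxes out of 100 detections and 160 labels.
precision, recall, hmean = detection_metrics(80, 100, 160)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
```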

Run the following command to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

When evaluating, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you train with a different dataset or a different model, adjust these two parameters for better results.

The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating, you need to set `Global.checkpoints` to point to the saved parameter file.
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing and do not need to be set when evaluating the EAST and SAST models.
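
To give a feel for what the two DB parameters control, here is a small Python sketch. It assumes the formulation from the DB method, in which a detected region is expanded by an offset distance equal to `area * unclip_ratio / perimeter`, and in which boxes whose score falls below `box_thresh` are discarded. The function names are hypothetical, and the sketch does not reproduce PaddleOCR's post-processing code.

```python
import math


def polygon_area_and_perimeter(points):
    """Shoelace area and perimeter of a simple polygon given as (x, y) pairs."""
    area = 0.0
    perimeter = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area) / 2.0, perimeter


def unclip_distance(points, unclip_ratio=1.5):
    """Offset distance by which the region is expanded (assumed DB formula)."""
    area, perimeter = polygon_area_and_perimeter(points)
    return area * unclip_ratio / perimeter


def keep_box(box_score, box_thresh=0.6):
    """Keep a box only if its score reaches the threshold."""
    return box_score >= box_thresh


# A 100 x 20 text region: a larger unclip_ratio gives a larger expansion.
region = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(region, 1.5), unclip_distance(region, 2.0))
print(keep_box(0.55), keep_box(0.75))
```

Under these assumptions, raising `unclip_ratio` produces looser boxes around the text, while raising `box_thresh` removes more low-confidence boxes.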

### 3.2 Test

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
```

When testing the DB model, adjust the post-processing threshold:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
```


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
```

## 4. Inference

The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after training is completed, and is mostly used for prediction in deployment.

The model saved during training is the checkpoints model, which stores the model parameters and is mostly used to resume training.

Compared with the checkpoints model, the inference model additionally saves the structural information of the model. Because the model structure and parameters are both solidified in the inference model file, it is easier to deploy and suitable for integration with actual systems.

First, convert the trained DB model to an inference model:
```shell
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/det_db/best_accuracy" Global.save_inference_dir="./output/det_db_inference/"
```

Then run prediction with the detection inference model:
```shell
python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```

For other detection algorithms, such as EAST, set the `det_algorithm` parameter to the algorithm name, here `EAST`. The default is the DB algorithm:
```shell
python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```

## 5. FAQ

**Q1**: Why are the prediction results of the trained model and the inference model inconsistent?

**A**: In most cases, the cause is that the pre-processing and post-processing parameters used when predicting with the trained model differ from those used when predicting with the inference model. Taking the model trained with the det_mv3_db.yml configuration file as an example, check the following:
- Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size affects the accuracy. To be consistent with the paper, the image is resized to [736, 1280] in the icdar15 training configuration file. The inference model, however, uses a single set of default parameters at prediction time: with prediction speed in mind, the longest side of the image is limited to 960 by default when resizing. The preprocessing functions of both the trained model and the inference model are located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147).
- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
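
The effect of the default resize limit can be illustrated with a short Python sketch. It assumes that the longest side is scaled down to at most 960 and that both sides are then rounded to a multiple of 32; the rounding step and the function name `limit_max_side` are assumptions made for this illustration, so check the linked `operators.py` for the actual behaviour.

```python
def limit_max_side(height, width, limit_side_len=960, multiple=32):
    """Return the (height, width) an image would be resized to in this sketch."""
    ratio = 1.0
    longest = max(height, width)
    # Shrink only when the longest side exceeds the limit.
    if longest > limit_side_len:
        ratio = limit_side_len / longest
    # Round each side to the nearest multiple, never below one multiple.
    new_h = max(int(round(height * ratio / multiple) * multiple), multiple)
    new_w = max(int(round(width * ratio / multiple) * multiple), multiple)
    return new_h, new_w


# A 720 x 1280 image is shrunk under the default limit, whereas the training
# configuration described above would resize it to [736, 1280].
print(limit_max_side(720, 1280))
```

A difference in input size like this one is enough to change the detected boxes, which is why the two preprocessing settings should be aligned before comparing results.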