# TEXT DETECTION

This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

- [1. DATA AND WEIGHTS PREPARATION](#1-data-and-weights-preparation)
  * [1.1 DATA PREPARATION](#11-data-preparation)
  * [1.2 DOWNLOAD PRETRAINED MODEL](#12-download-pretrained-model)
- [2. TRAINING](#2-training)
  * [2.1 START TRAINING](#21-start-training)
  * [2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING](#22-load-trained-model-and-continue-training)
  * [2.3 TRAINING WITH NEW BACKBONE](#23-training-with-new-backbone)
- [3. EVALUATION AND TEST](#3-evaluation-and-test)
  * [3.1 EVALUATION](#31-evaluation)
  * [3.2 TEST](#32-test)
- [4. INFERENCE](#4-inference)
  * [4.1 INFERENCE MODEL PREDICTION](#41-inference-model-prediction)
- [2. FAQ](#2-faq)

# 1. DATA AND WEIGHTS PREPARATION

## 1.1 DATA PREPARATION

The ICDAR2015 dataset contains a training set of 1000 images and a test set of 500 images, all captured with wearable cameras. It can be downloaded from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads); registration is required.

After registering and logging in, download the parts marked with red boxes in the figure below. Save the `Training Set Images` download as the folder `icdar_c4_train_imgs` and the `Test Set Images` download as the folder `ch4_test_images`.

<p align="center">
 <img src="../datasets/ic15_location_download.png" align="middle" width = "700"/>
</p>

Decompress the downloaded images into the working directory; the examples below assume `PaddleOCR/train_data/icdar2015/text_localization/`. In addition, PaddleOCR merges the many scattered annotation files into two annotation files, one for the training set and one for the test set, which can be downloaded with wget:
```shell
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/icdar2015/text_localization/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/icdar2015/text_localization/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, `PaddleOCR/train_data/icdar2015/text_localization/` contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```

The provided annotation files use the following format, with the two fields separated by "\t":
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation, encoded with **json.dumps()**, is a list of dictionaries, one per text box.

`points` in each dictionary holds the (x, y) coordinates of the four corners of the text box, ordered clockwise starting from the top-left corner.

`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**

If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
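
The format above can be read and written with Python's standard `json` module. The following is a minimal sketch, not PaddleOCR's own data loader; the helpers `parse_label_line`, `is_clockwise` and `make_label_line` are hypothetical. For simplicity it drops `###` boxes outright, and it uses the shoelace formula to check the clockwise corner order described above.

```python
import json


def parse_label_line(line):
    """Split one label-file line into (image_path, [(transcription, points), ...]).

    Boxes whose transcription is "###" are dropped, since they are
    treated as invalid during training.
    """
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = []
    for item in json.loads(annotation):
        if item["transcription"] == "###":
            continue  # invalid text box
        boxes.append((item["transcription"], item["points"]))
    return image_path, boxes


def is_clockwise(points):
    """Return True if the corners are listed clockwise on screen.

    Image coordinates have y pointing down, so a clockwise box has a
    positive signed (shoelace) area.
    """
    area2 = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area2 += x1 * y2 - x2 * y1
    return area2 > 0


def make_label_line(image_path, boxes):
    """Build one label-file line from (transcription, points) pairs."""
    annotation = [{"transcription": t, "points": p} for t, p in boxes]
    return image_path + "\t" + json.dumps(annotation)
```

The same helpers can be used to convert another dataset: collect `(transcription, points)` pairs per image, verify the corner order with `is_clockwise`, and write one `make_label_line` result per image.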

## 1.2 DOWNLOAD PRETRAINED MODEL

First, download the pretrained model. The PaddleOCR detection model currently supports three backbones: MobileNetV3, ResNet18_vd and ResNet50_vd. You can replace the backbone with other models from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) as needed.
The download links for the corresponding backbone pretrained weights can be found in the [PaddleClas documentation](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).

```shell
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams
```

# 2. TRAINING
# 2. TRAINING

## 2.1 START TRAINING

*If you installed the CPU version of PaddlePaddle, set the parameter `Global.use_gpu` to `false` in the configuration file.*
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml  \
         -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```

In the command above, `-c` selects the configuration file `configs/det/det_mv3_db.yml`.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change training parameters without modifying the yml file. For example, to set the learning rate to 0.0001:
```shell
# single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o   \
         Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained  \
         Optimizer.lr.learning_rate=0.0001

# multi-GPU training
# Set the GPU IDs to use with the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```
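
Each `-o` argument is a dotted path into the nested YAML configuration, followed by `=` and a value. The sketch below illustrates that idea on a plain Python dict; `apply_overrides` and `parse_value` are hypothetical helpers, not PaddleOCR's own code, and parsing values as JSON is a simplifying assumption (PaddleOCR's actual value parsing may differ).

```python
import copy
import json


def parse_value(text):
    """Interpret numbers and lowercase true/false/null; keep anything else as a string."""
    try:
        return json.loads(text)
    except ValueError:
        return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with each "A.B.C=value" override applied."""
    merged = copy.deepcopy(config)
    for item in overrides:
        key, value = item.split("=", 1)
        *parents, leaf = key.split(".")
        node = merged
        for name in parents:
            node = node[name]  # raises KeyError for an unknown section
        node[leaf] = parse_value(value)
    return merged
```

For example, `apply_overrides(cfg, ["Global.use_gpu=false"])` returns a copy in which `["Global"]["use_gpu"]` is `False`, while `cfg` itself is left untouched.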

## 2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING
## 2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING
To load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to load.

For example:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` takes priority over `Global.pretrained_model`: when both parameters are specified, the model specified by `Global.checkpoints` is loaded. If the path specified by `Global.checkpoints` is invalid, the model specified by `Global.pretrained_model` is loaded instead.
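
The loading rule in the note above can be summarized as a small decision function. This sketch only mirrors the documented behavior; `select_weights` is a hypothetical helper, and it assumes, as the pretrained download links suggest, that weights live in `<path>.pdparams` files while paths are passed without the suffix.

```python
import os


def select_weights(checkpoints=None, pretrained=None):
    """Return (kind, path) for the weights that would be loaded.

    Checkpoints win when their .pdparams file exists; otherwise fall back
    to the pretrained weights; otherwise nothing is loaded.
    """
    if checkpoints and os.path.exists(checkpoints + ".pdparams"):
        return "checkpoints", checkpoints
    if pretrained and os.path.exists(pretrained + ".pdparams"):
        return "pretrained", pretrained
    return "none", None
```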


## 2.3 TRAINING WITH NEW BACKBONE
## 2.3 TRAINING WITH NEW BACKBONE

PaddleOCR divides the network into four parts, located under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).

```bash
├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module
```
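
The sequential flow described above amounts to function composition. In this stand-alone sketch each stage is a plain callable rather than a network module; `FourStageModel` is a hypothetical name, and treating the transform and neck stages as optional is an assumption made for illustration.

```python
class FourStageModel:
    """Chain transform -> backbone -> neck -> head, skipping missing stages."""

    def __init__(self, backbone, head, transform=None, neck=None):
        # keep the fixed order and drop stages that were not provided
        self.stages = [s for s in (transform, backbone, neck, head) if s is not None]

    def __call__(self, x):
        for stage in self.stages:
            x = stage(x)
        return x
```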

If the backbone you want to use already has an implementation in PaddleOCR, you can simply modify the parameters in the `Backbone` section of the configuration yml file.

To use a new backbone instead, follow these steps:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
2. Add your code to my_backbone.py. Sample code:

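```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, in_channels=3, out_channels=64, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code; a single stride-2 conv is only a placeholder
        self.conv = nn.Conv2D(
            in_channels, out_channels, kernel_size=3, stride=2, padding=1)
        # the model builder reads out_channels to configure the neck
        self.out_channels = out_channels

    def forward(self, inputs):
        # your network forward; note that DB's FPN neck expects a list of
        # multi-scale feature maps, so a real backbone would return those
        y = F.relu(self.conv(inputs))
        return y


if __name__ == "__main__":
    # quick shape check on a dummy batch
    x = paddle.rand([1, 3, 64, 64])
    print(MyBackbone()(x).shape)  # [1, 64, 32, 32]
```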

3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.

After adding a new module for any of the four network parts, you only need to reference it in the configuration file, for example:

```yaml
  Backbone:
    name: MyBackbone
    args1: args1
```
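
Step 3 is what lets the `name` field in the yml file resolve to your class. The sketch below imitates that name-based lookup with plain Python classes standing in for `nn.Layer` subclasses; `SUPPORTED_BACKBONES` and `build_from_config` are hypothetical names, not PaddleOCR's implementation. Under this assumption, every key other than `name` is passed to the constructor as a keyword argument, which is how `args1` above would reach `MyBackbone.__init__`.

```python
class MobileNetV3:
    """Stand-in for an existing backbone."""

    def __init__(self, in_channels=3, **kwargs):
        self.in_channels = in_channels


class MyBackbone:
    """Stand-in for the new backbone from step 2."""

    def __init__(self, in_channels=3, args1=None, **kwargs):
        self.in_channels = in_channels
        self.args1 = args1


# Only classes registered here can be referenced by name from the yml file.
SUPPORTED_BACKBONES = {cls.__name__: cls for cls in (MobileNetV3, MyBackbone)}


def build_from_config(config):
    """Instantiate the class named by config["name"], passing the other keys as kwargs."""
    config = dict(config)  # do not mutate the caller's dict
    name = config.pop("name")
    if name not in SUPPORTED_BACKBONES:
        raise ValueError(
            f"backbone {name!r} is not registered; supported: {sorted(SUPPORTED_BACKBONES)}")
    return SUPPORTED_BACKBONES[name](**config)
```

Forgetting the import in step 3 corresponds to the class missing from the registry, which surfaces as the `ValueError` above.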

**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).

# 3. EVALUATION AND TEST
# 3. EVALUATION AND TEST

## 3.1 EVALUATION

PaddleOCR computes three metrics to evaluate OCR text detection: Precision, Recall, and Hmean (F-score).
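
Given counts of matched boxes, the three metrics follow directly. The sketch assumes the matching step (pairing each detected box with a ground-truth box, for example by an IoU threshold such as 0.5 under the ICDAR2015 protocol) has already been done; `detection_metrics` is a hypothetical helper, not PaddleOCR's evaluator.

```python
def detection_metrics(num_matched, num_gt, num_det):
    """Return (precision, recall, hmean) from box-matching counts.

    num_matched: detected boxes matched to a ground-truth box
    num_gt:      valid ground-truth boxes
    num_det:     predicted boxes
    """
    precision = num_matched / num_det if num_det else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Hmean is the harmonic mean of precision and recall
    hmean = 2 * precision * recall / (precision + recall)
    return precision, recall, hmean
```

With 80 matched boxes, 100 ground-truth boxes and 90 detections, precision is 8/9 (about 0.889), recall is 0.8, and Hmean is 16/19 (about 0.842).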

Run the following command to compute the evaluation metrics. The result is saved in the file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

During evaluation, the post-processing parameters are set to `box_thresh=0.6` and `unclip_ratio=1.5`. If you train on a different dataset or with a different model, adjust these two parameters for better results.

The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
```

* Note: `box_thresh` and `unclip_ratio` are parameters of the DB post-processing and do not need to be set when evaluating the EAST and SAST models.
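
As described in the DB paper, the post-processor expands each shrunk text region by an offset distance D = A × unclip_ratio / L, where A is the polygon area and L its perimeter, so a larger `unclip_ratio` gives looser boxes; `box_thresh` discards candidate boxes whose mean probability is below the threshold. The sketch below computes both in plain Python. The function names are hypothetical, and the actual polygon offsetting, which a real implementation delegates to a polygon-clipping library, is omitted.

```python
def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a polygon given as [(x, y), ...]."""
    area2 = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area2 += x1 * y2 - x2 * y1
        perimeter += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return abs(area2) / 2, perimeter


def unclip_distance(points, unclip_ratio=1.5):
    """Offset distance D = A * unclip_ratio / L used to expand a shrunk region."""
    area, perimeter = polygon_area_perimeter(points)
    return area * unclip_ratio / perimeter


def keep_box(mean_score, box_thresh=0.6):
    """A candidate box survives only if its mean probability reaches box_thresh."""
    return mean_score >= box_thresh
```

For a 100 × 20 axis-aligned box, A = 2000 and L = 240, so `unclip_ratio=1.5` gives D = 12.5 pixels and `unclip_ratio=2.0` gives about 16.7 pixels.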

## 3.2 TEST

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"
```
```

When testing the DB model, you can also adjust the post-processing thresholds:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy"  PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0
```
```


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy"
```
```

# 4. INFERENCE
# 4. INFERENCE

## 4.1 INFERENCE MODEL PREDICTION

The inference model (the model saved by `paddle.jit.save`) is generally a frozen model exported after training is complete, and it is mainly used for prediction in deployment.

The model saved during training is the checkpoints model, which stores the model parameters and is mainly used to resume training.

Compared with the checkpoints model, the inference model additionally stores the model structure. Because both the structure and the parameters are frozen into the inference model file, it is easier to deploy and well suited to integration into real systems.

First, convert the trained DB model to an inference model:
```shell
python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model="./output/det_db/best_accuracy" Global.save_inference_dir="./output/det_db_inference/"
```

Then run prediction with the detection inference model:
```shell
python3 tools/infer/predict_det.py --det_algorithm="DB" --det_model_dir="./output/det_db_inference/" --image_dir="./doc/imgs/" --use_gpu=True
```

For other detection algorithms such as EAST, set `--det_algorithm` accordingly (the default is DB) and point `--det_model_dir` to an inference model exported from a model trained with that algorithm:
```shell
python3 tools/infer/predict_det.py --det_algorithm="EAST" --det_model_dir="{path/to/east_inference_model}/" --image_dir="./doc/imgs/" --use_gpu=True
```

# 2. FAQ

Q1: Why are the prediction results of the trained model and the inference model inconsistent?
**A**: Most such problems are caused by pre-processing and post-processing parameters that differ between prediction with the trained model and prediction with the inference model. Taking a model trained with the det_mv3_db.yml configuration file as an example, check the following:
- Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). During evaluation, the input image size affects accuracy. To be consistent with the paper, the training icdar15 configuration file resizes images to [736, 1280], whereas the inference model has only one set of default parameters and, for prediction speed, resizes images so that the longest side is at most 960 by default. The preprocessing functions for both the trained model and the inference model are located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147).
- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
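
To see why the two settings diverge, the sketch below computes the input shape under the inference default described above (longest side capped at 960). Rounding each side to a multiple of 32 is an assumption about the DB pre-processing, and `limit_max_side` is a hypothetical helper, not the function in `operators.py`.

```python
def limit_max_side(h, w, limit_side_len=960):
    """Return (resize_h, resize_w): cap the longest side, then snap each side to a multiple of 32."""
    ratio = 1.0
    if max(h, w) > limit_side_len:
        ratio = limit_side_len / max(h, w)
    resize_h = int(h * ratio)
    resize_w = int(w * ratio)
    # round to the nearest multiple of 32, but never below 32
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w
```

For a 720 × 1280 image this gives 544 × 960, whereas the training icdar15 configuration resizes to [736, 1280], so the two models see different inputs and can produce different boxes.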