# TEXT DETECTION

This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

## DATA PREPARATION
The icdar2015 dataset can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.

Decompress the downloaded dataset into the working directory; the rest of this section assumes it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR consolidates the many scattered annotation files into two annotation files, one for training and one for testing, which can be downloaded with wget:
```shell
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, PaddleOCR/train_data/ contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```
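
Before training, it can help to confirm that the layout matches the listing above. The following is a minimal sketch, not part of PaddleOCR: the function name `missing_entries` is hypothetical, and the root path is an assumption taken from the listing (relative to the PaddleOCR directory), so adjust it to your setup.

```python
import os

# Entries expected under the dataset root, taken from the listing above.
EXPECTED_ENTRIES = [
    "icdar_c4_train_imgs",
    "ch4_test_images",
    "train_icdar2015_label.txt",
    "test_icdar2015_label.txt",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    return [
        name
        for name in EXPECTED_ENTRIES
        if not os.path.exists(os.path.join(root, name))
    ]


if __name__ == "__main__":
    # Assumed location, relative to the PaddleOCR directory; adjust as needed.
    root = "./train_data/icdar2015/text_localization"
    missing = missing_entries(root)
    print("missing:", missing if missing else "nothing")
```

An empty result means all four entries are present.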

The provided annotation files use the following format, with the two fields separated by "\t":
```
" Image file name             Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The image annotation, encoded with **json.dumps()**, is a list of dictionaries, one per text box.

The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
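
If your own annotations store the four corners in arbitrary order, they can be reordered before writing the file. This is a minimal sketch under stated assumptions: `order_clockwise` is a hypothetical helper, image coordinates are assumed (y grows downward), and the upper-left corner is picked with a simple "smallest x + y" heuristic that may not suit strongly rotated boxes.

```python
import math


def order_clockwise(points):
    """Order four (x, y) points clockwise, starting from the upper-left one.

    Image coordinates are assumed: x grows to the right, y grows downward.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # With y pointing down, increasing atan2 angle is clockwise on screen.
    ordered = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Heuristic (assumption): the upper-left corner has the smallest x + y.
    start = min(range(len(ordered)), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]


# The corners of the "MASA" box from the example above, shuffled.
shuffled = [[418, 216], [310, 104], [312, 179], [416, 141]]
print(order_clockwise(shuffled))
```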

`transcription` represents the text of the current text box. **When its content is "###" it means that the text box is invalid and will be skipped during training.**
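
The skipping rule can be illustrated with a small parser. This is a sketch for reading the format, not PaddleOCR's own data loader; `parse_label_line` is a hypothetical name, and the second box in the sample line is made up to show an invalid entry.

```python
import json


def parse_label_line(line):
    """Split one annotation line into (image path, list of valid boxes)."""
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(annotation)
    # Boxes transcribed as "###" are invalid and are dropped here.
    valid = [box for box in boxes if box["transcription"] != "###"]
    return image_path, valid


# First box comes from the example above; the "###" box is illustrative only.
line = (
    "ch4_test_images/img_61.jpg\t"
    '[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, '
    '{"transcription": "###", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]'
)
path, boxes = parse_label_line(line)
print(path, len(boxes))
```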

If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
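
A minimal sketch of writing one line in this format is shown here. The helper `make_label_line`, the image name, and the box are all hypothetical; `ensure_ascii=False` is a choice made in this sketch to keep non-ASCII transcriptions readable in the file.

```python
import json


def make_label_line(image_path, boxes):
    """Build one annotation line: image path, a tab, then the JSON-encoded boxes."""
    for box in boxes:
        if len(box["points"]) != 4:
            raise ValueError("each text box needs exactly four points")
    return image_path + "\t" + json.dumps(boxes, ensure_ascii=False)


# Hypothetical image name and box, for illustration only.
sample_boxes = [
    {"transcription": "HELLO", "points": [[10, 20], [110, 20], [110, 60], [10, 60]]}
]
print(make_label_line("my_images/img_1.jpg", sample_boxes))
```

Writing one such line per image, each terminated by a newline, gives a file in the format described above.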


## TRAINING

First, download the pretrained model. The detection model of PaddleOCR currently supports three backbones: MobileNetV3, ResNet18_vd and ResNet50_vd. You can also replace the backbone with a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) according to your needs.
```shell
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar

# Decompress the pretrained model file, taking MobileNetV3 as an example
# (-C sets the directory to extract into)
tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar -C ./pretrain_models/

# Note: After decompressing the backbone pre-training weight file correctly, the file list in the folder is as follows:
./pretrain_models/MobileNetV3_large_x0_5_pretrained/
  └─ conv_last_bn_mean
  └─ conv_last_bn_offset
  └─ conv_last_bn_scale
  └─ conv_last_bn_variance
  └─ ......

```

#### START TRAINING
*If you installed the CPU version, set the parameter `use_gpu` to `false` in the configuration file.*
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml
```

In the command above, `-c` selects the configuration file used for training, here `configs/det/det_mv3_db.yml`.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
```shell
# single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001

# multi-GPU training
# Use the '--gpus' parameter to set the IDs of the GPUs to train on.
python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001


```
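
To make the `-o` mechanism concrete, the sketch below shows one way dotted `Section.key=value` overrides could be merged into a nested configuration. It is an illustration only, not PaddleOCR's actual implementation: `parse_value`, `apply_overrides`, and the starting values in `config` are all assumptions.

```python
def parse_value(text):
    """Convert a command-line string to int, float, or bool where possible."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text


def apply_overrides(config, overrides):
    """Apply 'Section.key=value' strings to a nested dict, in place."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(value)
    return config


# Hypothetical starting values standing in for the contents of the yml file.
config = {"Optimizer": {"base_lr": 0.001}, "Global": {"use_gpu": True}}
apply_overrides(config, ["Optimizer.base_lr=0.0001", "Global.use_gpu=false"])
print(config)
```

Values passed on the command line replace those read from the file, so the yml file itself stays unchanged.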

#### LOAD TRAINED MODEL AND CONTINUE TRAINING
To load a trained model and continue training, set the parameter `Global.checkpoints` to the path of the model to load.

For example:
```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` takes priority over `Global.pretrain_weights`. When both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the model specified by `Global.pretrain_weights` is loaded instead.
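
The priority rule in this note can be sketched as a small decision function. This is not PaddleOCR's loading code: `choose_weights` is a hypothetical name, the paths are placeholders, and the `exists` check is a stand-in, since a real model path may be a file prefix rather than a file that exists on disk.

```python
import os


def choose_weights(checkpoints=None, pretrain_weights=None, exists=os.path.exists):
    """Pick which weights to load, following the priority described in the note."""
    if checkpoints and exists(checkpoints):
        return "checkpoints", checkpoints
    if pretrain_weights:
        # Reached when no checkpoint is given or its path is wrong.
        return "pretrain_weights", pretrain_weights
    return None, None


# A stand-in existence check so the example runs without real model files.
known_paths = {"./output/placeholder/latest"}
print(choose_weights("./output/placeholder/latest", "./pretrain_models/placeholder",
                     exists=known_paths.__contains__))
print(choose_weights("./wrong/path", "./pretrain_models/placeholder",
                     exists=known_paths.__contains__))
```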


## EVALUATION

PaddleOCR calculates three metrics to evaluate performance on the OCR detection task: Precision, Recall, and Hmean.
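
As a reminder of how the three values relate, Hmean is the harmonic mean of Precision and Recall. The sketch below assumes the matching between predicted and ground-truth boxes has already been done and only counts are available; the function name and the counts are illustrative, not taken from PaddleOCR.

```python
def detection_metrics(num_matched, num_predicted, num_ground_truth):
    """Return (precision, recall, hmean) from box counts.

    num_matched: predicted boxes matched to a ground-truth box
    num_predicted: all predicted boxes
    num_ground_truth: all valid ground-truth boxes
    """
    precision = num_matched / num_predicted if num_predicted else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


# Illustrative counts only.
print(detection_metrics(num_matched=80, num_predicted=100, num_ground_truth=120))
```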

Run the following command to calculate the evaluation metrics. The result is saved in the test result file specified by `save_res_path` in the configuration file `det_mv3_db.yml`.

The evaluation uses the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. If you train with a different dataset or a different model, adjust these two parameters for better results.

By default, the model parameters saved during training are stored in the `Global.save_model_dir` directory. For evaluation, set `Global.pretrained_model` to point to the saved parameter file.
```shell
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.pretrained_model="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing and do not need to be set when evaluating the EAST model.
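
For intuition about the two DB parameters, here is a rough sketch of the roles they are commonly described as playing: `box_thresh` discards low-scoring boxes, and `unclip_ratio` scales how far a detected (shrunk) box is expanded, using an offset of area × ratio / perimeter. This is an assumption-laden illustration with hypothetical helper names, not PaddleOCR's post-processing code, and it computes only the offset, not the expanded polygon.

```python
import math


def polygon_area(points):
    """Shoelace formula for a simple polygon given as a list of (x, y) pairs."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to its first point."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def unclip_distance(points, unclip_ratio=1.5):
    """Offset used to expand a shrunk box: area * ratio / perimeter."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_boxes(scored_boxes, box_thresh=0.6):
    """Keep (points, score) pairs whose score is at least box_thresh."""
    return [(pts, score) for pts, score in scored_boxes if score >= box_thresh]


# Illustrative 100 x 20 box and scores.
box = [[0, 0], [100, 0], [100, 20], [0, 20]]
print(unclip_distance(box, unclip_ratio=1.5))
print(len(keep_boxes([(box, 0.9), (box, 0.4)], box_thresh=0.6)))
```

A larger `unclip_ratio` yields looser boxes, while a higher `box_thresh` trades recall for precision.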

## TEST

Test the detection result on a single image:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" Global.load_static_weights=false
```

When testing the DB model, adjust the post-processing threshold:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/det_db/best_accuracy" Global.load_static_weights=false PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


Test the detection result on all images in the folder:
```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/det_db/best_accuracy" Global.load_static_weights=false
```
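
Since `Global.infer_img` is given either a single image or a folder in the commands above, the sketch below shows one way such a value could be expanded into a list of image paths. It is only an illustration: `list_images` is a hypothetical helper and the set of extensions is an assumption, not the list PaddleOCR accepts.

```python
import os

# Assumed set of image extensions, for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(path):
    """Return [path] for a single file, or the sorted image files in a folder."""
    if os.path.isfile(path):
        return [path]
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )
```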