# Text detection

This section uses the ICDAR2015 dataset as an example to show how to train, evaluate, and test the detection model in PaddleOCR.

## Data preparation
The ICDAR2015 dataset can be downloaded from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required before downloading.

Decompress the downloaded dataset into your working directory; the steps below assume it is decompressed under PaddleOCR/train_data/. PaddleOCR also consolidates the scattered annotation files into one label file for training and one for testing, which you can download with wget:
```
# Under the PaddleOCR path
cd PaddleOCR/
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
wget -P ./train_data/  https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
```

After decompressing the dataset and downloading the annotation files, the dataset directory contains two folders and two files:
```
/PaddleOCR/train_data/icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of icdar dataset
  └─ ch4_test_images/             Testing data of icdar dataset
  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
```
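
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of PaddleOCR: the helper name `missing_entries` is hypothetical, and the folder and file names are copied from the listing above.

```python
from pathlib import Path

# Folder and file names copied from the directory listing in this document.
EXPECTED_DIRS = ("icdar_c4_train_imgs", "ch4_test_images")
EXPECTED_FILES = ("train_icdar2015_label.txt", "test_icdar2015_label.txt")


def missing_entries(data_root):
    """Return the expected folders and files that are absent under data_root.

    Hypothetical helper for illustration only; PaddleOCR does not ship it.
    """
    root = Path(data_root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

Calling `missing_entries("./train_data/icdar2015/text_localization")` from the PaddleOCR directory should return an empty list when everything is where the listing says it is; adjust the path if your data lives elsewhere.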

The provided label files use the following format:
```
" Image file name                    Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
Before json.dumps encoding, the image annotation information is a list of dictionaries, one per text box. The `points` entry in each dictionary gives the (x, y) coordinates of the four corners of the text box, listed clockwise starting from the upper-left corner.

`transcription` holds the text of the current text box; this information is not needed for the text detection task.
If you want to train PaddleOCR on other datasets, you can build the annotation file in the same format.
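
As a rough illustration of that format, the sketch below builds and parses one label line and checks the clockwise ordering of `points`. It is not PaddleOCR code: the helper names are hypothetical, the tab separator between the file name and the JSON payload is an assumption (check your label files), and the sample dictionary contains only the two keys shown above, leaving out the fields elided by `...`.

```python
import json


def build_label_line(image_path, boxes, sep="\t"):
    """Join an image path and its JSON-encoded annotation list into one line.

    The tab separator is an assumption, not confirmed by this document.
    """
    return image_path + sep + json.dumps(boxes)


def parse_label_line(line, sep="\t"):
    """Split one label line into (image_path, list of annotation dicts)."""
    image_path, encoded = line.rstrip("\n").split(sep, 1)
    return image_path, json.loads(encoded)


def is_clockwise(points):
    """True if the polygon is clockwise in image coordinates (y grows downward).

    Uses the shoelace sum; with a downward y axis, a positive sum means the
    points run clockwise as seen on screen.
    """
    twice_area = 0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        twice_area += x0 * y1 - x1 * y0
    return twice_area > 0


# Sample annotation using the values from the example line above.
boxes = [{"transcription": "MASA",
          "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]
line = build_label_line("ch4_test_images/img_61.jpg", boxes)
path, parsed = parse_label_line(line)
```

Running `is_clockwise` over every box in a custom annotation file is a cheap way to catch corner lists that were written in the wrong order.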


## Quickstart training

First, download a pretrained backbone. The PaddleOCR detection model currently supports two backbones, MobileNetV3 and ResNet50_vd. You can also swap in another backbone from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to suit your needs.
```
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
# Download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
```

**Start training**
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```

In the command above, -c selects the configuration file used for training, here configs/det/det_mv3_db.yml.
For a detailed explanation of the configuration file, please refer to [link](./doc/config.md).

You can also use the -o parameter to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
```
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
```
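
Conceptually, an override such as `Optimizer.base_lr=0.0001` replaces one value in the nested configuration loaded from the yml file. The sketch below illustrates that idea with a plain dictionary. It is an assumption about the general mechanism, not PaddleOCR's actual implementation: `apply_override` is a hypothetical helper, and the starting value of `base_lr` is a placeholder.

```python
import ast


def apply_override(config, assignment):
    """Apply one 'Section.key=value' override to a nested dict in place.

    Hypothetical helper for illustration; not PaddleOCR's own parser.
    """
    dotted_key, raw_value = assignment.split("=", 1)
    try:
        # Turn numbers, booleans, and quoted strings into Python values.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value  # keep plain strings such as file paths
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Placeholder config; the real values come from the yml file.
config = {"Optimizer": {"base_lr": 0.001}}
apply_override(config, "Optimizer.base_lr=0.0001")
```

Several overrides can be passed after a single -o, as the evaluation commands in this document do; in this sketch that corresponds to calling `apply_override` once per `key=value` pair.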

## Metric evaluation

PaddleOCR calculates three metrics for OCR detection: Precision, Recall, and Hmean (the harmonic mean of Precision and Recall).
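
The arithmetic behind these three numbers can be sketched as follows. This is only the final calculation, under the assumption that the counts of matched, predicted, and ground-truth boxes are already known; how PaddleOCR decides that a predicted box matches a ground-truth box is not covered here, and `detection_metrics` is a hypothetical name.

```python
def detection_metrics(num_matched, num_predicted, num_ground_truth):
    """Return (precision, recall, hmean) from box counts.

    num_matched:      predicted boxes that match a ground-truth box
    num_predicted:    all predicted boxes
    num_ground_truth: all ground-truth boxes
    """
    precision = num_matched / num_predicted if num_predicted else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall; 0 when both are 0.
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean
```

Because Hmean is a harmonic mean, it stays low whenever either Precision or Recall is low, so it is a reasonable single number to track when tuning thresholds.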

Run the following command to compute the evaluation metrics from the test result file specified by save_res_path in the configuration file det_mv3_db.yml.

When evaluating, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. If you train on a different dataset or with a different model, these two parameters can be tuned for better results.

```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```
During training, model parameters are saved in the Global.save_model_dir directory by default. When evaluating, set Global.checkpoints to point to the saved parameter file.

For example:
```
python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```

* Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.
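
To give a feel for what the two parameters control, here is a simplified sketch based on a common reading of DB post-processing, not on anything stated in this document: boxes whose mean score falls below box_thresh are dropped, and each remaining polygon is expanded outward by a distance of area × unclip_ratio / perimeter. The function names are hypothetical, and the real post-processing involves more steps than shown.

```python
import math


def unclip_distance(points, unclip_ratio=1.5):
    """Offset distance for expanding a polygon: area * ratio / perimeter.

    Assumed formula for DB-style unclipping; illustration only.
    """
    edges = list(zip(points, points[1:] + points[:1]))
    # Shoelace formula for the polygon area.
    area = abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in edges)) / 2
    perimeter = sum(math.dist(p, q) for p, q in edges)
    return area * unclip_ratio / perimeter


def keep_box(mean_score, box_thresh=0.6):
    """Keep a candidate box only if its mean score reaches the threshold."""
    return mean_score >= box_thresh
```

Under these assumptions, raising unclip_ratio yields larger output boxes, and raising box_thresh discards more low-confidence boxes, which is why both are worth tuning per dataset.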

## Test detection result

Test the detection result on a single image:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
```

When testing the DB model, you can adjust the post-processing thresholds:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
```


Test detection on all images in a folder:
```
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
```