## Text recognition

### Data preparation

PaddleOCR supports two data formats: `lmdb`, used to train on public datasets and debug algorithms, and `General Data`, used to train on your own data.

Please organize the dataset as follows:

The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory:

```
ln -sf <path/to/dataset> <path/to/paddle_detection>/train_data/dataset
```


* Data download

If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) dataset from the official website. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format dataset required by the benchmark.

* Use your own dataset:

If you want to use your own data for training, please refer to the following to organize your data.

- Training set

First put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to record the image paths and labels.

* Note: by default, the image path and image label are separated with \t. Using any other separator will cause a training error.

```
" Image file name           Image annotation "

train_data/train_0001.jpg   简单可依赖
train_data/train_0002.jpg   用科技让复杂的世界更简单
```
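If you generate the label file with a script, make sure the separator is a literal tab character rather than spaces. A minimal sketch (the file name `rec_gt_train.txt` follows the convention above; the sample paths and labels are illustrative):

```python
# Write a recognition label file with tab-separated "path<TAB>label" pairs.
samples = [
    ("train_data/train_0001.jpg", "简单可依赖"),
    ("train_data/train_0002.jpg", "用科技让复杂的世界更简单"),
]

with open("rec_gt_train.txt", "w", encoding="utf-8") as f:
    for path, label in samples:
        # \t is the literal tab separator PaddleOCR expects by default
        f.write(f"{path}\t{label}\n")
```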
PaddleOCR provides a label file for training on the icdar2015 dataset, which can be downloaded as follows:

```
# Training set label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test set label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
```

The final training set should have the following file structure:

```
|-train_data
    |-ic15_data
        |- rec_gt_train.txt
        |- train
            |- word_001.png
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Similar to the training set, the test set also needs a folder containing all the images (test) and a rec_gt_test.txt file. The structure of the test set is as follows:

```
|-train_data
    |-ic15_data
        |- rec_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Dictionary

Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, every character that appears can be mapped to a dictionary index.

Therefore, the dictionary needs to contain all the characters that you want to be recognized correctly. {word_dict_name}.txt needs to be written in the following format and saved with `utf-8` encoding:

```
l
d
a
d
r
n
```

word_dict.txt contains a single character on each line, mapping characters to numeric indexes; for example, the word "and" will be mapped to [2 5 1].
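The mapping can be sketched as follows: the zero-based line number is the index of the character on that line, with the first occurrence winning for duplicates. This is an illustrative sketch of the indexing scheme, not PaddleOCR's actual loader code:

```python
# Build a character-to-index map from the lines of a dictionary file.
# With the example dictionary above (l, d, a, d, r, n), "and" maps to [2, 5, 1].
dict_lines = ["l", "d", "a", "d", "r", "n"]

char_to_idx = {}
for idx, ch in enumerate(dict_lines):
    char_to_idx.setdefault(ch, idx)  # keep the first occurrence of a duplicate

encoded = [char_to_idx[c] for c in "and"]
print(encoded)  # [2, 5, 1]
```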

`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters.

You can use them as needed.

To customize the dictionary file, modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.

### Start training

PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example.

First download the pretrained model. You can download the trained model and fine-tune it on the icdar2015 data:

```
cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar
# Decompress model parameters
cd pretrain_models
tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar
```

Start training:

```
# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training supports single-card and multi-card training; specify the card numbers through CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Train on the icdar15 English data
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml
```

PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, evaluation runs every 500 iterations, and the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` during evaluation.

If the validation set is large, evaluation will be time-consuming. It is recommended to reduce the evaluation frequency, or to evaluate only after training.
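For example, to evaluate less often, you can raise the interval in the config file (a sketch; `2000` is an arbitrary illustrative value, and the surrounding keys depend on your config version):

```
Global:
  ...
  eval_batch_step: 2000
  ...
```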

* Tip: You can use the `-c` parameter to select any of the model configurations under the `configs/rec/` path for training. The recognition algorithms supported by PaddleOCR are:

| Configuration file |  Algorithm name |   backbone |   trans   |   seq      |     pred     |
| :--------: |  :-------:   | :-------:  |   :-------:   |   :-----:   |  :-----:   |
| rec_chinese_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  |
| rec_icdar15_train.yml |  CRNN |   Mobilenet_v3 large 0.5 |  None   |  BiLSTM |  ctc  |
| rec_mv3_none_bilstm_ctc.yml |  CRNN |   Mobilenet_v3 large 0.5 |  None   |  BiLSTM |  ctc  |
| rec_mv3_none_none_ctc.yml |  Rosetta |   Mobilenet_v3 large 0.5 |  None   |  None |  ctc  |
| rec_mv3_tps_bilstm_ctc.yml |  STARNet |   Mobilenet_v3 large 0.5 |  tps   |  BiLSTM |  ctc  |
| rec_mv3_tps_bilstm_attn.yml |  RARE |   Mobilenet_v3 large 0.5 |  tps   |  BiLSTM |  attention  |
| rec_r34_vd_none_bilstm_ctc.yml |  CRNN |   Resnet34_vd |  None   |  BiLSTM |  ctc  |
| rec_r34_vd_none_none_ctc.yml |  Rosetta |   Resnet34_vd |  None   |  None |  ctc  |
| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention |
| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc |

For training Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try other algorithms on the Chinese dataset, please refer to the following instructions to modify the configuration file.

Take `rec_mv3_none_none_ctc.yml` as an example:
```
Global:
  ...
  # Modify image_shape to fit long text
  image_shape: [3, 32, 320]
  ...
  # Modify character type
  character_type: ch
  # Add a custom dictionary; if you modify the dictionary, please point the path to the new dictionary
  character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt
  ...
  # Modify reader type
  reader_yml: ./configs/rec/rec_chinese_reader.yml
  ...

...
```
**Note that the configuration file for prediction/evaluation must be consistent with the one used for training.**

### Evaluation

The evaluation dataset can be changed by modifying `label_file_path` under EvalReader in `configs/rec/rec_icdar15_reader.yml`.

```
export CUDA_VISIBLE_DEVICES=0
# GPU evaluation; Global.checkpoints is the weight file to be evaluated
python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```

### Prediction

* Training engine prediction

A model trained with PaddleOCR can be used for quick prediction with the following script.

The default prediction image is stored in `infer_img`, and the weights are specified via `-o Global.checkpoints`:

```
# Predict English results
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg
```

Input image:

![](./imgs_words/en/word_1.png)

Get the prediction result of the input image:

```
infer_img: doc/imgs_words/en/word_1.png
     index: [19 24 18 23 29]
     word : joint
```

The configuration file used for prediction must be consistent with the one used for training. For example, if you completed training of the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to predict with the Chinese model:

```
# Predict Chinese results
python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg
```

K
Khanh Tran 已提交
212
Input image:
T
tink2123 已提交
213

T
tink2123 已提交
214
![](./imgs_words/ch/word_1.jpg)
X
xiaoting 已提交
215

K
Khanh Tran 已提交
216
Get the prediction result of the input image:
T
tink2123 已提交
217 218

```
T
tink2123 已提交
219
infer_img: doc/imgs_words/ch/word_1.jpg
T
tink2123 已提交
220 221
     index: [2092  177  312 2503]
     word : 韩国小馆
T
tink2123 已提交
222
```