# RARE

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
    - [3.1 Training](#3-1)
    - [3.2 Evaluation](#3-2)
    - [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
    - [4.1 Python Inference](#4-1)
    - [4.2 C++ Inference](#4-2)
    - [4.3 Serving](#4-3)
    - [4.4 More](#4-4)
- [5. FAQ](#5)

<a name="1"></a>
## 1. Introduction

Paper information:
> [Robust Scene Text Recognition with Automatic Rectification](https://arxiv.org/abs/1603.03915v2)
> Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai
> CVPR, 2016

The model was trained on the MJSynth and SynthText text recognition datasets and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets. The reproduced results are as follows:

|Model|Backbone|Config|Avg Accuracy|Download Link|
| --- | --- | --- | --- | --- |
|RARE|Resnet34_vd|[configs/rec/rec_r34_vd_tps_bilstm_att.yml](../../configs/rec/rec_r34_vd_tps_bilstm_att.yml)|83.60%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
|RARE|MobileNetV3|[configs/rec/rec_mv3_tps_bilstm_att.yml](../../configs/rec/rec_mv3_tps_bilstm_att.yml)|82.50%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_tps_bilstm_att_v2.0_train.tar)|
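
The table reports a single average accuracy per model. As a minimal sketch, assuming word accuracy means an exact (case-insensitive) match between prediction and label, and that "Avg Accuracy" is the unweighted mean over the seven benchmarks (neither convention is stated above), the computation could look like this. `word_accuracy`, `average_accuracy`, and the toy data are hypothetical; this is not PaddleOCR's own metric code and it does not reproduce the numbers in the table.

```python
from statistics import mean


def word_accuracy(predictions, labels):
    """Fraction of samples whose prediction exactly matches the label.

    Assumption: the comparison is case-insensitive, a common convention for
    English scene-text benchmarks.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("empty benchmark")
    correct = sum(p.lower() == t.lower() for p, t in zip(predictions, labels))
    return correct / len(labels)


def average_accuracy(per_benchmark):
    """Unweighted mean of per-benchmark accuracies (assumed convention)."""
    return mean(word_accuracy(p, t) for p, t in per_benchmark.values())


# Toy data for illustration only; these are NOT the benchmark results above.
results = {
    "IIIT": (["joint", "hello"], ["joint", "help"]),
    "SVT": (["Street", "cafe"], ["street", "cafe"]),
}
print(f"{average_accuracy(results):.2%}")
```

An unweighted mean gives every benchmark the same influence regardless of its size; a sample-weighted mean would instead favor the larger test sets.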


<a name="2"></a>
## 2. Environment
Please refer to [Operating Environment Preparation](./environment_en.md) to configure the PaddleOCR operating environment, and refer to [Project Clone](./clone_en.md) to clone the project code.

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction

Please refer to the [Text Recognition Training Tutorial](./recognition_en.md). PaddleOCR's code is modular, so training a different recognition model only requires **changing the configuration file**. The examples below use the Resnet34_vd backbone:

<a name="3-1"></a>
### 3.1 Training

```shell
# Single-GPU training (long training time, not recommended)
python3 tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml
# Multi-GPU training: specify the GPU IDs with the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml
```
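
With `paddle.distributed.launch`, each GPU listed in `--gpus` runs its own data-parallel training process, so the number of samples consumed per optimizer step grows with the GPU count. A minimal sketch, assuming the configuration specifies the batch size per card (PaddleOCR configs typically use a `Train.loader.batch_size_per_card` key; the value 256 below is hypothetical, so check your YAML):

```python
def parse_gpu_list(gpus):
    """Turn a --gpus value such as '0,1,2,3' into a list of device IDs."""
    ids = [int(part) for part in gpus.split(",") if part.strip()]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU id in {gpus!r}")
    return ids


def global_batch_size(gpus, batch_size_per_card):
    """Samples processed per training step across all data-parallel workers."""
    return len(parse_gpu_list(gpus)) * batch_size_per_card


# Hypothetical per-card batch size; read the real one from your YAML config.
print(global_batch_size("0,1,2,3", 256))  # 4 GPUs x 256 = 1024
```

One consequence: moving from one to four GPUs at a fixed per-card batch size also quadruples the global batch, which may call for revisiting the learning rate.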

<a name="3-2"></a>
### 3.2 Evaluation

```shell
# GPU evaluation; Global.pretrained_model is the path of the weights to evaluate
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```
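
The `-o` option in the evaluation command above overrides entries of the YAML configuration from the command line, using dotted paths such as `Global.pretrained_model`. The sketch below illustrates the idea of merging such `KEY=VALUE` pairs into a nested configuration. `apply_overrides` is a hypothetical helper, not PaddleOCR's own parser, and its value typing (Python literals via `ast.literal_eval`, otherwise plain strings) is an assumption.

```python
import ast


def parse_value(raw):
    """Interpret '10' as int, 'False' as bool, '[3, 32, 100]' as list; else keep the string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config, overrides):
    """Apply 'A.B.C=value' strings to a nested dict in place and return it."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
            if not isinstance(node, dict):
                raise TypeError(f"{name!r} in {key!r} is not a mapping")
        node[leaf] = parse_value(raw)
    return config


# Toy config; the weights path is a placeholder.
config = {"Global": {"pretrained_model": None, "use_gpu": True}}
apply_overrides(config, [
    "Global.pretrained_model=path/to/weights/best_accuracy",
    "Global.use_gpu=False",
])
print(config["Global"])
```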

<a name="3-3"></a>
### 3.3 Prediction

```shell
python3 tools/infer_rec.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during RARE training into an inference model. The example below uses the Resnet34_vd model trained on the MJSynth and SynthText datasets ([model download](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)); after extracting the archive, convert it with:

```shell
python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml -o Global.pretrained_model=./rec_r34_vd_tps_bilstm_att_v2.0_train/best_accuracy Global.save_inference_dir=./inference/rec_rare
```
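
`export_model.py` writes a static-graph model into `Global.save_inference_dir`, and `predict_rec.py` later loads it from `--rec_model_dir`. A small sanity check before running inference could look like this, assuming the Paddle 2.x file names `inference.pdmodel` and `inference.pdiparams` (newer Paddle releases may write a different graph file, for example `inference.json`). `missing_inference_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed Paddle 2.x export layout; adjust for your Paddle version.
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_inference_files(model_dir):
    """Return the expected export files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inference_files("./inference/rec_rare")
    if missing:
        print("Export incomplete, missing:", ", ".join(missing))
    else:
        print("Inference model found.")
```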

To run inference with the RARE text recognition model, execute the following command:

```shell
python3 tools/infer/predict_rec.py --image_dir="doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_rare/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
```
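
`--rec_image_shape="3, 32, 100"` gives the channels, height, and width of the network input. A common way to fit a word crop into that shape, sketched below, is to scale it to the target height while keeping its aspect ratio, cap the width at the target width, and zero-pad the rest on the right. This follows the general approach of PaddleOCR's recognition preprocessing, but the helper names are hypothetical, whether RARE inference uses exactly this path is an assumption, and so is the normalization of pixels to [-1, 1].

```python
import math


def parse_image_shape(spec):
    """Parse a string like '3, 32, 100' into (channels, height, width)."""
    channels, height, width = (int(part) for part in spec.split(","))
    return channels, height, width


def resized_width(img_h, img_w, target_h, target_w):
    """Width after scaling to target_h with aspect ratio kept, capped at target_w."""
    ratio = img_w / float(img_h)
    return min(target_w, math.ceil(target_h * ratio))


def normalize_pixel(value):
    """Map an 8-bit pixel value from [0, 255] to [-1, 1] (assumed normalization)."""
    return (value / 255.0 - 0.5) / 0.5


channels, height, width = parse_image_shape("3, 32, 100")
# A 48x160 crop would be 107 px wide at height 32, so it is squeezed to 100.
print(resized_width(48, 160, height, width))  # 100
# A 32x64 crop keeps width 64; the remaining 36 columns would be zero-padded.
print(resized_width(32, 64, height, width))  # 64
```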
The inference results are as follows:

![](../../doc/imgs_words/en/word_1.png)

```
Predicts of doc/imgs_words/en/word_1.png:('joint ', 0.9999969601631165)
```
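
The result pairs the recognized string with a confidence score. As a minimal sketch of how an attention decoder's per-step outputs could be turned into such a pair, assume greedy decoding that stops at an end-of-sequence token, with confidence equal to the mean of the chosen per-step probabilities. Both are assumptions here, as are the toy alphabet and the hypothetical helper `greedy_decode`; a real decoder maps indices through the dictionary passed via `--rec_char_dict_path`.

```python
from statistics import mean

EOS = "<eos>"  # hypothetical end-of-sequence symbol
ALPHABET = [EOS, "i", "j", "n", "o", "t"]  # toy character set, index -> symbol


def greedy_decode(step_probs, alphabet=ALPHABET):
    """Pick the most likely symbol at each step until EOS; return (text, confidence)."""
    chars, confs = [], []
    for probs in step_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        if alphabet[best] == EOS:
            break
        chars.append(alphabet[best])
        confs.append(probs[best])
    return "".join(chars), (mean(confs) if confs else 0.0)


# Toy decoder output: one probability row per time step (rows sum to 1).
steps = [
    [0.0, 0.0, 0.9, 0.0, 0.1, 0.0],  # 'j'
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # 'o'
    [0.0, 0.8, 0.0, 0.2, 0.0, 0.0],  # 'i'
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # 'n'
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # 't'
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # EOS: decoding stops here
]
print(greedy_decode(steps))
```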

<a name="4-2"></a>
### 4.2 C++ Inference

Not currently supported.

<a name="4-3"></a>
### 4.3 Serving

Not currently supported.

<a name="4-4"></a>
### 4.4 More

The RARE model also supports the following inference deployment methods:

- Paddle2ONNX Inference: After preparing the inference model, refer to the [paddle2onnx](../../deploy/paddle2onnx/) tutorial.

<a name="5"></a>
## 5. FAQ

## Citation

```bibtex
@inproceedings{2016Robust,
  title={Robust Scene Text Recognition with Automatic Rectification},
  author={Shi, Baoguang and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
  booktitle={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
}
```