# Python Inference for PP-OCR Model Zoo

This article describes how to use the Python inference engine with the PP-OCR model library. It covers, in order, text detection, text recognition, the direction classifier, and running all three in series, on both CPU and GPU.


- [Python Inference for PP-OCR Model Zoo](#python-inference-for-pp-ocr-model-zoo)
  - [Text Detection Model Inference](#text-detection-model-inference)
  - [Text Recognition Model Inference](#text-recognition-model-inference)
    - [1. Lightweight Chinese Recognition Model Inference](#1-lightweight-chinese-recognition-model-inference)
    - [2. Multilingual Model Inference](#2-multilingual-model-inference)
  - [Angle Classification Model Inference](#angle-classification-model-inference)
  - [Text Detection Angle Classification and Recognition Inference Concatenation](#text-detection-angle-classification-and-recognition-inference-concatenation)

<a name="DETECTION_MODEL_INFERENCE"></a>

## Text Detection Model Inference

The default configuration is based on the inference settings of the DB text detection model. To run inference with the lightweight Chinese detection model, execute the following commands:

```
# download DB text detection inference model
wget  https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar
tar xf ch_PP-OCRv3_det_infer.tar
# run inference
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/"
```

Visualized text detection results are saved to the `./inference_results` folder by default, and result file names are prefixed with `det_res`. An example result is shown below:

![](../imgs_results/det_res_00018069.jpg)

You can use the parameters `det_limit_type` and `det_limit_side_len` to limit the size of the input image.
The valid values of `det_limit_type` are [`max`, `min`], and
`det_limit_side_len` is a positive integer, generally set to a multiple of 32, such as 960.

The default setting is `det_limit_type='max', det_limit_side_len=960`, which means that the longest side of the network input image cannot exceed 960.
If this value is exceeded, the image is resized with its aspect ratio preserved so that the longest side equals `det_limit_side_len`.
With `det_limit_type='min', det_limit_side_len=960`, the limit of 960 applies to the shortest side of the image instead.
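
The resizing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not PaddleOCR's actual preprocessing code: the function name `limited_shape` is hypothetical, the `min` branch assumes that an image is enlarged only when its shortest side is below the limit, and the real pipeline may apply further adjustments to the final size.

```python
def limited_shape(height, width, limit_side_len=960, limit_type="max"):
    """Return (height, width) after applying the side-length limit.

    Hypothetical helper that mirrors the rule described in the text.
    """
    if limit_type == "max":
        # Shrink only when the longest side exceeds the limit.
        longest = max(height, width)
        ratio = limit_side_len / longest if longest > limit_side_len else 1.0
    elif limit_type == "min":
        # Assumption: enlarge only when the shortest side is below the limit.
        shortest = min(height, width)
        ratio = limit_side_len / shortest if shortest < limit_side_len else 1.0
    else:
        raise ValueError("limit_type must be 'max' or 'min'")
    # Both sides use the same ratio, so the aspect ratio is preserved.
    return round(height * ratio), round(width * ratio)


print(limited_shape(1080, 1920))                   # (540, 960)
print(limited_shape(480, 640))                     # (480, 640), already within the limit
print(limited_shape(480, 640, limit_type="min"))   # (960, 1280)
```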

If the input image has a relatively high resolution and you want to run prediction at a larger resolution, set `det_limit_side_len` to the desired value, such as 1216:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/" --det_limit_type=max --det_limit_side_len=1216
```

If you want to use the CPU for prediction, execute the following command:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/"  --use_gpu=False
```

<a name="RECOGNITION_MODEL_INFERENCE"></a>

## Text Recognition Model Inference


<a name="LIGHTWEIGHT_RECOGNITION"></a>
### 1. Lightweight Chinese Recognition Model Inference

**Note**: The input shape used by the recognition model of `PP-OCRv3` is `3, 48, 320`. If you use other recognition models, you need to set the parameter `--rec_image_shape` according to the model. In addition, the `rec_algorithm` used by the recognition model of `PP-OCRv3` is `SVTR_LCNet` by default. Note the difference from the original `SVTR`.


For lightweight Chinese recognition model inference, you can execute the following commands:

```
# download the PP-OCRv3 text recognition inference model
wget  https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar
tar xf ch_PP-OCRv3_rec_infer.tar
# run inference
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_10.png" --rec_model_dir="./ch_PP-OCRv3_rec_infer/" --rec_image_shape=3,48,320
```

![](../imgs_words_en/word_10.png)

After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen.

```bash
Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.988671)
```
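
If you capture this output in a script, a printed line can be split into the image path, the predicted value, and the score. The following is a minimal sketch that assumes the line format is exactly the one shown above; the function name `parse_prediction` is hypothetical and is not part of PaddleOCR.

```python
import ast
import re

# Matches "Predicts of <path>:(<text>, <score>)"; a bracketed list is also accepted.
_PREDICT_PATTERN = re.compile(
    r"Predicts of (?P<path>.+?):(?P<result>[\[(].*[\])])\s*$"
)


def parse_prediction(line):
    """Return (path, value, score) for a result line, or None if it does not match."""
    match = _PREDICT_PATTERN.search(line)
    if match is None:
        return None
    # The result part is a Python literal, so it can be evaluated safely.
    value, score = ast.literal_eval(match.group("result"))
    return match.group("path"), value, float(score)


line = "Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.988671)"
print(parse_prediction(line))
# ('./doc/imgs_words_en/word_10.png', 'PAIN', 0.988671)
```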

<a name="MULTILINGUAL_MODEL_INFERENCE"></a>

### 2. Multilingual Model Inference
If you need to predict with [other language models](./models_list_en.md#Multilingual), specify the dictionary path with `--rec_char_dict_path` when running inference. To get correct visualization results, also specify the font path with `--vis_font_path`. Fonts for several languages are provided by default under the `doc/fonts` path. For example, for Korean recognition:

```
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar

python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/korean/1.jpg" --rec_model_dir="./your inference model" --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" --vis_font_path="doc/fonts/korean.ttf"
```
![](../imgs_words/korean/1.jpg)

After executing the command, the prediction result of the above figure is:

``` text
Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904)
```

<a name="ANGLE_CLASS_MODEL_INFERENCE"></a>

## Angle Classification Model Inference

For angle classification model inference, you can execute the following commands:


```
# download text angle class inference model:
wget  https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
tar xf ch_ppocr_mobile_v2.0_cls_infer.tar
python3 tools/infer/predict_cls.py --image_dir="./doc/imgs_words_en/word_10.png" --cls_model_dir="ch_ppocr_mobile_v2.0_cls_infer"
```
![](../imgs_words_en/word_10.png)

After executing the command, the prediction results (classification angle and score) of the above image will be printed on the screen.

```
 Predicts of ./doc/imgs_words_en/word_10.png:['0', 0.9999995]
```

<a name="CONCATENATION"></a>
## Text Detection Angle Classification and Recognition Inference Concatenation

**Note**: The input shape used by the recognition model of `PP-OCRv3` is `3, 48, 320`. If you use other recognition models, you need to set the parameter `--rec_image_shape` according to the model. In addition, the `rec_algorithm` used by the recognition model of `PP-OCRv3` is `SVTR_LCNet` by default. Note the difference from the original `SVTR`.

When performing prediction, specify the path of a single image or a folder of images with the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, `cls_model_dir` specifies the path to the angle classification inference model, and `rec_model_dir` specifies the path to the recognition inference model. The parameter `use_angle_cls` controls whether the angle classification model is enabled. The parameter `use_mp` specifies whether to use multi-process inference, and `total_process_num` specifies the number of processes when multi-process is used. The visualized recognition results are saved to the `./inference_results` folder by default.

```shell
# use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/" --cls_model_dir="./cls/" --rec_model_dir="./ch_PP-OCRv3_rec_infer/" --use_angle_cls=true

# do not use direction classifier
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/" --rec_model_dir="./ch_PP-OCRv3_rec_infer/" --use_angle_cls=false
# use multi-process
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/" --rec_model_dir="./ch_PP-OCRv3_rec_infer/" --use_angle_cls=false --use_mp=True --total_process_num=6
```
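
If you launch these commands from a script, the three variants above can be generated from one helper. The following is a minimal sketch, assuming it is run from the PaddleOCR repository root; the function name `build_system_command` is hypothetical, and it only uses the flags that appear in the commands above.

```python
def build_system_command(image_dir, det_model_dir, rec_model_dir,
                         cls_model_dir=None, use_mp=False, total_process_num=1):
    """Build the argument list for tools/infer/predict_system.py (hypothetical helper)."""
    cmd = [
        "python3", "tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
    ]
    if cls_model_dir is not None:
        # Enable the direction classifier only when a model path is given.
        cmd += [f"--cls_model_dir={cls_model_dir}", "--use_angle_cls=true"]
    else:
        cmd.append("--use_angle_cls=false")
    if use_mp:
        # Multi-process inference with the requested number of processes.
        cmd += ["--use_mp=True", f"--total_process_num={total_process_num}"]
    return cmd


cmd = build_system_command(
    "./doc/imgs/00018069.jpg",
    "./ch_PP-OCRv3_det_infer/",
    "./ch_PP-OCRv3_rec_infer/",
    cls_model_dir="./cls/",
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the command. Passing arguments as a list avoids shell quoting issues when paths contain spaces.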


After executing the command, the recognition result image is as follows:

![](../imgs_results/system_res_00018069_v3.jpg)