...
@@ -208,6 +208,8 @@ Execute the built executable file:
./build/ppocr [--param1][--param2][...]
```
**Note**: ppocr uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3, 48, 320`, so if you use the recognition function you need to add the parameter `--rec_img_h=48`. If you do not use the default `PP-OCRv3` model, you do not need to set this parameter.
Specifically,
##### 1. det+cls+rec:
...
@@ -220,6 +222,7 @@ Specifically,
--det=true\
--rec=true\
--cls=true\
--rec_img_h=48\
```
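For reference, a complete det+cls+rec invocation might look like the following sketch; the model directories and the image path are placeholders for your own exported inference models and test image, and the model-directory flags are assumed to follow the usual PaddleOCR C++ demo naming.
```bash
# Illustrative sketch only: replace the model directories and image path with your own.
./build/ppocr --det_model_dir=./inference/ch_PP-OCRv3_det_infer \
    --rec_model_dir=./inference/ch_PP-OCRv3_rec_infer \
    --cls_model_dir=./inference/cls \
    --image_dir=../../doc/imgs/12.jpg \
    --det=true \
    --rec=true \
    --cls=true \
    --rec_img_h=48
```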
##### 2. det+rec:
...
@@ -231,6 +234,7 @@ Specifically,
--det=true\
--rec=true\
--cls=false\
--rec_img_h=48\
```
##### 3. det
...
@@ -250,6 +254,7 @@ Specifically,
--det=false\
--rec=true\
--cls=true\
--rec_img_h=48\
```
##### 5. rec
...
@@ -260,6 +265,7 @@ Specifically,
--det=false\
--rec=true\
--cls=false\
--rec_img_h=48\
```
##### 6. cls
...
@@ -335,10 +341,10 @@ The detection results will be shown on the screen, which is as follows.
```bash
predict img: ../../doc/imgs/12.jpg
../../doc/imgs/12.jpg
0 det boxes: [[74,553],[427,542],[428,571],[75,582]] rec text: 打浦路252935号 rec score: 0.947724
1 det boxes: [[23,507],[513,488],[515,529],[24,548]] rec text: 绿洲仕格维花园公寓 rec score: 0.993728
2 det boxes: [[187,456],[399,448],[400,480],[188,488]] rec text: 打浦路15号 rec score: 0.964994
3 det boxes: [[42,413],[483,391],[484,428],[43,450]] rec text: 上海斯格威铂尔大酒店 rec score: 0.980086
The detection visualized image saved in ./output//12.jpg
```
The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
...
@@ -40,12 +40,12 @@ Set as `limit_type='min', det_limit_side_len=960`, it means that the shortest si
If the resolution of the input picture is relatively large and you want to use a larger resolution prediction, you can set det_limit_side_len to the desired value, such as 1216:
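A sketch of such a command is shown below; the image path and detection model directory are placeholders.
```bash
# Illustrative sketch: raise the resolution limit to 1216 on the longer side.
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/1.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/" --det_limit_type=max --det_limit_side_len=1216
```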
### 1. Lightweight Chinese Recognition Model Inference
**Note**: The input shape used by the recognition model of `PP-OCRv3` is `3,48,320`, and the parameter `--rec_image_shape=3,48,320` needs to be added. If the recognition model of `PP-OCRv3` is not used, this parameter does not need to be set.
For lightweight Chinese recognition model inference, you can execute the following commands:
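For example (the recognition model directory below is a placeholder for your own downloaded or exported model):
```bash
# Illustrative sketch: PP-OCRv3 recognition on a single word image, note --rec_image_shape=3,48,320.
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_10.png" --rec_model_dir="./ch_PP-OCRv3_rec_infer/" --rec_image_shape=3,48,320
```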
After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen.
```bash
Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.988671)
```
<aname="MULTILINGUAL_MODEL_INFERENCE"></a>
<aname="MULTILINGUAL_MODEL_INFERENCE"></a>
...
@@ -117,20 +120,22 @@ After executing the command, the prediction results (classification angle and sc
<aname="CONCATENATION"></a>
<aname="CONCATENATION"></a>
## Text Detection Angle Classification and Recognition Inference Concatenation
## Text Detection Angle Classification and Recognition Inference Concatenation
**Note**: The input shape used by the recognition model of `PP-OCRv3` is `3,48,320`, and the parameter `--rec_image_shape=3,48,320` needs to be added. If the recognition model of `PP-OCRv3` is not used, this parameter does not need to be set.
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`. The parameter `det_model_dir` specifies the path of the detection inference model, the parameter `cls_model_dir` specifies the path of the angle classification inference model, and the parameter `rec_model_dir` specifies the path of the recognition inference model. The parameter `use_angle_cls` is used to control whether to enable the angle classification model. The parameter `use_mp` specifies whether to use multi-process inference, and `total_process_num` specifies the number of processes when multi-process is used. The visualized recognition results are saved to the `./inference_results` folder by default.
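A sketch of such a concatenated prediction is shown below; the model directories and the image path are placeholders.
```bash
# Illustrative sketch: detection + angle classification + recognition in one pass.
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/00018069.jpg" --det_model_dir="./ch_PP-OCRv3_det_infer/" --cls_model_dir="./cls/" --rec_model_dir="./ch_PP-OCRv3_rec_infer/" --use_angle_cls=true --rec_image_shape=3,48,320
```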
If you do not use the provided test image, you can replace the following `--image_dir` parameter with the corresponding test image path
**Note**: The whl package uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3,48,320`, so if you use the recognition function you need to add the parameter `--rec_image_shape 3,48,320`. If you do not use the default `PP-OCRv3` model, you do not need to set this parameter.
<aname="211-english-and-chinese-model"></a>
<aname="211-english-and-chinese-model"></a>
#### 2.1.1 Chinese and English Model
#### 2.1.1 Chinese and English Model
...
@@ -80,15 +82,15 @@ If you do not use the provided test image, you can replace the following `--imag
* Detection, direction classification and recognition: set the parameter `--use_gpu false` to disable the gpu device
```bash
paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false --rec_image_shape 3,48,320
```
Output will be a list, each item contains bounding box, text and recognition confidence
Output will be a list, each item contains text and recognition confidence
```bash
['PAIN', 0.9934559464454651]
```
If you need to use the 2.0 model, please specify the parameter `--version PP-OCR`; paddleocr uses the PP-OCRv3 model by default (`--version PP-OCRv3`). More whl package usage can be found in [whl package](./whl_en.md)
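For example, a sketch that forces the 2.0 model on the earlier test image, using the flag exactly as described above:
```bash
# Illustrative sketch: switch from the default PP-OCRv3 model to the 2.0 (PP-OCR) model.
paddleocr --image_dir ./imgs_en/img_12.jpg --lang en --use_angle_cls true --version PP-OCR
```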
<aname="212-multi-language-model"></a>
<aname="212-multi-language-model"></a>
#### 2.1.2 Multi-language Model
#### 2.1.2 Multi-language Model
PaddleOCR currently supports 80 languages, which can be switched by modifying the `--lang` parameter. PP-OCRv3 currently only supports Chinese and English models; other multilingual models will be updated successively.
**Note**: The whl package uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3,48,320`, so if you use the recognition function you need to add the parameter `--rec_image_shape 3,48,320`. If you do not use the default `PP-OCRv3` model, you do not need to set this parameter.
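As an illustrative sketch, switching languages only requires changing `--lang`; the Korean test image path here is a placeholder.
```bash
# Illustrative sketch: recognize Korean text by switching the --lang parameter.
paddleocr --image_dir ./korean_text.jpg --use_angle_cls true --lang korean
```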
* detection classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --lang en
```
Output will be a list, each item contains text and recognition confidence
```bash
['PAIN', 0.9934559464454651]
```
* only classification
...
@@ -366,5 +368,4 @@ im_show.save('result.jpg')
| cls | Enable classification when `ppocr.ocr` func exec (use `use_angle_cls` in command line mode to control whether to enable classification in the forward direction) | FALSE |
| show_log | Whether to print log | FALSE |
| type | Perform ocr or table structuring, the value is selected in ['ocr','structure'] | ocr |
| ocr_version | OCR model version number, the current model support list is as follows: PP-OCRv3 supports Chinese and English detection and recognition models and the direction classifier model, PP-OCRv2 supports Chinese detection and recognition models, PP-OCR supports Chinese detection, recognition and direction classifier and multilingual recognition models | PP-OCRv3 |
| layout | Whether to perform layout analysis in forward | True |
| table | Whether to perform table recognition in forward | True |
| ocr | Whether to perform ocr for non-table areas in layout analysis. When layout is False, it will be automatically set to False | True |
| structure_version | Table structure model version number, the current model support list is as follows: PP-STRUCTURE supports the English table structure model | PP-STRUCTURE |
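As a sketch of how a few of these switches can be passed on the command line (the image path is a placeholder and the flags simply mirror the table above):
```bash
# Illustrative sketch: run table structuring instead of plain OCR and print logs.
paddleocr --image_dir=./table.jpg --type=structure --show_log=true
```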
Most of the parameters are consistent with the PaddleOCR whl package, see [whl package documentation](../../doc/doc_en/whl.md)