@@ -47,7 +47,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
...

-> It is recommended to start with the “quick experience” in the document tutorial
+> It is recommended to start with the “quick start” in the document tutorial
## Quick Experience
...
@@ -63,10 +63,11 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
...
<a name="Community"></a>
-## Community
+## Community👬
-**Join us**👬: Scan the QR code below with your Wechat, you can join the official technical discussion group. Looking forward to your participation.
+- For international developers, we regard [PaddleOCR Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions) as our international community platform. All ideas and questions can be discussed here in English.
+- For Chinese developers, scan the QR code below with your WeChat to join the official technical discussion group. For richer community content, please refer to the [中文README](README_ch.md); looking forward to your participation.
@@ -208,7 +208,7 @@ Execute the built executable file:
...
./build/ppocr [--param1][--param2][...]
```
-**Note**:ppocr uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3, 48, 320`, so if you use the recognition function, you need to add the parameter `--rec_img_h=48`, if you do not use the default `PP-OCRv3` model, you do not need to set this parameter.
+**Note**: ppocr uses the `PP-OCRv3` model by default, and the input shape used by the recognition model is `3, 48, 320`. If you want to use an older model, add the parameter `--rec_img_h=32`.
Specifically,
...
@@ -222,7 +222,6 @@ Specifically,
...
--det=true\
--rec=true\
--cls=true\
---rec_img_h=48\
```
##### 2. det+rec:
...
@@ -234,7 +233,6 @@ Specifically,
...
--det=true\
--rec=true\
--cls=false\
---rec_img_h=48\
```
##### 3. det
...
@@ -254,7 +252,6 @@ Specifically,
...
--det=false\
--rec=true\
--cls=true\
---rec_img_h=48\
```
##### 5. rec
...
@@ -265,7 +262,6 @@ Specifically,
...
--det=false\
--rec=true\
--cls=false\
---rec_img_h=48\
```
##### 6. cls
...
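As a worked example of the note above: the older (pre-PP-OCRv3) recognition models require passing `--rec_img_h=32` explicitly. A minimal sketch; the model directories and test image below are placeholders:
```
./build/ppocr --det=true --rec=true --cls=false \
    --det_model_dir=./ch_ppocr_mobile_v2.0_det_infer/ \
    --rec_model_dir=./ch_ppocr_mobile_v2.0_rec_infer/ \
    --image_dir=./test.jpg \
    --rec_img_h=32
```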
@@ -330,7 +326,7 @@ More parameters are as follows,
...
|rec_model_dir|string|-|Address of recognition inference model|
* Multi-language inference is also supported in PaddleOCR. You can refer to the [recognition tutorial](../../doc/doc_en/recognition_en.md) for more supported languages and models. Specifically, to run inference with a multi-language model, you only need to modify the values of `rec_char_dict_path` and `rec_model_dir`.
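For instance, a sketch of a multi-language recognition run (German is used purely as an illustration; the model directory and test image are placeholders, and the dictionary path follows the repository layout):
```
./build/ppocr --det=false --rec=true --cls=false \
    --rec_model_dir=./german_PP-OCRv3_rec_infer/ \
    --rec_char_dict_path=../../ppocr/utils/dict/german_dict.txt \
    --image_dir=./german_word.jpg
```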
@@ -34,7 +34,7 @@ For the compilation process of different development environments, please refer
...
### 1.2 Prepare Paddle-Lite library
There are two ways to obtain the Paddle-Lite library:
-- 1. Download directly, the download link of the Paddle-Lite library is as follows:
+- 1. [Recommended] Download directly, the download link of the Paddle-Lite library is as follows:
| Platform | Paddle-Lite library download link |
|---|---|
...
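As a sketch, fetching and unpacking one of the prebuilt libraries might look as follows; pick the actual link for your platform from the table above, the file name here is only illustrative:
```
wget https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10/inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz
tar -xzf inference_lite_lib.android.armv8.gcc.c++_shared.with_extra.with_cv.tar.gz
```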
@@ -43,7 +43,9 @@ There are two ways to obtain the Paddle-Lite library:
...
Note: 1. The above Paddle-Lite library is compiled from the Paddle-Lite 2.10 branch. For more information about Paddle-Lite 2.10, please refer to [link](https://github.com/PaddlePaddle/Paddle-Lite/releases/tag/v2.10).
-- 2. [Recommended] Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows:
+**Note: It is recommended to use a paddlelite>=2.10 version of the prediction library. Other prediction library versions can be found at this [download link](https://github.com/PaddlePaddle/Paddle-Lite/tags).**
+- 2. Compile Paddle-Lite to get the prediction library. The compilation method of Paddle-Lite is as follows:
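A sketch of the compile-it-yourself route, assuming the standard Paddle-Lite Android build script; consult the Paddle-Lite compilation docs for the options matching your target platform:
```
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout release/v2.10
# build the Android prediction library (defaults to armv8)
./lite/tools/build_android.sh
```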
@@ -104,21 +106,17 @@ If you directly use the model in the above table for deployment, you can skip th
...
If the model to be deployed is not in the above table, you need to follow the steps below to obtain the optimized model.
-The `opt` tool can be obtained by compiling Paddle Lite.
+- Step 1: Refer to [document](https://www.paddlepaddle.org.cn/lite/v2.10/user_guides/opt/opt_python.html) to install paddlelite, which is used to convert the Paddle inference model into the nb model required by paddlelite at runtime.
```
-cd Paddle-Lite
-git checkout release/v2.10
-./lite/tools/build.sh build_optimize_tool
+pip install paddlelite==2.10 # The paddlelite version should be the same as the prediction library version
```
-After the compilation is complete, the opt file is located under build.opt/lite/api/. You can view the operating options and usage of opt in the following ways:
+After installation, the following command shows the help information:
```
-cd build.opt/lite/api/
-./opt
+paddle_lite_opt
```
+Introduction to paddle_lite_opt parameters:
|Options|Description|
|---|---|
|--model_dir|The path of the PaddlePaddle model to be optimized (non-combined form)|
...
@@ -131,6 +129,8 @@ cd build.opt/lite/api/
...
Note that `--model_dir` applies to the non-combined form of the model to be optimized; the PaddleOCR inference model is the combined form, that is, the model structure and the model parameters are each stored in a single file.
+- Step 2: Use paddle_lite_opt to convert the inference model to the mobile model format.
The following takes the ultra-lightweight Chinese model of PaddleOCR as an example to show how to use the `opt` tool to convert an inference model into a Paddle-Lite optimized model.
```
...
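For reference, a typical conversion command might look like the sketch below, assuming the PP-OCRv3 detection inference model has already been downloaded and unpacked; adjust the paths for your own model:
```
paddle_lite_opt \
    --model_file=./ch_PP-OCRv3_det_infer/inference.pdmodel \
    --param_file=./ch_PP-OCRv3_det_infer/inference.pdiparams \
    --optimize_out=./ch_PP-OCRv3_det_opt \
    --valid_targets=arm \
    --optimize_out_type=naive_buffer
```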
@@ -240,6 +240,7 @@ det_db_thresh 0.3 # Used to filter the binarized image of DB prediction,
...
det_db_box_thresh 0.5 # DB post-processing filter box threshold; if boxes are missed, it can be reduced as appropriate
det_db_unclip_ratio 1.6 # Indicates the compactness of the text box; the smaller the value, the closer the text box is to the text
use_direction_classify 0 # Whether to use the direction classifier, 0 means not to use, 1 means to use
+rec_image_height 32 # The height of the input image of the recognition model; set 48 for PP-OCRv3 models and 32 for PP-OCRv2 models
```
5. Run Model on phone
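When a PP-OCRv3 recognition model is deployed, the input height has to be raised to match; a sketch, assuming the demo reads its settings from a file named `config.txt`:
```
# raise the recognition input height from the 32 used by PP-OCRv2-era models to 48
sed -i 's/rec_image_height 32/rec_image_height 48/' config.txt
```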
...
@@ -258,8 +259,15 @@ After the above steps are completed, you can use adb to push the file to the pho
...
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
# The use of ocr_db_crnn is:
-# ./ocr_db_crnn Detection model file Orientation classifier model file Recognition model file Test image path Dictionary file path
+# ./ocr_db_crnn Mode Detection model file Orientation classifier model file Recognition model file Hardware Precision Threads Batchsize Test image path Dictionary file path
If you modify the code, you need to recompile and push to the phone.
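A sketch of a concrete invocation following the argument order above; the model files, image, and dictionary are placeholders for the files pushed to `/data/local/tmp/debug`, and it is assumed that the `system` mode selects the full det+cls+rec pipeline:
```
./ocr_db_crnn system \
    ch_PP-OCRv3_det_slim_opt.nb ch_ppocr_mobile_v2.0_cls_slim_opt.nb ch_PP-OCRv3_rec_slim_opt.nb \
    arm8 INT8 10 1 \
    ./11.jpg ppocr_keys_v1.txt
```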
...
@@ -283,3 +291,7 @@ A2: Replace the .jpg test image under ./debug with the image you want to test, a
...
Q3: How to package it into the mobile APP?
A3: This demo aims to provide the core algorithm part that can run OCR on mobile phones. Further, PaddleOCR/deploy/android_demo is an example of encapsulating this demo into a mobile app, for reference.
+Q4: When running the demo, an error is reported `Error: This model is not supported, because kernel for 'io_copy' is not supported by Paddle-Lite.`
+A4: The installed paddlelite version does not match the downloaded prediction library version. Make sure that the paddle_lite_opt tool matches your prediction library version, regenerate the nb model, and try again.
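A sketch of the fix, assuming the 2.10 prediction library from section 1.2 is being used:
```
pip show paddlelite           # check which converter version is installed
pip install paddlelite==2.10  # align it with the prediction library, then re-run paddle_lite_opt
```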
@@ -77,7 +77,7 @@ LK-PAN (Large Kernel PAN) is a lightweight [PAN](https://arxiv.org/pdf/1803.0153
...
**(2) DML: Deep Mutual Learning Strategy for Teacher Model**
-[DML](https://arxiv.org/abs/1706.00384)(Collaborative Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by learning from each other with two models with the same structure. The DML strategy is adopted in the teacher model training, and the hmean is increased from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to the above-mentioned higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%.
+[DML](https://arxiv.org/abs/1706.00384) (Deep Mutual Learning), as shown in the figure below, can effectively improve the accuracy of the text detection model by having two models with the same structure learn from each other. The DML strategy is adopted in teacher model training, raising the hmean from 85% to 86%. By updating the teacher model of CML in PP-OCRv2 to this higher-precision one, the hmean of the student model can be further improved from 83.2% to 84.3%.
<div align="center">
...
@@ -101,7 +101,7 @@ Considering that the features of some channels will be suppressed if the convolu
...
The recognition module of PP-OCRv3 is optimized based on the text recognition algorithm [SVTR](https://arxiv.org/abs/2205.00159). RNN is abandoned in SVTR, and the context information of the text line image is mined more effectively by introducing the Transformer structure, thereby improving the text recognition ability.
-The recognition accuracy of SVTR_inty outperforms PP-OCRv2 recognition model by 5.3%, while the prediction speed nearly 11 times slower. It takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
+The recognition accuracy of SVTR_tiny outperforms the PP-OCRv2 recognition model by 5.3%, but its prediction speed is nearly 11 times slower; it takes nearly 100ms to predict a text line on CPU. Therefore, as shown in the figure below, PP-OCRv3 adopts the following six optimization strategies to accelerate the recognition model.
@@ -29,10 +29,10 @@ PP-OCR pipeline is as follows:
...
PP-OCR system is in continuous optimization. At present, PP-OCR and PP-OCRv2 have been released:
-PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
+PP-OCR adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the [PP-OCR technical report](https://arxiv.org/abs/2009.09941).
#### PP-OCRv2
-On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (https://arxiv.org/abs/2109.03144).
+On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data expansion strategy. The recognition model adopts the LCNet lightweight backbone network, the U-DML knowledge distillation strategy, and an enhanced CTC loss function (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the [PP-OCRv2 technical report](https://arxiv.org/abs/2109.03144).
#### PP-OCRv3
...
@@ -46,7 +46,7 @@ PP-OCRv3 pipeline is as follows:
...
<img src="../ppocrv3_framework.png" width="800">
</div>
-For more details, please refer to [PP-OCRv3 technical report](./PP-OCRv3_introduction_en.md).
+For more details, please refer to the [PP-OCRv3 technical report](https://arxiv.org/abs/2206.03001v2).
@@ -119,7 +119,18 @@ If you do not use the provided test image, you can replace the following `--imag
...
['PAIN', 0.9934559464454651]
```
-If you need to use the 2.0 model, please specify the parameter `--ocr_version PP-OCR`, paddleocr uses the PP-OCRv3 model by default(`--ocr_version PP-OCRv3`). More whl package usage can be found in [whl package](./whl_en.md)
+**Version**
+paddleocr uses the PP-OCRv3 model by default (`--ocr_version PP-OCRv3`). If you want to use another version, set the parameter `--ocr_version`; the versions are described below:
+| version name | description |
+| --- | --- |
+| PP-OCRv3 | supports Chinese and English detection and recognition, the direction classifier, and multilingual recognition |
+| PP-OCRv2 | only supports Chinese and English detection and recognition and the direction classifier; the multilingual models are not updated |
+| PP-OCR | supports Chinese and English detection and recognition, the direction classifier, and multilingual recognition |
+If you want to add your own trained model, you can add model links and keys in [paddleocr](../../paddleocr.py) and recompile.
+More whl package usage can be found in [whl package](./whl_en.md)
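For example, selecting an older pipeline from the command line; the test image path is a placeholder:
```
paddleocr --image_dir ./img_12.jpg --ocr_version PP-OCRv2
```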
@@ -440,7 +446,7 @@ class PaddleOCR(predict_system.TextSystem):
...
"""
ocr with paddleocr
args:
-    img: img for ocr, support ndarray, img_path and list or ndarray
+    img: img for ocr, support ndarray, img_path and list of ndarray
det: use text detection or not. If false, only rec will be exec. Default is True
rec: use text recognition or not. If false, only det will be exec. Default is True
cls: use angle classifier or not. Default is True. If true, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
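The `det`/`rec`/`cls` switches above are also exposed by the whl-package command line; as a sketch, recognition only on a cropped text line image (the image path is a placeholder):
```
paddleocr --image_dir ./word_10.png --det false --cls false
```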
...
@@ -203,7 +203,7 @@ First use the `tools/infer_vqa_token_ser.py` script to complete the prediction o
...
Finally, the prediction result visualization image and the prediction result text file will be saved in the directory configured by the `config.Global.save_res_path` field. The prediction result text file is named `infer_results.txt`.