This chapter introduces the C++ deployment steps of the PaddleOCR model. For the corresponding Python inference deployment, please refer to the [document](../../doc/doc_ch/inference.md).

C++ performs better than Python, so C++ is used in most CPU and GPU deployment scenarios.

This section introduces how to configure the C++ environment and deploy PaddleOCR in a Linux (CPU/GPU) environment. For Windows, please refer to the [Windows compilation guidelines](./docs/windows_vs2019_build.md).
## 1. Prepare the Environment
### 1.1 Compile OpenCV
* First, download the OpenCV source package for Linux from the OpenCV official website. Taking OpenCV 3.4.7 as an example, the download command is as follows.
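A minimal sketch of the download step is shown below; the GitHub archive URL is an assumption, and any mirror that provides the OpenCV 3.4.7 source works equally well.

```shell
# Download and extract the OpenCV 3.4.7 source code (assumed mirror URL)
wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz
tar -xf 3.4.7.tar.gz
```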
Finally, you will see the folder `opencv-3.4.7/` in the current directory.

* To compile OpenCV, set the OpenCV source path (`root_path`) and installation path (`install_path`) yourself, then enter the OpenCV source code path and compile as follows.
```shell
...
make -j
make install
```
In the above commands, `root_path` is the downloaded OpenCV source code path and `install_path` is the OpenCV installation path. After `make install` is completed, the OpenCV header and library files are generated in this folder for the later OCR source code compilation.
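For reference, a full configure-and-build sequence might look like the sketch below; the CMake options shown are common choices rather than the exact flags used by PaddleOCR's build script, and both paths are placeholders.

```shell
root_path=/path/to/opencv-3.4.7      # placeholder: OpenCV source path
install_path=${root_path}/opencv3    # placeholder: installation path

cd ${root_path}
rm -rf build && mkdir build && cd build

cmake .. \
    -DCMAKE_INSTALL_PREFIX=${install_path} \
    -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=OFF \
    -DWITH_IPP=OFF \
    -DWITH_EIGEN=OFF

make -j
make install
```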
The final file structure under the OpenCV installation path is as follows.
```
opencv3/
...
```

### 1.2 Compile or Download the Paddle Inference Library

#### 1.2.1 Direct download and installation
* You can review and select the appropriate version of the inference library on the [Paddle inference library official website](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html).
* After downloading, use the following command to extract the files.
```
tar -xf paddle_inference.tgz
```
Finally, you will see the folder `paddle_inference/` in the current path.

#### 1.2.2 Compile the inference library from source
* If you want the latest Paddle inference library features, you can download the latest code from the Paddle GitHub repository and compile the inference library from source. It is recommended to use an inference library with a Paddle version greater than or equal to 2.0.1.
* You can refer to the [Paddle inference library documentation](https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/inference/build_and_install_lib_en.html) to get the Paddle source code from GitHub and then compile it to generate the latest inference library. The method of using git to fetch the code is as follows.
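A minimal sketch of the git step, assuming the `release/2.1` branch (pick the release branch you actually need):

```shell
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
git checkout release/2.1   # assumed release branch
```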
In the generated inference library, `paddle` is the Paddle library required for the later C++ prediction, and `version.txt` contains the version information of the current inference library.
## 2. Compile and Run the Demo
### 2.1 Export the inference model

* You can refer to [Model inference](../../doc/doc_ch/inference.md) to export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
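A typical layout might look like the sketch below; the subdirectory names `det_db` and `rec_crnn` are only examples of where the detection and recognition inference models are placed.

```
inference/
|-- det_db              # detection inference model (example name)
|   |-- inference.pdiparams
|   |-- inference.pdmodel
|-- rec_crnn            # recognition inference model (example name)
|   |-- inference.pdiparams
|   |-- inference.pdmodel
```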
### 2.2 Compile the PaddleOCR C++ inference demo

When compiling the demo, `OPENCV_DIR` is the OpenCV installation path; `LIB_DIR` is the downloaded Paddle inference library path (the `paddle_inference` folder) or the self-compiled one (the `build/paddle_inference_install_dir` folder); `CUDA_LIB_DIR` is the CUDA library file path, which in docker is `/usr/local/cuda/lib64`; `CUDNN_LIB_DIR` is the cuDNN library file path, which in docker is `/usr/lib/x86_64-linux-gnu/`.
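A sketch of how these variables might be passed to CMake is shown below; the option names follow the common PaddleOCR C++ demo build script but may differ in your version, and all paths are placeholders.

```shell
OPENCV_DIR=/path/to/opencv3
LIB_DIR=/path/to/paddle_inference
CUDA_LIB_DIR=/usr/local/cuda/lib64
CUDNN_LIB_DIR=/usr/lib/x86_64-linux-gnu/

mkdir -p build && cd build
cmake .. \
    -DOPENCV_DIR=${OPENCV_DIR} \
    -DPADDLE_LIB=${LIB_DIR} \
    -DWITH_GPU=ON \
    -DCUDA_LIB=${CUDA_LIB_DIR} \
    -DCUDNN_LIB=${CUDNN_LIB_DIR}
make -j
```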
* After the compilation is completed, an executable file named `ppocr` will be generated in the `build` folder.
### Run the demo

Execute the built executable file:
```shell
./build/ppocr <mode> [--param1][--param2][...]
```

Here, `mode` is a required parameter, and the valid values are:

| mode value | Model used |
| --- | --- |
| det | Detection only |
| rec | Recognition only |
| system | End-to-end system |

Specifically,
##### 1. run det demo:
```shell
...
--image_dir=../../doc/imgs/12.jpg
```
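For reference, complete invocations for the three modes might look like the sketch below; the model directory names follow the example `inference/` layout assumed earlier, and flag names not listed in the tables below (such as `--det_model_dir` and `--rec_model_dir`) are assumptions.

```shell
# Detection only
./build/ppocr det \
    --det_model_dir=inference/det_db \
    --image_dir=../../doc/imgs/12.jpg

# Recognition only
./build/ppocr rec \
    --rec_model_dir=inference/rec_crnn \
    --image_dir=../../doc/imgs_words/ch/

# End-to-end system
./build/ppocr system \
    --det_model_dir=inference/det_db \
    --rec_model_dir=inference/rec_crnn \
    --image_dir=../../doc/imgs/12.jpg
```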
More parameters are as follows:

- Common parameters
|parameter|data type|default|meaning|
| --- | --- | --- | --- |
|cpu_math_library_num_threads|int|10|Number of threads when using CPU inference; when the machine has enough cores, the larger the value, the faster the inference speed|
|use_mkldnn|bool|true|Whether to use the MKL-DNN library|
- Detection related parameters

|parameter|data type|default|meaning|
| --- | --- | --- | --- |
|use_polygon_score|bool|false|Whether to use a polygon box to calculate the bbox score; false means a rectangle box is used. The rectangle box is faster to calculate, while the polygon box is more accurate for curved text areas.|
|visualize|bool|true|Whether to visualize the results. When set to true, the prediction result will be saved in the image file `./ocr_vis.png`.|
- Classifier related parameters

|parameter|data type|default|meaning|
| --- | --- | --- | --- |
|cls_model_dir|string|-|Path of the direction classifier inference model|
|cls_thresh|float|0.9|Score threshold of the direction classifier|
- Recognition related parameters

|parameter|data type|default|meaning|
| --- | --- | --- | --- |
### 2.3 Notes
* The Paddle 2.0.0 inference model library is recommended for this tutorial.
Many users hope to package the PaddleOCR service into a Docker image so that it can be quickly released and used in Docker or Kubernetes environments.

This page provides some standardized code to achieve this goal. You can quickly publish the PaddleOCR project as a callable RESTful API service through the following steps. (At present, deployment based on the HubServing mode is implemented; deployment based on the PaddleServing mode is planned for the future.)
## 1. Prerequisites
c. NVIDIA Container Toolkit (GPU; Docker 19.03+ can skip this)
d. cuDNN 7.6+ (GPU)
## 2. Build Image
a. Go to the Dockerfile directory (note: the CPU and GPU versions need to be distinguished; the following takes CPU as an example, and for the GPU version you need to replace the keyword)
```
cd deploy/docker/hubserving/cpu
```
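b. Build the image and start the container. These steps are summarized in the sketch below; the image tag is an assumption, while the container name `paddle_ocr` and port `8868` are taken from the commands used later on this page.

```shell
# Build the image (the tag paddle_ocr:cpu is an assumption)
docker build -t paddle_ocr:cpu .

# Start the container and map the HubServing port 8868
docker run -dp 8868:8868 --name paddle_ocr paddle_ocr:cpu

# Follow the service logs to confirm the service has started
docker logs -f paddle_ocr
```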
## 4. Test
a. Calculate the Base64 encoding of the picture to be recognized (for testing, you can use a free online tool such as https://freeonlinetools24.com/base64-image/).
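Alternatively, the encoding can be generated locally; a sketch using GNU coreutils is shown below, with the image path as a placeholder.

```shell
base64 -w 0 ./your_image.jpg > img_base64.txt
```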
b. Post a service request (a sample request is provided in sample_request.txt)
```
curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"Input image Base64 encode(need to delete the code 'data:image/jpg;base64,')\"]}" http://localhost:8868/predict/ocr_system
```
c. Get the response (if the call is successful, the following result will be returned)
This tutorial introduces how to use [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) to deploy PaddleOCR ultra-lightweight Chinese and English detection models on mobile phones.

Paddle Lite is a lightweight inference engine for PaddlePaddle. It provides efficient inference capabilities for mobile phones and IoT devices, and extensively integrates cross-platform hardware to provide lightweight deployment solutions for edge-side deployment issues.

PaddleOCR also provides a series of [models](../../../doc/doc_en/models_list_en.md). Developers can choose one of these models or use their own trained model according to their needs.
### 3. Pruning sensitivity analysis
After the pre-trained model is loaded, sensitivity analysis is performed on each network layer of the model to understand its redundancy, and the result is saved to a sensitivity file named `sen.pickle`. After that, the user can load the sensitivity file via the [methods provided by PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/prune/sensitive.py#L221) and determine the pruning ratio of each network layer automatically. For specific details of sensitivity analysis, see: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/image_classification_sensitivity_analysis_tutorial.md)

The function returns a dict after loading the sensitivity file. The keys of the dict are the parameter names of each layer, and the values hold the pruning-sensitivity information of the corresponding layer. For example, pruning 10% of the filters of the layer corresponding to `conv10_expand_weights` would lead to a 0.65% degradation of model performance. The details can be seen at: [Sensitivity analysis](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/algo/algo.md#2-%E5%8D%B7%E7%A7%AF%E6%A0%B8%E5%89%AA%E8%A3%81%E5%8E%9F%E7%90%86)
Enter the PaddleOCR root directory and perform sensitivity analysis on the model with the following command:
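A sketch of the command is shown below; the script path and the configuration file follow the usual layout of the pruning example and are assumptions, while the pretrained-model path is a placeholder.

```shell
python3 deploy/slim/prune/sensitivity_anal.py \
    -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml \
    -o Global.pretrained_model="your trained model" \
       Global.save_model_dir=./output/prune_model/
```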
Generally, a more complex model achieves better performance on a task, but it also leads to some redundancy in the model. Quantization is a technique that reduces this redundancy by reducing full-precision data to fixed-point numbers, so as to reduce model computation complexity and improve model inference performance.
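Quant-aware training relies on PaddleSlim. If it is not installed yet, a typical from-source installation might look like the following sketch (installing a released version with pip also works):

```shell
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```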
### 2. Download Pre-trained Model

PaddleOCR provides a series of pre-trained [models](../../../doc/doc_en/models_list_en.md).
If the model to be quantized is not in the list, you need to follow the [Regular Training](../../../doc/doc_en/quickstart_en.md) method to get a trained model.
### 3. Quant-Aware Training
Quantization training includes offline quantization training and online quantization training. Online quantization training is more effective; it requires loading the pre-trained model, and the model can be quantized after the quantization strategy is defined.
The code for quantization training is located in `slim/quantization/quant.py`. For example, to train a detection model, the training instructions are as follows:
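A sketch of such a training command is shown below; the script path, the configuration file, and the option values are assumptions or placeholders and may differ between versions.

```shell
python3 deploy/slim/quantization/quant.py \
    -c configs/det/det_mv3_db.yml \
    -o Global.pretrained_model="your trained model" \
       Global.save_model_dir=./output/quant_model
```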
1. **Prediction error: got an unexpected keyword argument 'gradient_clip'**

The installed version of Paddle is incorrect. Currently, this project only supports Paddle 1.7, which will be adapted to 1.8 in the near future.

2. **Error when converting attention recognition model: KeyError: 'predict'**

Solved. Please update to the latest version of the code.
At present, PaddleOCR has open-sourced two Chinese models, namely the 8.6M ultra-lightweight Chinese model and the general Chinese OCR model.

|General Chinese OCR model|Resnet50_vd+Resnet34_vd|det_r50_vd_db.yml|rec_chinese_common_train.yml|
8. **Is there a plan to open-source a model that only recognizes numbers or only English + numbers?**

There is no plan to open-source models for numbers only, numbers + English only, or other vertical scenarios. PaddleOCR has open-sourced a variety of detection and recognition algorithms for customized training, and the two Chinese models are also trained with this open-source algorithm library. You can prepare the data according to the tutorial, choose the appropriate configuration file, and train the model yourself; we believe you can get good results. If you have any questions during training, you are welcome to open issues or ask in the communication group, and we will answer them in time.

9. **What is the training data used by the open-source models? Can it be open-sourced?**

At present, the open-source models, datasets, and data scales are as follows:
10. **Error in using the model with TPS module for prediction**

Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3]\(108) != Grid dimension[2]\(100)

Solution: TPS does not support variable shape. Please set `--rec_image_shape='3,32,100'` and `--rec_char_type='en'`.
11. **A custom dictionary was used during training, but the recognition results contain words that do not appear in the dictionary**

The custom dictionary path is not set when making predictions. The solution is to set the parameter `rec_char_dict_path` to the corresponding dictionary file.

12. **Results of cpp_infer and python_inference are very different**

The versions of the exported inference model and the inference library should be the same. For example, on the Windows platform, the version of the inference library that PaddlePaddle provides is 1.8, but the version of the inference model that PaddleOCR provides is 1.7, so you should export the model yourself (`tools/export_model.py`) on PaddlePaddle 1.8 and then use the exported model for inference.
PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please uncomment the `RecAug` and `RandAugment` fields under `Train.dataset.transforms` in the configuration file.

The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, and RandAugment.

Except for RandAugment, each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to:
The following list can be viewed through `--help`:
| -c | ALL | Specify configuration file to use | None | **Please refer to the parameter introduction for configuration file usage** |
| -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: -o Global.use_gpu=false |
| save_model_dir | Set model save path | output/{algorithm_name} | \ |
| save_epoch_step | Set model save interval | 3 | \ |
| eval_batch_step | Set the model evaluation interval | 2000 or [1000, 2000] | run evaluation every 2000 iterations, or run evaluation every 2000 iterations starting from the 1000th iteration |
| cal_metric_during_train | Set whether to evaluate the metric during the training process; when enabled, the metric is computed on the current batch | true | \ |
| load_static_weights | Set whether the pre-trained model is saved in static graph mode (currently only required by the detection algorithm) | true | \ |
| pretrained_model | Set the path of the pre-trained model | ./pretrain_models/CRNN/best_accuracy | \ |
In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head.
| model_type | Network type | rec | Currently supports `rec`, `det`, `cls` |
| algorithm | Model name | CRNN | See [algorithm_overview](./algorithm_overview_en.md) for the supported list |
| **Transform** | Set the transformation method | - | Currently only recognition algorithms are supported, see [ppocr/modeling/transform](../../ppocr/modeling/transform) for details |
| name | Transformation class name | TPS | Currently supports `TPS` |
| num_fiducial | Number of TPS control points | 20 | Ten on the top and bottom |
## 3. Multilingual Config File Generation
PaddleOCR currently supports recognition for 80 languages besides Chinese. A multi-language configuration file template is provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).

There are two ways to create the required configuration file:
1. Automatically generated by script
The script [generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) can help you generate configuration files for multi-language models.
- Take Italian as an example. If your data is prepared in the following format:
```
...
```

Italian is made up of Latin letters, so after executing the command, you will get a configuration file similar to the following:

```
epoch_num: 500
...
character_dict_path: {path/of/dict}  # path of dict

Train:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/  # root directory of training data
```
PaddleOCR provides a concatenation tool for detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages: text detection, text rectification, text recognition, and score filtering to output the text position and recognition results, and at the same time, you can choose to visualize the results.
When performing prediction, you need to specify the path of a single image or an image folder through the parameter `image_dir`; the parameter `det_model_dir` specifies the path of the detection model, and the parameter `rec_model_dir` specifies the path of the recognition model. The visualized results are saved to the `./inference_results` folder by default.
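A sketch of such a concatenated prediction call is shown below; the script name follows the usual Python inference tooling of the repository, and the image path and model directories are placeholders.

```shell
python3 tools/infer/predict_system.py \
    --image_dir="./doc/imgs/00018069.jpg" \
    --det_model_dir="./inference/det_db/" \
    --rec_model_dir="./inference/rec_crnn/"
```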
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
### 1.2 Download Pre-trained Model

First, download the pre-trained model. The detection model of PaddleOCR currently supports 3 backbones, namely MobileNetV3, ResNet18_vd and ResNet50_vd. You can use the models in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) to replace the backbone according to your needs.
The corresponding download links of the backbone pre-trained weights can be found at (https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).
The high performance of distributed training is one of the core advantages of PaddlePaddle. In classification tasks, distributed training can achieve an almost linear speedup ratio. OCR training tasks generally need massive training data; for example, the PP-OCR v2.0 recognition model is trained on 18 million samples, which is very time-consuming on a single machine. Therefore, distributed training is used in PaddleOCR to speed up the training task. For more information about distributed training, please refer to the [distributed training quick start tutorial](https://fleet-x.readthedocs.io/en/latest/paddle_fleet_rst/parameter_server/ps_quick_start.html).
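A sketch of a multi-machine launch is shown below; the IP addresses are examples and the configuration file is a placeholder, while the notes that follow explain the `--ips` requirements.

```shell
python3 -m paddle.distributed.launch \
    --ips="192.168.0.1,192.168.0.2" \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c configs/rec/rec_mv3_none_bilstm_ctc.yml
```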
* The IP addresses of different machines need to be separated by commas, which can be queried through `ifconfig` or `ipconfig`.
* Different machines need to be set up for password-free SSH access and must be able to `ping` each other directly; otherwise communication cannot be established between them.
* The code, data, and start command on different machines must be completely consistent, and then all machines need to run the start command. The first machine in the `ip_list` is set to `trainer0`, and so on.
- To activate the Conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
conda activate paddle_env
```
#### 1.2.1 Installing Anaconda
- Note: To use PaddlePaddle you need to install the Python environment first; here we choose the Python integrated environment Anaconda toolkit
- Anaconda is a common Python package manager
- After installing Anaconda, you can install the Python environment, as well as NumPy and other required toolkits
- Just follow the default settings; it will take a while to install
- It is recommended to install a code editor such as VSCode or PyCharm

#### 1.2.2 Open a terminal and create a Conda environment
- Open the terminal
- Press Command and Space at the same time, type "terminal" in the Spotlight search, and double-click to open the terminal
- **Add Conda to the environment variables**
- Environment variables are added so that the system can recognize the Conda command
- Open `~/.bash_profile` in the terminal by typing the following command.
```shell
vim ~/.bash_profile
```
- Add Conda as an environment variable in `~/.bash_profile`.
```shell
# Press i first to enter edit mode
export PATH="$HOME/opt/anaconda3/bin:$PATH"  # example Anaconda path; adjust to your installation
```
- When you are done, press `esc` to exit edit mode, then type `:wq!` and enter to save and exit
- Verify that the Conda command is recognized.
- Enter `source ~/.bash_profile` in the terminal to update the environment variables
- Enter `conda info --envs` in the terminal again; if it shows that there is a base environment, then Conda has been added to the environment variables
- Create a new Conda environment
```shell
# Enter the following command at the command line to create an environment called paddle_env
conda create -n paddle_env python=3.8   # the Python version is an example
```
- To activate the Conda environment you just created, enter the following command at the command line.
```shell
# Activate the paddle_env environment
conda activate paddle_env
```
#### 1.3.1 Anaconda environment configuration
- Note: To use PaddlePaddle you need to install the Python environment first; here we choose the Python integrated environment Anaconda toolkit
- Anaconda is a common Python package manager
- After installing Anaconda, you can install the Python environment, as well as NumPy and other required toolkits
- Select the appropriate version for your operating system
- Type `uname -m` in the terminal to check the instruction set architecture used by your system
- Download method 1: Download locally, then transfer the installation package to the Linux server
- Download method 2: Directly use the Linux command line to download
```shell
# First install wget
sudo apt-get install wget   # use yum install wget on CentOS/RHEL
```
- When you are done, press `esc` to exit edit mode, then type `:wq!` and enter to save and exit
- Verify that the Conda command is recognized.
- Enter `source ~/.bash_profile` in the terminal to update the environment variables
- Enter `conda info --envs` in the terminal again; if it shows that there is a base environment, then Conda has been added to the environment variables
- Create a new Conda environment
```shell
# Enter the following command at the command line to create an environment called paddle_env
conda create -n paddle_env python=3.8   # the Python version is an example
```
The visualized text detection results are saved to the `./inference_results` folder by default, and the names of the result files are prefixed with 'det_res'. Examples of results are as follows:
![](../imgs_results/det_res_00018069.jpg)
<aname="RECOGNITION_MODEL_INFERENCE"></a>
<aname="RECOGNITION_MODEL_INFERENCE"></a>
## 3. Text Recognition Model Inference
## 3. Text Recognition Model Inference
The following will introduce the lightweight Chinese recognition model inference, other CTC-based and Attention-based text recognition models inference. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. In addition, if the characters dictionary is modified during training, make sure that you use the same characters set during inferencing. Please check below for details.
The following will introduce the lightweight Chinese recognition model inference, other CTC-based and Attention-based text recognition models inference. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. In addition, if the characters dictionary is modified during training, make sure that you use the same characters set during inference. Please check below for details.
<aname="MULTILINGUAL_MODEL_INFERENCE"></a>
<aname="MULTILINGUAL_MODEL_INFERENCE"></a>
### 2. Multilingaul Model Inference
### 2. Multilingual Model Inference
If you need to predict [other language models](./models_list_en.md#Multilingual), when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
If you need to predict [other language models](./models_list_en.md#Multilingual), when using inference model prediction, you need to specify the dictionary path used by `--rec_char_dict_path`. At the same time, in order to get the correct visualization results,
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
You need to specify the visual font path through `--vis_font_path`. There are small language fonts provided by default under the `doc/fonts` path, such as Korean recognition:
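A sketch of such a call is shown below; the image path and recognition model directory are placeholders, while the dictionary and font paths follow the repository's usual layout (verify them in your checkout).

```shell
python3 tools/infer/predict_rec.py \
    --image_dir="./doc/imgs_words/korean/1.jpg" \
    --rec_model_dir="./your_korean_rec_inference_model/" \
    --rec_char_dict_path="ppocr/utils/dict/korean_dict.txt" \
    --vis_font_path="doc/fonts/korean.ttf"
```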
After testing, PaddleOCR can run on glibc 2.23. You can also test other glibc versions or install glibc 2.23 for the best compatibility.
PaddleOCR working environment:
- PaddlePaddle 2.0.0
- Python 3.7
- glibc 2.23
It is recommended to use the Docker image provided by us to run PaddleOCR. Please refer to this [Docker tutorial](https://www.runoob.com/docker/docker-tutorial.html/).

*If you want to run the prediction code directly on Mac or Windows, you can start from step 2.*

**1. (Recommended) Prepare a docker environment. The first time you use this docker image, it will be downloaded automatically; please be patient.**
> Note: The mirror on Gitee may not be synchronized with the latest updates of the project on GitHub; there might be a delay of 3-5 days. Please use GitHub first.
> 1. Compared with the v2.0 models, the detection model in version 2.1 has improved accuracy, and the recognition model in version 2.1 has been optimized in both accuracy and CPU speed.
> 2. Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with the static graph programming paradigm, models 2.0 are the dynamic-graph trained version and achieve comparable performance.
> 3. All models in this tutorial are PP-OCR series models. For more algorithms and models trained on public datasets, refer to the [algorithm overview tutorial](./algorithm_overview_en.md).
The downloadable models provided by PaddleOCR include the `inference model`, `trained model`, `pre-trained model`, and `slim model`. The differences are as follows:
|model type|model format|description|
|--- | --- | --- |
|inference model|inference.pdmodel, inference.pdiparams|Used for inference based on the Paddle inference engine, [detail](./inference_en.md)|
|trained model, pre-trained model|\*.pdparams, \*.pdopt, \*.states |The checkpoint model saved during training, which stores the parameters of the model; mostly used for model evaluation and continued training.|
|slim model|\*.nb| Model compressed by PaddleSlim (a model compression tool based on PaddlePaddle), which is suitable for mobile-side deployment scenarios (Paddle-Lite is needed for slim model deployment). |
The relationship of the above models is as follows.
|ch_ppocr_server_v2.0_rec|General model, supporting Chinese, English and number recognition|[rec_chinese_common_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml)|94.8M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
**Note:** The `trained model` is fine-tuned on the `pre-trained model` with real data and synthesized vertical-text data, which achieves better performance in real scenes. The `pre-trained model` is directly trained on the full amount of real data and synthesized data, and is more suitable for fine-tuning on your own dataset.

PP-OCR also supports direction classification. For more detailed usage, please refer to the [whl package instructions](whl_en.md).
<aname="Custom_training"></a>
<aname="Custom_training"></a>
## 3 Custom training
## 3 Custom training
ppocr supports using your own data for custom training or finetune, where the recognition model can refer to [French configuration file](../../configs/rec/multi_language/rec_french_lite_train.yml)
PPOCR supports using your own data for custom training or fine-tune, where the recognition model can refer to [French configuration file](../../configs/rec/multi_language/rec_french_lite_train.yml)
Modify the training data path, dictionary and other parameters.
Modify the training data path, dictionary and other parameters.
For specific data preparation and training processes, please refer to [Text Detection](../doc_en/detection_en.md) and [Text Recognition](../doc_en/recognition_en.md). For more functions such as predictive deployment and data annotation, you can read the complete documentation tutorials.
## 4 Inference and Deployment
## 4 Inference and Deployment
In addition to installing the whl package for quick forecasting,
In addition to installing the whl package for quick forecasting,
ppocr also provides a variety of forecasting deployment methods.
PPOCR also provides a variety of prediction deployment methods.
PaddleOCR contains rich text detection, text recognition and end-to-end algorithms. Combining actual testing and industrial experience, PaddleOCR chooses DB and CRNN as the basic detection and recognition models, and proposes a series of models, named PP-OCR, for industrial applications after a series of optimization strategies. The PP-OCR model is aimed at general scenarios and forms a model library according to different languages. Based on the capabilities of PP-OCR, PaddleOCR releases the PP-Structure tool library for document scene tasks, including two major tasks: layout analysis and table recognition. In order to get through the entire process of industrial landing, PaddleOCR provides large-scale data production tools and a variety of prediction deployment tools to help developers quickly turn ideas into reality.
PaddleOCR contains rich text detection, text recognition and end-to-end algorithms. Drawing on experience from real-world scenarios and the industry, PaddleOCR chooses DB and CRNN as the basic detection and recognition models and, after a series of optimization strategies, proposes a family of models named PP-OCR for industrial applications. The PP-OCR models are aimed at general scenarios and form a model library covering different languages. Based on the capabilities of PP-OCR, PaddleOCR releases the PP-Structure toolkit for document scene tasks, including two major tasks: layout analysis and table recognition. To cover the entire process of industrial deployment, PaddleOCR also provides large-scale data production tools and a variety of prediction deployment tools to help developers quickly turn ideas into reality.
<divalign="center">
<divalign="center">
<imgsrc="../overview_en.png">
<imgsrc="../overview_en.png">
...
@@ -18,11 +18,11 @@ PaddleOCR contains rich text detection, text recognition and end-to-end algorith
...
@@ -18,11 +18,11 @@ PaddleOCR contains rich text detection, text recognition and end-to-end algorith
# Note: The cloud-hosting code may not be able to synchronize the update with this GitHub project in real time. There might be a delay of 3-5 days. Please give priority to the recommended method.
# Note: The mirror on Gitee may not stay in sync with the latest project on GitHub. There might be a delay of 3-5 days. Please use the GitHub repository first.
OCR algorithm can be divided into two-stage algorithm and end-to-end algorithm. The two-stage OCR algorithm is generally divided into two parts, text detection and text recognition algorithm. The text detection algorithm gets the detection box of the text line from the image, and then the recognition algorithm identifies the content of the text box. The end-to-end OCR algorithm can complete text detection and recognition in one algorithm. Its basic idea is to design a model with both detection unit and recognition module, share the CNN features of both and train them together. Because one algorithm can complete character recognition, the end-to-end model is smaller and faster.
OCR algorithms can be divided into two categories: two-stage algorithms and end-to-end algorithms. A two-stage OCR algorithm is generally divided into two parts, text detection and text recognition. The text detection algorithm locates the boxes of text lines in the image, and then the recognition algorithm reads the content of each text box. An end-to-end OCR algorithm combines text detection and recognition in one model: its basic idea is to design a model with both a detection unit and a recognition module, share the CNN features of both and train them together. Because a single model completes the whole task, the end-to-end model is usually smaller and faster.
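To make the two-stage idea concrete, here is a minimal sketch that runs the two stages as separate calls, assuming the `paddleocr` whl package is installed and using illustrative image paths; an end-to-end algorithm such as PGNet performs both steps inside a single model instead:

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(lang="en")

# Stage 1: detection only -- returns text-line boxes, no transcription
boxes = ocr.ocr("doc/imgs_en/img_12.jpg", det=True, rec=False)

# Stage 2: recognition only -- reads an already-cropped text-line image
texts = ocr.ocr("doc/imgs_words_en/word_10.png", det=False, rec=True)

print(boxes)
print(texts)
```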
### Introduction Of PGNet Algorithm
### Introduction Of PGNet Algorithm
In recent years, the end-to-end OCR algorithm has been well developed, including MaskTextSpotter series, TextSnake, TextDragon, PGNet series and so on. Among these algorithms, PGNet algorithm has the advantages that other algorithms do not
In recent years, end-to-end OCR algorithms have developed rapidly, including the MaskTextSpotter series, TextSnake, TextDragon, the PGNet series and so on. Among these algorithms, the PGNet algorithm has some advantages that the others do not:
- Pgnet loss is designed to guide training, and no character-level annotations is needed
- PGNet loss is designed to guide training, and no character-level annotations are needed.
- NMS and ROI related operations are not needed, It can accelerate the prediction
- NMS and ROI related operations are not needed, which accelerates prediction
- The reading order prediction module is proposed
- The reading order prediction module is proposed
- A graph based modification module (GRM) is proposed to further improve the performance of model recognition
- A graph based modification module (GRM) is proposed to further improve the performance of model recognition
- Higher accuracy and faster prediction speed
- Higher accuracy and faster prediction speed
For details of PGNet algorithm, please refer to [paper](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) ,The schematic diagram of the algorithm is as follows:
For details of PGNet algorithm, please refer to [paper](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf). The schematic diagram of the algorithm is as follows:
![](../pgnet_framework.png)
![](../pgnet_framework.png)
After feature extraction, the input image is sent to four branches: TBO module for text edge offset prediction, TCL module for text centerline prediction, TDO module for text direction offset prediction, and TCC module for text character classification graph prediction.
After feature extraction, the input image is sent to four branches: TBO module for text edge offset prediction, TCL module for text center-line prediction, TDO module for text direction offset prediction, and TCC module for text character classification graph prediction.
The output of TBO and TCL can get text detection results after post-processing, and TCL, TDO and TCC are responsible for text recognition.
The outputs of TBO and TCL are post-processed to obtain the text detection results, while TCL, TDO and TCC are responsible for text recognition.
The results of detection and recognition are as follows:
The results of detection and recognition are as follows:
...
@@ -40,7 +40,7 @@ Please refer to [Operation Environment Preparation](./environment_en.md) to conf
...
@@ -40,7 +40,7 @@ Please refer to [Operation Environment Preparation](./environment_en.md) to conf
<aname="Quick_Use"></a>
<aname="Quick_Use"></a>
## 3. Quick Use
## 3. Quick Use
### inference model download
### Inference model download
This section takes the trained end-to-end model as an example to quickly use the model prediction. First, download the trained end-to-end inference model [download address](https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/e2e_server_pgnetA_infer.tar)
This section takes the trained end-to-end model as an example to quickly demonstrate model prediction. First, download the trained end-to-end inference model from the [download address](https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/e2e_server_pgnetA_infer.tar).
If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
If you would like to load the trained model and continue training, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
This article will introduce the basic concepts that need to be mastered during model training and the tuning methods during training.
This article will introduce the basic concepts that are necessary for model training and tuning.
At the same time, it will briefly introduce the components of the PaddleOCR model training data and how to prepare the data finetune model in the vertical scene.
At the same time, it will briefly introduce the structure of the training data and how to prepare the data to fine-tune the model in vertical scenes.
<aname="1-Yml-Configuration"></a>
<aname="1-Yml-Configuration"></a>
## 1. Yml Configuration
## 1. Yml Configuration
The PaddleOCR model uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to complete the model training. When optimized, the configuration can be completed by modifying the parameters in the configuration file, which is simple to use and convenient to modify.
PaddleOCR uses configuration files to control network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file and then builds a complete training process to train the model. Fine-tuning can also be completed by modifying the parameters in the configuration file, which is simple and convenient.
For the complete configuration file description, please refer to [Configuration File](./config_en.md)
For the complete configuration file description, please refer to [Configuration File](./config_en.md)
...
@@ -28,13 +28,13 @@ For the complete configuration file description, please refer to [Configuration
...
@@ -28,13 +28,13 @@ For the complete configuration file description, please refer to [Configuration
## 2. Basic Concepts
## 2. Basic Concepts
In the process of model training, some hyperparameters need to be manually adjusted to help the model obtain the optimal index at the least loss. Different data volumes may require different hyper-parameters. When you want to finetune your own data or tune the model effect, there are several parameter adjustment strategies for reference:
During the model training process, some hyper-parameters can be manually specified to obtain the optimal result at the least cost. Different data volumes may require different hyper-parameters. When you want to fine-tune the model based on your own data, there are several parameter adjustment strategies for reference:
<aname="11-learning-rate"></a>
<aname="11-learning-rate"></a>
### 2.1 Learning Rate
### 2.1 Learning Rate
The learning rate is one of the important hyperparameters for training neural networks. It represents the step length of the gradient moving to the optimal solution of the loss function in each iteration.
The learning rate is one of the most important hyper-parameters for training neural networks. It represents the step length of the gradient moving towards the optimal solution of the loss function in each iteration.
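In plain gradient-descent terms (a generic illustration, not PaddleOCR code), the learning rate scales how far each parameter moves along its gradient in one update:

```python
# One vanilla SGD step: the learning rate controls how far the weights move along the gradient.
def sgd_step(weights, grads, lr=0.001):
    return [w - lr * g for w, g in zip(weights, grads)]

print(sgd_step([1.0, -2.0], [0.5, -0.5]))  # -> [0.9995, -1.9995]
```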
A variety of learning rate update strategies are provided in PaddleOCR, which can be modified through configuration files, for example:
A variety of learning rate update strategies are provided by PaddleOCR, which can be specified in configuration files. For example,
```
```
Optimizer:
Optimizer:
...
@@ -46,16 +46,15 @@ Optimizer:
...
@@ -46,16 +46,15 @@ Optimizer:
warmup_epoch: 5
warmup_epoch: 5
```
```
Piecewise stands for piecewise constant attenuation. Different learning rates are specified in different learning stages,
`Piecewise` stands for piecewise constant decay. Different learning rates are specified in different learning stages, and the learning rate stays the same within each stage.
and the learning rate is the same in each stage.
warmup_epoch means that in the first 5 epochs, the learning rate will gradually increase from 0 to base_lr. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
`warmup_epoch` means that in the first 5 epochs, the learning rate will increase gradually from 0 to `base_lr`. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
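As a rough illustration of the behaviour described above (a toy sketch only, not PaddleOCR's implementation; the boundary epochs and values are invented):

```python
def toy_piecewise_lr(epoch, base_lr=0.001, warmup_epoch=5,
                     boundaries=(700, 800), values=(0.001, 0.0001, 0.00001)):
    """Linear warm-up from 0 to base_lr, then piecewise-constant decay."""
    if epoch < warmup_epoch:
        return base_lr * epoch / warmup_epoch        # warm-up: 0 -> base_lr over the first epochs
    for boundary, value in zip(boundaries, values):
        if epoch < boundary:
            return value                             # constant within each stage
    return values[-1]                                # last stage

for e in (0, 3, 5, 100, 750, 900):
    print(e, toy_piecewise_lr(e))
```

Raising `warmup_epoch` only lengthens the ramp-up; the piecewise stages afterwards are unchanged.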
<aname="12-regularization"></a>
<aname="12-regularization"></a>
### 2.2 Regularization
### 2.2 Regularization
Regularization can effectively avoid algorithm overfitting. PaddleOCR provides L1 and L2 regularization methods.
Regularization can effectively avoid algorithm over-fitting. PaddleOCR provides L1 and L2 regularization methods.
L1 and L2 regularization are the most commonly used regularization methods.
L1 and L2 regularization are the most widely used regularization methods.
L1 regularization adds a regularization term to the objective function to reduce the sum of absolute values of the parameters;
L1 regularization adds a regularization term to the objective function to reduce the sum of absolute values of the parameters;
while in L2 regularization, the purpose of adding a regularization term is to reduce the sum of squared parameters.
while in L2 regularization, the purpose of adding a regularization term is to reduce the sum of squared parameters.
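The difference between the two penalties can be illustrated with a small sketch that is independent of PaddleOCR's optimizer code; the `factor` value here is arbitrary:

```python
import numpy as np

def l1_penalty(params, factor=1e-5):
    # L1: factor times the sum of absolute parameter values
    return factor * sum(np.abs(p).sum() for p in params)

def l2_penalty(params, factor=1e-5):
    # L2: factor times the sum of squared parameter values
    return factor * sum((p ** 2).sum() for p in params)

weights = [np.array([0.5, -1.0]), np.array([[0.1, -0.2], [0.3, 0.0]])]
print(l1_penalty(weights), l2_penalty(weights))
```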
The configuration method is as follows:
The configuration method is as follows:
...
@@ -95,7 +94,7 @@ The current open source models, data sets and magnitudes are as follows:
...
@@ -95,7 +94,7 @@ The current open source models, data sets and magnitudes are as follows:
- Chinese data set, LSVT street view data set crops the image according to the truth value, and performs position calibration, a total of 30w images. In addition, based on the LSVT corpus, 500w of synthesized data.
- Chinese data set: the LSVT street view data set is cropped according to the ground truth and position-calibrated, giving a total of 300k images. In addition, 5 million images are synthesized based on the LSVT corpus.
- Small language data set, using different corpora and fonts, respectively generated 100w synthetic data set, and using ICDAR-MLT as the verification set.
- Small language data sets: 1 million synthetic images are generated for each language using different corpora and fonts, with ICDAR-MLT used as the validation set.
Among them, the public data sets are all open source, users can search and download by themselves, or refer to [Chinese data set](./datasets.md), synthetic data is not open source, users can use open source synthesis tools to synthesize by themselves. Synthesis tools include [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator) etc.
Among them, the public data sets are all open source; users can search for and download them by themselves, or refer to the [Chinese data set](../doc_ch/datasets.md). The synthetic data is not open source, but users can generate it themselves with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator), etc.
<aname="22-vertical-scene"></a>
<aname="22-vertical-scene"></a>
...
@@ -129,17 +128,17 @@ There are several experiences for reference when constructing the data set:
...
@@ -129,17 +128,17 @@ There are several experiences for reference when constructing the data set:
**Q**: How to choose a suitable network input shape when training CRNN recognition?
**Q**: How to choose a suitable network input shape when training CRNN recognition?
A: The general height is 32, the longest width is selected, there are two methods:
A: The height is generally set to 32. To choose the longest width, there are two methods:
(1) Calculate the aspect ratio distribution of training sample images. The selection of the maximum aspect ratio considers 80% of the training samples.
(1) Calculate the aspect ratio distribution of the training images and choose the maximum aspect ratio that covers about 80% of the training samples.
(2) Count the number of texts in training samples. The selection of the longest number of characters considers the training sample that satisfies 80%. Then the aspect ratio of Chinese characters is approximately considered to be 1, and that of English is 3:1, and the longest width is estimated.
(2) Count the number of characters in the training samples and choose the longest character count that covers about 80% of the samples. Then, taking the aspect ratio of a Chinese character as approximately 1 and that of an English character as about 3:1, estimate the longest width.
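A small sketch of both methods under those assumptions (the sample sizes and character counts are invented for illustration):

```python
import numpy as np

HEIGHT = 32  # common CRNN input height

def width_from_aspect_ratios(image_sizes, percentile=80):
    """Method (1): choose the width that covers ~80% of samples by width/height ratio."""
    ratios = [w / h for w, h in image_sizes]
    return int(np.ceil(np.percentile(ratios, percentile) * HEIGHT))

def width_from_char_counts(char_counts, percentile=80, char_wh_ratio=1.0):
    """Method (2): cover ~80% of samples by character count.
    char_wh_ratio is roughly 1 for Chinese characters and about 1/3 for English letters."""
    max_chars = np.percentile(char_counts, percentile)
    return int(np.ceil(max_chars * char_wh_ratio * HEIGHT))

sizes = [(120, 32), (300, 48), (500, 40), (220, 32)]   # (width, height) of training crops
print(width_from_aspect_ratios(sizes))                 # e.g. use input shape [3, 32, W]
print(width_from_char_counts([8, 12, 20, 25]))
```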
**Q**: During the recognition training, the accuracy of the training set has reached 90, but the accuracy of the verification set has been kept at 70, what should I do?
**Q**: During recognition training, the accuracy on the training set has reached 90%, but the accuracy on the validation set stays around 70%. What should I do?
A: If the accuracy of the training set is 90 and the test set is more than 70, it should be over-fitting. There are two methods to try:
A: If the training-set accuracy is 90% while the validation-set accuracy is only around 70%, the model is probably over-fitting. There are two methods to try:
(1) Add more augmentation methods or increase the [probability] of augmented prob (https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341), The default is 0.4.
(1) Add more augmentation methods or increase the augmentation [probability](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341); the default is 0.4.
(2) Increase the [l2 dcay value] of the system (https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47)
(2) Increase the [l2 decay value](https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47) in the configuration.
**Q**: When the recognition model is trained, loss can drop normally, but acc is always 0
**Q**: When training the recognition model, the loss drops normally, but acc is always 0.
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- 2021.8.3 released PaddleOCR v2.2, adding a new structured document analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), which supports layout analysis and table recognition (one-click export of table images to Excel files).
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized.
- 2021.4.8 released the end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf), which was published at AAAI 2021. Find the tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md); released multilingual recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), supporting recognition of more than 80 languages; in particular, the performance of the [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is optimized.
- 2021.1.21 update more than 25+ multilingual recognition models [models list](./doc/doc_en/models_list_en.md), including:English, Chinese, German, French, Japanese,Spanish,Portuguese Russia Arabic and so on. Models for more languages will continue to be updated [Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
- 2021.1.21 updated more than 25 multilingual recognition models ([models list](./models_list_en.md)), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, Arabic and so on. Models for more languages will continue to be updated ([Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048)).
- 2020.12.15 update Data synthesis tool, i.e., [Style-Text](../../StyleText/README.md),easy to synthesize a large number of images which are similar to the target scene image.
- 2020.12.15 updated the data synthesis tool, i.e., [Style-Text](../../StyleText/README.md), which makes it easy to synthesize a large number of images similar to the target scene images.
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
- 2020.11.25 added a new data annotation tool, i.e., [PPOCRLabel](../../PPOCRLabel/README.md), which helps improve labeling efficiency. Moreover, the labeling results can be used directly in training of the PP-OCR system.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
@@ -347,8 +347,9 @@ class PaddleOCR(predict_system.TextSystem):
...
@@ -347,8 +347,9 @@ class PaddleOCR(predict_system.TextSystem):
ocr with paddleocr
ocr with paddleocr
args:
args:
img: img for ocr, support ndarray, img_path and list or ndarray
img: image for OCR; supports ndarray, img_path, and list of ndarray
det: use text detection or not, if false, only rec will be exec. default is True
det: use text detection or not. If False, only rec will be executed. Default is True
rec: use text recognition or not, if false, only det will be exec. default is True
rec: use text recognition or not. If False, only det will be executed. Default is True
cls: use angle classifier or not. Default is True. If true, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
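For reference, a typical call using these arguments might look like the sketch below; the image path is illustrative and the exact return structure can vary between versions, so check the output of your installed release:

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True)  # the cls model is only loaded when angle classification is wanted

# det/rec/cls correspond to the arguments documented above
result = ocr.ocr("doc/imgs/11.jpg", det=True, rec=True, cls=True)
for item in result:
    box, (text, score) = item        # each item is roughly [box, (text, confidence)]
    print(box, text, score)
```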
@@ -96,7 +96,7 @@ In PP-Structure, the image will be divided into 5 types of areas **text, title,
...
@@ -96,7 +96,7 @@ In PP-Structure, the image will be divided into 5 types of areas **text, title,
#### 6.1.1 Layout analysis
#### 6.1.1 Layout analysis
Layout analysis classifies image by region, including the use of Python scripts of layout analysis tools, extraction of designated category detection boxes, performance indicators, and custom training layout analysis models. For details, please refer to [document](layout/README.md).
Layout analysis divides an image into regions by category. This part covers the use of the Python scripts of the layout analysis tool, extraction of detection boxes of designated categories, performance indicators, and custom training of layout analysis models. For details, please refer to the [document](layout/README_en.md).