For more software version requirements, refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
This chapter introduces C++ deployment of the PaddleOCR model. For the corresponding Python inference deployment, refer to this [document](../../doc/doc_ch/inference.md).
C++ generally offers better computational performance than Python, so most CPU and GPU deployment scenarios use C++ deployment.
...
@@ -6,14 +6,14 @@ This section will introduce how to configure the C++ environment and complete it
PaddleOCR model deployment.
## 1. Prepare the environment
## 1. Prepare the Environment
### Environment
- Linux; Docker is recommended.
### 1.1 Compile opencv
### 1.1 Compile OpenCV
* First, download the OpenCV source package for Linux from the official OpenCV website. Taking OpenCV 3.4.7 as an example, the download command is as follows.
...
@@ -73,7 +73,7 @@ opencv3/
|-- share
```
### 1.2 Compile or download or the Paddle inference library
### 1.2 Compile or Download the Paddle Inference Library
* There are two ways to obtain the Paddle inference library, described in detail below.
Here, `paddle` is the Paddle library required later for C++ prediction, and `version.txt` contains the version information of the current inference library.
Angle classification is used when the text in an image is not at 0 degrees, in which case the text lines detected in the picture must be corrected. In the PaddleOCR system, the text line image obtained after text detection is sent to the recognition model after an affine transformation. At this point, only 0 and 180 degree classification of the text is required, so the built-in PaddleOCR text angle classifier **only supports 0 and 180 degree classification**. To support more angles, you can modify the algorithm yourself.
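The correction step described above can be illustrated with a minimal sketch. It is not PaddleOCR's implementation: the function name `correct_text_line`, the string labels `"0"` and `"180"`, and the confidence threshold `cls_thresh` are assumptions made for this example, and the image is modeled as a plain nested list of pixel rows.

```python
def correct_text_line(image, label, score, cls_thresh=0.9):
    """Return the text line upright, given a 0/180 classifier result.

    image      -- nested list of pixel rows (stand-in for a real image array)
    label      -- "0" or "180", the predicted angle (assumed label format)
    score      -- classifier confidence for that label
    cls_thresh -- hypothetical threshold below which the image is left as-is
    """
    if label == "180" and score >= cls_thresh:
        # A 180 degree rotation reverses the row order and each row's pixels.
        return [row[::-1] for row in image[::-1]]
    return image


upside_down = [[1, 2], [3, 4]]
print(correct_text_line(upside_down, "180", 0.95))
```

Only two outcomes are possible here, which mirrors the 0/180 restriction: the image is either kept or flipped, and a low-confidence prediction leaves it untouched.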
...
@@ -16,7 +17,7 @@ Example of 0 and 180 degree data samples:
![](../imgs_results/angle_class_example.jpg)
<a name="data-preparation"></a>
## Data Preparation
## 2. Data Preparation
Please organize the dataset as follows:
...
@@ -72,7 +73,7 @@ containing all images (test) and a cls_gt_test.txt. The structure of the test se
| ...
```
<a name="training"></a>
## Training
## 3. Training
Write the prepared txt file and image folder path into the configuration file under the `Train/Eval.dataset.label_file_list` and `Train/Eval.dataset.data_dir` fields. The absolute path of each image is formed by joining the `Train/Eval.dataset.data_dir` field with the image name recorded in the txt file.
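How the two fields combine can be sketched as follows. This is an illustration only: the helper `resolve_samples`, the example directory, and the tab-separated `image name<TAB>label` line layout are assumptions, not taken from the PaddleOCR data loader.

```python
import os


def resolve_samples(data_dir, label_lines):
    """Join data_dir with each image name listed in a label file.

    Each non-empty line is assumed to look like "<image name>\t<label>".
    Returns a list of (image_path, label) tuples.
    """
    samples = []
    for line in label_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, label = line.split("\t")
        samples.append((os.path.join(data_dir, name), label))
    return samples


lines = ["train/word_001.jpg\t0\n", "train/word_002.jpg\t180\n"]
for path, label in resolve_samples("/data/cls", lines):
    print(path, label)
```

Because the image name is always joined onto `data_dir`, the names in the txt file should be written relative to that directory.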
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts.
...
@@ -117,7 +118,7 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
**Note that the configuration file for prediction/evaluation must be the same as the one used for training.**
<a name="evaluation"></a>
## Evaluation
## 4. Evaluation
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/cls/cls_mv3.yml` file.
This document reports the performance of the model series for Chinese and English recognition.
## TEST DATA
## Test Data
We collected 300 images from different real application scenarios to evaluate the overall OCR system, including contract samples, license plates, nameplates, train tickets, test sheets, forms, certificates, street view images, business cards, digital meters, etc. The following figure shows some images from the test set.
...
@@ -10,7 +10,7 @@ We collected 300 images for different real application scenarios to evaluate the
...
@@ -7,7 +15,9 @@ The following list can be viewed through `--help`
| -c | ALL | Specify configuration file to use | None | **Please refer to the parameter introduction for configuration file usage** |
| -o | ALL | Set configuration options | None | Options set with `-o` take priority over the configuration file selected with `-c`, e.g. `-o Global.use_gpu=false` |
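
The priority rule for `-o` can be pictured as a merge of dotted-key overrides into the loaded configuration. The sketch below is not PaddleOCR's argument parser: `apply_overrides`, `parse_value`, and the `epoch_num` entry are hypothetical and only show the idea that an override replaces the file's value while leaving other keys untouched.

```python
import copy


def parse_value(raw):
    """Convert an override string to bool, int, or float when possible."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config, overrides):
    """Return a copy of config with "Section.key=value" overrides applied."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return merged


file_config = {"Global": {"use_gpu": True, "epoch_num": 500}}  # stand-in for a loaded yml
print(apply_overrides(file_config, ["Global.use_gpu=false"]))
```

The original dictionary is copied before merging, so the values read from the file stay available for comparison.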
## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE
## 2. Introduction to Global Parameters of Configuration File
Take `rec_chinese_lite_train_v2.0.yml` as an example.
### Global
...
@@ -121,8 +131,9 @@ In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck
| drop_last | Whether to discard the last incomplete mini-batch when the number of samples in the dataset is not divisible by batch_size | True | \ |
| num_workers | The number of subprocesses used to load data; if it is 0, no subprocess is started and the data is loaded in the main process | 8 | \ |
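
The effect of `drop_last` on the number of mini-batches per epoch is simple arithmetic, sketched below. The helper name `batches_per_epoch` and the sample counts are made up for the example.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last):
    """Number of mini-batches produced from num_samples per epoch."""
    if drop_last:
        # The trailing incomplete batch is discarded.
        return num_samples // batch_size
    # The trailing incomplete batch is kept as a smaller batch.
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(1000, 256, drop_last=True))
print(batches_per_epoch(1000, 256, drop_last=False))
```

With 1000 samples and a batch size of 256, dropping the last batch leaves 3 full batches and skips the remaining 232 samples in that epoch, while keeping it yields a fourth, smaller batch.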
PaddleOCR currently supports recognition of 80 languages (excluding Chinese). A multi-language configuration file template is provided under the path `configs/rec/multi_language`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml).
  * [2.2 LOAD TRAINED MODEL AND CONTINUE TRAINING](#22-load-trained-model-and-continue-training)
  * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
  * [2.3 TRAINING WITH NEW BACKBONE](#23-training-with-new-backbone)
  * [2.3 Training with New Backbone](#23-training-with-new-backbone)
- [3. EVALUATION AND TEST](#3-evaluation-and-test)
- [3. Evaluation and Test](#3-evaluation-and-test)
  * [3.1 EVALUATION](#31-evaluation)
  * [3.1 Evaluation](#31-evaluation)
  * [3.2 TEST](#32-test)
  * [3.2 Test](#32-test)
- [4. INFERENCE](#4-inference)
- [4. Inference](#4-inference)
- [2. FAQ](#2-faq)
- [5. FAQ](#2-faq)
# 1 DATA AND WEIGHTS PREPARATIO
## 1. Data and Weights Preparation
## 1.1 DATA PREPARATION
### 1.1 Data Preparation
The ICDAR2015 dataset contains a training set of 1,000 images and a test set of 500 images, all captured with wearable cameras. It can be obtained from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads); registration is required to download it.
...
@@ -59,7 +59,7 @@ The `points` in the dictionary represent the coordinates (x, y) of the four poin
If you want to train PaddleOCR on other datasets, please build the annotation file according to the above format.
## 1.2 DOWNLOAD PRETRAINED MODEL
### 1.2 Download Pretrained Model
First, download the pretrained model. The detection model of PaddleOCR currently supports three backbones: MobileNetV3, ResNet18_vd and ResNet50_vd. You can replace the backbone with a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.0/ppcls/modeling/architectures) according to your needs.
The corresponding download links for the backbone pretrained weights can be found [here](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97).
To load a trained model and resume training, set the parameter `Global.checkpoints` to the path of the model to load.
**Note**: `Global.checkpoints` has higher priority than `Global.pretrain_weights`. When both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the model path specified by `Global.checkpoints` is wrong, the model specified by `Global.pretrain_weights` is loaded instead.
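The priority rule in the note can be written out as a small decision function. This is a sketch of the described behavior only, not PaddleOCR's loading code: `select_weights` is a hypothetical helper, and treating "wrong path" as "path does not exist" (checked through an injectable `exists` callable) is an assumption.

```python
import os


def select_weights(checkpoints, pretrain_weights, exists=os.path.exists):
    """Pick which weights to load, following the documented priority.

    Returns (source, path), or None when neither path is usable.
    `exists` is injectable so the rule can be exercised without real files.
    """
    if checkpoints and exists(checkpoints):
        return "checkpoints", checkpoints
    if pretrain_weights and exists(pretrain_weights):
        return "pretrain_weights", pretrain_weights
    return None


available = {"./output/latest", "./pretrain/backbone"}  # made-up paths
print(select_weights("./output/latest", "./pretrain/backbone", exists=available.__contains__))
print(select_weights("./output/typo", "./pretrain/backbone", exists=available.__contains__))
```

The second call shows the fallback: a mistyped checkpoint path silently results in training from the pretrained weights, so it is worth checking the training log to confirm which model was actually loaded.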
## 2.3 TRAINING WITH NEW BACKBONE
### 2.3 Training with New Backbone
The network part builds the model. PaddleOCR divides the network into four parts, all under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms -> backbones -> necks -> heads).
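The sequential flow through the four parts can be sketched with plain callables. This is a toy illustration, not the code under `ppocr/modeling`: `make_stage` and `build_network` are hypothetical, each stage only records its name, and allowing a stage to be `None` (skipped) is an assumption of this sketch.

```python
def make_stage(name):
    """Build a stand-in stage that appends its name to the data it receives."""
    def stage(data):
        return data + [name]
    return stage


def build_network(transform, backbone, neck, head):
    """Chain the four parts in a fixed order, skipping any that are None."""
    stages = [s for s in (transform, backbone, neck, head) if s is not None]

    def forward(data):
        for stage in stages:
            data = stage(data)
        return data

    return forward


network = build_network(
    make_stage("transform"),
    make_stage("backbone"),
    make_stage("neck"),
    make_stage("head"),
)
print(network([]))  # ['transform', 'backbone', 'neck', 'head']
```

Keeping the order fixed in one place means that swapping in a new backbone only requires passing a different callable; the rest of the chain is unchanged.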
...
@@ -162,9 +162,9 @@ After adding the four-part modules of the network, you only need to configure th
**NOTE**: More details about replacing the backbone and other modules can be found in [doc](add_new_algorithm_en.md).
# 3. EVALUATION AND TEST
## 3. Evaluation and Test
## 3.1 EVALUATION
### 3.1 Evaluation
PaddleOCR calculates three metrics to evaluate the performance of the OCR detection task: Precision, Recall, and Hmean (F-Score).
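As a reminder of how the three metrics relate, Hmean is the harmonic mean of Precision and Recall. The sketch below uses the standard definitions; the function name `detection_metrics`, its argument names, and the counts are made up for illustration, and how a detected box is matched to a ground-truth box is outside its scope.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Compute (precision, recall, hmean) from box counts.

    num_matched      -- detected boxes that match a ground-truth box
    num_detected     -- all boxes output by the detector
    num_ground_truth -- all annotated boxes
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


precision, recall, hmean = detection_metrics(8, 10, 16)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
```

Because it is a harmonic mean, Hmean stays low whenever either Precision or Recall is low, which is why it is commonly used as the single summary number.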
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and do not need to be set when evaluating the EAST and SAST models.
## 3.2 TEST
### 3.2 Test
Test the detection result on a single image:
```shell
...
@@ -197,7 +197,7 @@ Test the detection result on all images in the folder:
The inference model (saved by `paddle.jit.save`) is generally a frozen model saved after training is completed, and is mostly used for prediction in deployment.
...
@@ -220,7 +220,7 @@ If it is other detection algorithms, such as the EAST, the det_algorithm paramet
Q1: Why are the prediction results of the trained model and the inference model inconsistent?
**A**: In most cases, the cause is that the pre-processing and post-processing parameters used when predicting with the trained model differ from those used when predicting with the inference model. Taking the model trained with the det_mv3_db.yml configuration file as an example, the inconsistency can be resolved as follows:
The inference model (saved by `paddle.jit.save`) is generally a frozen model saved after training is completed, and is mostly used for prediction in deployment.
...
@@ -10,37 +10,36 @@ For more details, please refer to the document [Classification Framework](https:
Next, we first introduce how to convert a trained model into an inference model, and then introduce text detection, text recognition, angle classification, and their concatenation based on the inference model.
- [CONVERT TRAINING MODEL TO INFERENCE MODEL](#CONVERT)
- [1. Convert Training Model to Inference Model](#CONVERT)
  - [Convert detection model to inference model](#Convert_detection_model)
  - [1.1 Convert Detection Model to Inference Model](#Convert_detection_model)
  - [Convert recognition model to inference model](#Convert_recognition_model)
  - [1.2 Convert Recognition Model to Inference Model](#Convert_recognition_model)
  - [Convert angle classification model to inference model](#Convert_angle_class_model)
  - [1.3 Convert Angle Classification Model to Inference Model](#Convert_angle_class_model)
- [TEXT DETECTION MODEL INFERENCE](#DETECTION_MODEL_INFERENCE)
- [2. Text Detection Model Inference](#DETECTION_MODEL_INFERENCE)
  - [1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE](#LIGHTWEIGHT_DETECTION)
  - [2.1 Lightweight Chinese Detection Model Inference](#LIGHTWEIGHT_DETECTION)
  - [2. DB TEXT DETECTION MODEL INFERENCE](#DB_DETECTION)
  - [2.2 DB Text Detection Model Inference](#DB_DETECTION)
  - [3. EAST TEXT DETECTION MODEL INFERENCE](#EAST_DETECTION)
  - [2.3 EAST Text Detection Model Inference](#EAST_DETECTION)
  - [4. SAST TEXT DETECTION MODEL INFERENCE](#SAST_DETECTION)
  - [2.4 SAST Text Detection Model Inference](#SAST_DETECTION)
  - [5. Multilingual model inference](#Multilingual model inference)
- [TEXT RECOGNITION MODEL INFERENCE](#RECOGNITION_MODEL_INFERENCE)
- [3. Text Recognition Model Inference](#RECOGNITION_MODEL_INFERENCE)
  - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_RECOGNITION)
  - [3.1 Lightweight Chinese Text Recognition Model Inference](#LIGHTWEIGHT_RECOGNITION)
  - [2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE](#CTC-BASED_RECOGNITION)
  - [3.2 CTC-Based Text Recognition Model Inference](#CTC-BASED_RECOGNITION)
  - [3. SRN-BASED TEXT RECOGNITION MODEL INFERENCE](#SRN-BASED_RECOGNITION)
  - [3.3 SRN-Based Text Recognition Model Inference](#SRN-BASED_RECOGNITION)
  - [3. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY](#USING_CUSTOM_CHARACTERS)
  - [3.4 Text Recognition Model Inference Using Custom Characters Dictionary](#USING_CUSTOM_CHARACTERS)
  - [4. MULTILINGUAL MODEL INFERENCE](MULTILINGUAL_MODEL_INFERENCE)
  - [3.5 Multilingual Model Inference](#MULTILINGUAL_MODEL_INFERENCE)
- [ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- [4. Angle Classification Model Inference](#ANGLE_CLASS_MODEL_INFERENCE)
  - [1. ANGLE CLASSIFICATION MODEL INFERENCE](#ANGLE_CLASS_MODEL_INFERENCE)
- [TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION](#CONCATENATION)
- [5. Text Detection Angle Classification and Recognition Inference Concatenation](#CONCATENATION)
  - [1. LIGHTWEIGHT CHINESE MODEL](#LIGHTWEIGHT_CHINESE_MODEL)
  - [5.1 Lightweight Chinese Model](#LIGHTWEIGHT_CHINESE_MODEL)
  - [2. OTHER MODELS](#OTHER_MODELS)
  - [5.2 Other Models](#OTHER_MODELS)
<a name="CONVERT"></a>
## CONVERT TRAINING MODEL TO INFERENCE MODEL
## 1. Convert Training Model to Inference Model
<a name="Convert_detection_model"></a>
### Convert detection model to inference model
### 1.1 Convert Detection Model to Inference Model
Download the lightweight Chinese detection model:
```
...
@@ -67,7 +66,7 @@ inference/det_db/
```
<a name="Convert_recognition_model"></a>
### Convert recognition model to inference model
### 1.2 Convert Recognition Model to Inference Model
Download the lightweight Chinese recognition model:
```
...
@@ -95,7 +94,7 @@ inference/det_db/
```
<a name="Convert_angle_class_model"></a>
### Convert angle classification model to inference model
### 1.3 Convert Angle Classification Model to Inference Model
Download the angle classification model:
```
...
@@ -122,13 +121,13 @@ inference/det_db/
<a name="DETECTION_MODEL_INFERENCE"></a>
## TEXT DETECTION MODEL INFERENCE
## 2. Text Detection Model Inference
The following introduces inference with the lightweight Chinese detection model, the DB text detection model, and the EAST text detection model. The default configuration is based on the inference settings of the DB text detection model.
Because the EAST and DB algorithms differ substantially, inference with EAST requires **adapting to the EAST text detection algorithm by passing in the corresponding parameters**.
<a name="LIGHTWEIGHT_DETECTION"></a>
### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE
### 2.1 Lightweight Chinese Detection Model Inference
For lightweight Chinese detection model inference, you can execute the following commands:
First, convert the model saved during DB text detection training into an inference model. Taking the model based on the ResNet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)), you can use the following command to convert it:
...
@@ -184,7 +183,7 @@ The visualized text detection results are saved to the `./inference_results` fol
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly of English scenes, the above model performs very poorly on Chinese text images.
<a name="EAST_DETECTION"></a>
### 3. EAST TEXT DETECTION MODEL INFERENCE
### 2.3 EAST Text Detection Model Inference
First, convert the model saved during EAST text detection training into an inference model. Taking the model based on the ResNet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_east_v2.0_train.tar)), you can use the following command to convert it:
...
@@ -205,7 +204,7 @@ The visualized text detection results are saved to the `./inference_results` fol
<a name="SAST_DETECTION"></a>
### 4. SAST TEXT DETECTION MODEL INFERENCE
### 2.4 SAST Text Detection Model Inference
#### (1). Quadrangle text detection model (ICDAR2015)
First, convert the model saved during SAST text detection training into an inference model. Taking the model based on the ResNet50_vd backbone and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)), you can use the following command to convert it:
...
@@ -243,13 +242,13 @@ The visualized text detection results are saved to the `./inference_results` fol
**Note**: SAST post-processing uses locality-aware NMS, which has two versions: Python and C++. The C++ version is noticeably faster than the Python version. Due to a compilation compatibility issue with the C++ NMS, the C++ version is called only in a Python 3.5 environment; the Python version is called in all other cases.
<a name="RECOGNITION_MODEL_INFERENCE"></a>
## TEXT RECOGNITION MODEL INFERENCE
## 3. Text Recognition Model Inference
The following introduces inference with the lightweight Chinese recognition model and with other CTC-based and Attention-based text recognition models. For Chinese text recognition, a recognition model based on CTC loss is recommended. In practice, models based on Attention loss have also been found to perform worse than those based on CTC loss. In addition, if the character dictionary was modified during training, make sure to use the same character set during inference. See below for details.
<a name="LIGHTWEIGHT_RECOGNITION"></a>
### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE
### 3.1 Lightweight Chinese Text Recognition Model Inference
For lightweight Chinese recognition model inference, you can execute the following commands:
...
@@ -269,7 +268,7 @@ Predicts of ./doc/imgs_words_en/word_10.png:('PAIN', 0.9897658)
```
<a name="CTC-BASED_RECOGNITION"></a>
### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE
### 3.2 CTC-Based Text Recognition Model Inference
Taking CRNN as an example, we introduce recognition model inference based on CTC loss. Rosetta and Star-Net are used in a similar way; there is no need to set the recognition algorithm parameter `rec_algorithm`.
...
@@ -292,6 +291,7 @@ After executing the command, the recognition result of the above image is as fol
```bash
Predicts of ./doc/imgs_words_en/word_336.png:('super', 0.9999073)
```
**Note**: Because the above model follows the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it differs from the training of the lightweight Chinese recognition model in two aspects:
- The image resolution used in training is different: the above model was trained with an image resolution of [3, 32, 100], whereas our Chinese model is trained with [3, 32, 320] to ensure good recognition of long text. The default shape parameter at inference is the resolution used to train the Chinese model, that is [3, 32, 320]. Therefore, when running inference with the above English model, you need to set the shape of the recognition image through the parameter `rec_image_shape`.
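To see why the shape setting matters, the sketch below computes the width a text line would be resized to under each shape. It is an illustration under stated assumptions, not PaddleOCR's pre-processing code: the comma-separated string form of the shape, the helper names `parse_image_shape` and `resized_width`, and the rule "scale to the target height, keep the aspect ratio, cap at the maximum width" are all assumptions made for this example.

```python
import math


def parse_image_shape(text):
    """Parse a shape string such as "3, 32, 100" into (channels, height, width)."""
    channels, height, width = (int(part) for part in text.split(","))
    return channels, height, width


def resized_width(src_height, src_width, image_shape):
    """Width after scaling to the target height, capped at the shape's width."""
    _, target_height, max_width = image_shape
    ratio = src_width / src_height
    return min(max_width, math.ceil(target_height * ratio))


english_shape = parse_image_shape("3, 32, 100")
chinese_shape = parse_image_shape("3, 32, 320")
# A 48 x 480 text line (height x width) under each shape:
print(resized_width(48, 480, english_shape))
print(resized_width(48, 480, chinese_shape))
```

Under these assumptions the same crop is squeezed to width 100 with the English shape but keeps width 320 with the default Chinese shape, so a model trained at one resolution sees differently scaled input if the other shape is used at inference.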
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
### 3.4 Text Recognition Model Inference Using Custom Characters Dictionary
If the text dictionary was modified during training, you need to specify the dictionary path through `--rec_char_dict_path` and set `rec_char_type=ch` when predicting with the inference model.
To predict with models for other languages, you also need to specify the dictionary path through `--rec_char_dict_path`. In addition, to get correct visualization results, you need to specify the font path through `--vis_font_path`. Fonts for several less common languages are provided by default under the `doc/fonts` path. For example, for Korean recognition:
...
@@ -343,13 +344,7 @@ Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904)
The following will introduce the angle classification model inference.
<a name="ANGLE_CLASS_MODEL_INFERENCE"></a>
### 1.ANGLE CLASSIFICATION MODEL INFERENCE
For angle classification model inference, you can execute the following commands:
...
@@ -371,10 +366,10 @@ After executing the command, the prediction results (classification angle and sc
```
```
<a name="CONCATENATION"></a>
## TEXT DETECTION ANGLE CLASSIFICATION AND RECOGNITION INFERENCE CONCATENATION
## 5. Text Detection Angle Classification and Recognition Inference Concatenation
<a name="LIGHTWEIGHT_CHINESE_MODEL"></a>
### 1. LIGHTWEIGHT CHINESE MODEL
### 5.1 Lightweight Chinese Model
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, the parameter `cls_model_dir` specifies the path to the angle classification inference model, and the parameter `rec_model_dir` specifies the path to the recognition inference model. The parameter `use_angle_cls` controls whether to enable the angle classification model. The parameter `use_mp` specifies whether to use multi-process inference, and the parameter `total_process_num` specifies the number of processes when multi-process inference is used. The visualized recognition results are saved to the `./inference_results` folder by default.
After executing the command, the recognition result image is as follows:
![](../imgs_results/system_res_00018069.jpg)
<a name="OTHER_MODELS"></a>
### 2. OTHER MODELS
### 5.2 Other Models
If you want to try other detection or recognition algorithms, please refer to the text detection model inference and text recognition model inference sections above, and update the corresponding configuration and model.
...
@@ -7,15 +7,13 @@ This section contains two parts. Firstly, [PP-OCR Model Download](./models_list_
Let's first understand some basic concepts.
- [INTRODUCTION ABOUT OCR](#introduction-about-ocr)
- [Introduction about OCR](#introduction-about-ocr)
* [Basic concepts of OCR detection model](#basic-concepts-of-ocr-detection-model)
* [Basic Concepts of OCR Detection Model](#basic-concepts-of-ocr-detection-model)
* [Basic concepts of OCR recognition model](#basic-concepts-of-ocr-recognition-model)
* [Basic Concepts of OCR Recognition Model](#basic-concepts-of-ocr-recognition-model)
* [PP-OCR model](#pp-ocr-model)
* [PP-OCR Model](#pp-ocr-model)
* [And a table of contents](#and-a-table-of-contents)
* [On the right](#on-the-right)
## 1. INTRODUCTION ABOUT OCR
## 1. Introduction about OCR
This section briefly introduces the basic concepts of the OCR detection model and the OCR recognition model, and introduces PaddleOCR's PP-OCR model.
...
@@ -24,7 +22,7 @@ OCR (Optical Character Recognition, Optical Character Recognition) is currently
OCR generally consists of two parts: text detection and text recognition. The text detection module first uses a detection algorithm to locate the text lines in the image. The recognition algorithm then identifies the specific text in each text line.
### 1.1 Basic concepts of OCR detection model
### 1.1 Basic Concepts of OCR Detection Model
Text detection locates the text regions in an image and usually marks each word or text line with a bounding box. Traditional text detection algorithms mostly rely on hand-crafted features; they are fast and work well in simple scenes, but their performance drops sharply in natural scenes. Currently, deep learning methods are mostly used.
...
@@ -34,14 +32,14 @@ Text detection algorithms based on deep learning can be roughly divided into the
3. Hybrid target detection and segmentation method.
### 1.2 Basic concepts of OCR recognition model
### 1.2 Basic Concepts of OCR Recognition Model
The input of the OCR recognition algorithm is generally a text line image that contains little background information and is dominated by text. Recognition algorithms can be divided into two types:
1. CTC-based method. The text prediction module of the recognition algorithm is based on CTC, and the commonly used combination is CNN+RNN+CTC. Some algorithms also try to add transformer modules to the network.
2. Attention-based method. The text prediction module of the recognition algorithm is based on Attention, and the commonly used combination is CNN+RNN+Attention.
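
As an illustration of the CTC-based approach, the sketch below shows greedy CTC decoding: per-frame predictions are collapsed by merging consecutive repeats and then dropping the blank symbol. This is a generic illustration rather than PaddleOCR's own decoder; the function name, the toy character set, and the choice of index 0 for the blank are assumptions made for the example.

```python
def ctc_greedy_decode(indices, charset, blank=0):
    """Collapse consecutive repeated indices, then drop blanks.

    indices: the most likely class index for each time step.
    charset: list mapping class index to character (assumed layout).
    """
    decoded = []
    previous = None
    for idx in indices:
        # Keep an index only when it starts a new run and is not the blank.
        if idx != previous and idx != blank:
            decoded.append(charset[idx])
        previous = idx
    return "".join(decoded)


# Hypothetical character set with the blank at index 0.
charset = ["<blank>", "s", "u", "p", "e", "r"]
frames = [1, 1, 0, 2, 2, 3, 0, 0, 4, 5, 5]
print(ctc_greedy_decode(frames, charset))  # super
```

The blank is what allows genuinely repeated characters to survive: the frame sequence `[1, 0, 1]` decodes to `ss`, whereas `[1, 1]` decodes to a single `s`.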
### 1.3 PP-OCR model
### 1.3 PP-OCR Model
PaddleOCR integrates many OCR algorithms: text detection algorithms include DB, EAST, and SAST, among others, and text recognition algorithms include CRNN, RARE, StarNet, Rosetta, and SRN, among others.
...
@@ -36,4 +36,4 @@ If you getting this error `OSError: [WinError 126] The specified module could no
Please try downloading the Shapely whl file from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).
Reference: [Solve shapely installation on windows](
Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
...
@@ -39,7 +39,7 @@ pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
<a name="21-use-by-command-line"></a>
### 2.1 Use by command line
### 2.1 Use by Command Line
PaddleOCR provides a series of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, and then switch to the corresponding directory in the terminal.
...
@@ -95,7 +95,7 @@ If you do not use the provided test image, you can replace the following `--imag
['PAIN', 0.990372]
```
```
If you need to use the 2.0 model, please specify the parameter `--version 2.0`, paddleocr uses the 2.1 model by default. More whl package usage can be found in [whl package](./whl_en.md)
If you need to use the 2.0 model, please specify the parameter `--version PP-OCR`; paddleocr uses the 2.1 model by default (`--version PP-OCRv2`). More whl package usage can be found in [whl package](./whl_en.md).
If you want to use your own data for training, please refer to the following to organize your data.
...
@@ -85,7 +84,7 @@ Similar to the training set, the test set also needs to be provided a folder con
```
```
<a name="Dataset_download"></a>
### 1.2 Dataset download
### 1.2 Dataset Download
- ICDAR2015
...
@@ -167,14 +166,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
If you need to customize the dict file, please add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` to point to your dictionary path, and set `character_type` to `ch`.
<a name="Add_space_category"></a>
### 1.4 Add space category
### 1.4 Add Space Category
If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
**Note: use_space_char only takes effect when character_type=ch**
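
The following sketch illustrates how a custom dictionary and a space category could be turned into a label table. It is a simplified illustration, not PaddleOCR's implementation: the function name `build_char_table`, the reserved `<blank>` entry at index 0, and the position of the space character at the end of the table are all assumptions.

```python
def build_char_table(dict_lines, use_space_char=False):
    """Map each character to an integer label.

    dict_lines: lines of a dictionary file, one character per line.
    Assumption: index 0 is reserved for a blank symbol, and the space
    character, when enabled, is appended after the dictionary entries.
    """
    chars = [line.strip("\r\n") for line in dict_lines]
    chars = [ch for ch in chars if ch]  # skip empty lines
    if use_space_char:
        chars.append(" ")
    table = ["<blank>"] + chars
    return {ch: idx for idx, ch in enumerate(table)}


lines = ["a\n", "b\n", "c\n"]
print(len(build_char_table(lines)))                       # 4
print(build_char_table(lines, use_space_char=True)[" "])  # 4
```

Because the number of output classes depends on the table size, the same dictionary and space setting would need to be used for both training and inference.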
<a name="TRAINING"></a>
## 2 TRAINING
## 2. Training
<a name="Data_Augmentation"></a>
### 2.1 Data Augmentation
...
@@ -363,7 +362,7 @@ Eval:
<a name="EVALUATION"></a>
## 3 EVALUATION
## 3. Evaluation
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
- [3. Data and vertical scenes](#2-data-and-vertical-scenes)
- [3. Data and Vertical Scenes](#2-data-and-vertical-scenes)
* [3.1 Training data](#21-training-data)
* [3.1 Training Data](#21-training-data)
* [3.2 Vertical scene](#22-vertical-scene)
* [3.2 Vertical Scene](#22-vertical-scene)
* [3.3 Build your own data set](#23-build-your-own-data-set)
* [3.3 Build Your Own Dataset](#23-build-your-own-data-set)
* [4. FAQ](#3-faq)
...
@@ -18,7 +18,7 @@ At the same time, it will briefly introduce the components of the PaddleOCR mode
<a name="1-Yml-Configuration"></a>
## 1. Yml configuration
## 1. Yml Configuration
PaddleOCR uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and the pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file and then builds the complete training process. To tune a model, you only need to modify the parameters in the configuration file, which is simple and convenient.
...
@@ -26,12 +26,12 @@ For the complete configuration file description, please refer to [Configuration
<a name="1-basic-concepts"></a>
## 2. Basic concepts
## 2. Basic Concepts
During model training, some hyperparameters need to be adjusted manually to help the model reach the best metrics at the lowest loss. Different data volumes may require different hyperparameters. When you want to fine-tune on your own data or improve model performance, the following tuning strategies can serve as a reference:
<a name="11-learning-rate"></a>
### 2.1 Learning rate
### 2.1 Learning Rate
The learning rate is one of the most important hyperparameters for training neural networks. It determines the step size taken along the gradient toward the optimum of the loss function in each iteration.
PaddleOCR provides a variety of learning rate update strategies, which can be selected through the configuration file. For example:
...
@@ -68,7 +68,7 @@ Optimizer:
factor: 2.0e-05
```
```
<a name="13-evaluation-indicators-"></a>
### 2.3 Evaluation indicators
### 2.3 Evaluation Indicators
(1) Detection stage: Evaluation is first based on the IoU between each detection box and the labeled box. If the IoU is greater than a certain threshold, the detection is judged to be correct. Unlike the boxes used in general object detection, the detection boxes and labeled boxes here are represented by polygons. Detection precision: the percentage of correct detection boxes among all detection boxes, which mainly reflects false detections. Detection recall: the percentage of correct detection boxes among all labeled boxes, which mainly reflects missed detections.
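
The two detection ratios described above can be sketched as follows. This is a simplified illustration under stated assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` rectangles rather than the polygons the text describes, matching is greedy and one-to-one, and the threshold of 0.5 is only an example value. The function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Return (correct / all detections, correct / all labeled boxes)."""
    matched_gt = set()
    correct = 0
    for pred in pred_boxes:
        for gt_index, gt in enumerate(gt_boxes):
            # Each labeled box may be matched by at most one detection.
            if gt_index not in matched_gt and box_iou(pred, gt) > iou_thresh:
                matched_gt.add(gt_index)
                correct += 1
                break
    precision = correct / len(pred_boxes) if pred_boxes else 0.0
    recall = correct / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
precision, recall = detection_metrics(pred, gt)
print(round(precision, 3), recall)  # 0.667 1.0
```

In this toy case, two of the three detections match a labeled box, so the first ratio is 2/3, and both labeled boxes are found, so the second ratio is 1.0. A polygon-based evaluation would replace `box_iou` with a polygon intersection routine.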
...
@@ -78,11 +78,11 @@ Optimizer:
<a name="2-data-and-vertical-scenes"></a>
## 3. Data and vertical scenes
## 3. Data and Vertical Scenes
<a name="21-training-data"></a>
### 3.1 Training data
### 3.1 Training Data
The current open-source models, their datasets, and the dataset sizes are as follows:
...
@@ -99,14 +99,14 @@ Among them, the public data sets are all open source, users can search and downl
<a name="22-vertical-scene"></a>
### 3.2 Vertical scene
### 3.2 Vertical Scene
PaddleOCR mainly focuses on general OCR. If you have requirements in a vertical (domain-specific) scenario, you can train your own model with PaddleOCR and vertical data.
If you lack labeled data, or do not want to invest in research and development, it is recommended to call the open API directly, which covers some of the more common vertical categories.
<a name="23-build-your-own-data-set"></a>
### 3.3 Build your own data set
### 3.3 Build Your Own Dataset
The following experience can serve as a reference when constructing a dataset: