The picture above shows the output of our ultra-lightweight Chinese OCR model. For more results, see [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR效果展示) and [General Chinese OCR results](#通用中文OCR效果展示) at the end of this document.
#### 1. Environment configuration
Please refer to [Quick installation](./doc/installation.md) to configure the PaddleOCR runtime environment first.
#### 2. Download inference models
#### (1) Download ultra-lightweight Chinese OCR models
```
# -p avoids an error if the inference directory already exists
mkdir -p inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
#### (2) Download general Chinese OCR models
```
# -p avoids an error if the inference directory already exists
mkdir -p inference && cd inference
# Download the detection model of the general Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
# Download the recognition model of the general Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
```
The following commands run text detection and recognition in tandem. When running prediction, specify the path of a single image or an image folder with the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, and the parameter `rec_model_dir` specifies the path to the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
```
# Set the PYTHONPATH environment variable
export PYTHONPATH=.
# Predict a single image by passing its path to image_dir
```
To run inference with the general Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters. Examples are as follows:
```
# Predict a single image by passing its path to image_dir
```
**Note**: When training and evaluating the DB model above, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. If you train with different datasets or different models, these two parameters can be tuned for better results.
For the training guide and usage of the PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md).
## Text recognition algorithm
PaddleOCR open-source text recognition algorithms:
Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition algorithms above are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
For the training guide and usage of the PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./doc/recognition.md).
| -c | ALL | Specify configuration file | None | **Please refer to the parameter introduction for configuration file usage** |
| -o | ALL | Set the parameter in the configuration file | None | Configuration set with -o has higher priority than the configuration file selected with -c. E.g.: `-o Global.use_gpu=false` |
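
The precedence of `-o` over the file selected with `-c` can be illustrated with a minimal Python sketch. It assumes the configuration has already been loaded into a nested dictionary and that every override is a dotted `key=value` string. The helper names `parse_value` and `apply_overrides` are hypothetical and are not PaddleOCR's own implementation.

```python
import copy

def parse_value(text):
    """Interpret a command-line string as bool, int, float, or plain string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Hypothetical helper: values passed with -o replace those loaded with -c.
    """
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return merged

# Values as they might come from the configuration file chosen with -c
file_config = {"Global": {"use_gpu": True, "epoch_num": 3000}}
final_config = apply_overrides(file_config, ["Global.use_gpu=false"])
print(final_config)
```

The file values are left untouched and only the keys named with `-o` change, which is why the same yml file can be reused across runs.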
## Introduction to Global Parameters of Configuration File
| save_inference_dir | Path for saving the inference model | None | Used to save the inference model |
## Introduction to Reader Parameters of Configuration File
Take `rec_chinese_reader.yml` as an example
| Field | Use | Default | Note |
| --- | --- | --- | --- |
| algorithm | Select algorithm to use | Synchronize with configuration file | For selecting model, please refer to the supported model [list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README.md) |
| use_gpu | Set using GPU or not | true | \ |
| epoch_num | Maximum training epoch number | 3000 | \ |
| reader_function | Select data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Support two data reading methods: SimpleReader / LMDBReader |
| num_workers | Set the number of data reading threads | 8 | \ |
# How to make your own custom ultra-lightweight models?
The process of making a customized ultra-lightweight model can be divided into three steps: training a text detection model, training a text recognition model, and running prediction with the two trained models chained together.
PaddleOCR provides two text detection algorithms: EAST and DB. Both support the MobileNetV3 and ResNet50_vd backbone networks. Select the corresponding configuration file as needed and start training. For example, to train a DB detection model with MobileNetV3 as the backbone network:
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection.md).
PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. All of them support two backbone networks, MobileNetV3 and ResNet34_vd. Select the corresponding configuration file as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition.md).
PaddleOCR provides a tool for chaining detection and recognition models, which can connect any trained detection model and any trained recognition model into a two-stage text recognition system. The input image goes through four main stages (text detection, detection box rectification, text recognition, and score filtering) to output the text positions and recognition results. Optionally, the results can also be visualized.
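
The four stages can be pictured with a small Python sketch. It is only an illustration: the function `run_ocr_system`, the stand-in stage functions, and the `min_score` default of 0.5 are all hypothetical, and no real model is involved.

```python
def run_ocr_system(image, detect, rectify, recognize, min_score=0.5):
    """Chain detection and recognition; every callable is a placeholder."""
    results = []
    for box in detect(image):              # 1. text detection
        crop = rectify(image, box)         # 2. detection box rectification
        text, score = recognize(crop)      # 3. text recognition
        if score >= min_score:             # 4. score filtering
            results.append((box, text, score))
    return results

# Toy stand-ins so the sketch runs without any model.
def fake_detect(image):
    return [
        [(0, 0), (10, 0), (10, 5), (0, 5)],
        [(0, 6), (10, 6), (10, 9), (0, 9)],
    ]

def fake_rectify(image, box):
    return box  # a real system would crop and warp the image region here

def fake_recognize(crop):
    return ("hello", 0.9) if crop[0] == (0, 0) else ("??", 0.2)

ocr_results = run_ocr_system(None, fake_detect, fake_rectify, fake_recognize)
print(ocr_results)
```

Because each stage is passed in as a callable, any detector can be paired with any recognizer, which mirrors the idea of combining independently trained models.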
When running prediction, specify the path of a single image or an image folder with the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, and the parameter `rec_model_dir` specifies the path to the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
Decompress the downloaded dataset into the working directory, assumed here to be PaddleOCR/train_data/. In addition, PaddleOCR consolidates the scattered annotation files into standalone annotation files, which you can download with wget:
The image annotation information, before `json.dumps` encoding, is a list containing multiple dictionaries. The `points` entry in each dictionary holds the (x, y) coordinates of the four corners of the text box, arranged clockwise starting from the upper-left corner.
`transcription` holds the text of the current text box; this information is not needed in the text detection task.
If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format.
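
As a minimal sketch of the annotation format described above, the following Python builds and parses one annotation line. The tab separator between the image path and the JSON string, the file name, the coordinates, and the helper names are assumptions made for illustration. Only the `transcription` and `points` keys come from the description.

```python
import json

def make_label_line(image_path, boxes):
    """Build one annotation line: image path, a tab, then the JSON-encoded box list.

    The tab separator is an assumption; the dictionary keys follow the text above.
    """
    for box in boxes:
        if len(box["points"]) != 4:
            raise ValueError("each text box needs exactly four (x, y) points")
    return image_path + "\t" + json.dumps(boxes, ensure_ascii=False)

def parse_label_line(line):
    """Split an annotation line back into the image path and the list of boxes."""
    image_path, encoded = line.rstrip("\n").split("\t", 1)
    return image_path, json.loads(encoded)

boxes = [{
    "transcription": "EXIT",
    # clockwise, starting from the upper-left corner (illustrative values)
    "points": [[10, 20], [110, 20], [110, 60], [10, 60]],
}]
line = make_label_line("example_images/img_1.jpg", boxes)
path, decoded = parse_label_line(line)
print(path, decoded[0]["points"])
```

Passing `ensure_ascii=False` keeps Chinese transcriptions readable in the file instead of escaping them.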
First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use a model from [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace the backbone according to your needs.
You can also use the `-o` parameter to change training parameters without modifying the yml file. For example, to adjust the training learning rate to 0.0001:
Run the following code to calculate the evaluation metrics based on the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml`:
When evaluating, set the post-processing parameters box_thresh=0.6 and unclip_ratio=1.5. If you use different datasets or different models for training, these two parameters can be adjusted to optimize the results.
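
To give a feel for what these two parameters control, here is a hedged Python sketch. It assumes that `box_thresh` is compared against a per-box confidence score and that `unclip_ratio` enters the expansion offset from the DB paper, D = A × r / L (area times ratio divided by perimeter). Both are assumptions about the post-processing and not a copy of PaddleOCR's code. All function names are hypothetical.

```python
import math

def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )

def unclip_distance(points, unclip_ratio=1.5):
    """Offset used to expand a shrunk text region: D = area * ratio / perimeter."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)

def keep_confident_boxes(candidates, box_thresh=0.6):
    """Drop candidate boxes whose score is below box_thresh."""
    return [(pts, score) for pts, score in candidates if score >= box_thresh]

candidates = [
    ([(0, 0), (100, 0), (100, 20), (0, 20)], 0.83),
    ([(0, 30), (40, 30), (40, 40), (0, 40)], 0.41),
]
kept = keep_confident_boxes(candidates)
for pts, score in kept:
    print(score, unclip_distance(pts))
```

Under these assumptions, raising `box_thresh` discards more low-confidence boxes, while raising `unclip_ratio` makes every surviving box larger.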
The model parameters saved during training are stored in the `Global.save_model_dir` directory by default. When evaluating, you need to set `Global.checkpoints` to point to the saved parameter file.
The inference model (the model saved by `fluid.io.save_inference_model`) is generally the frozen model saved after training is complete, and it is mostly used for prediction and deployment.
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model additionally saves the structural information of the model. It performs better in deployment and inference acceleration, is flexible and convenient, and is suitable for integration with actual systems. For a more detailed introduction, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).
Next, we first introduce how to convert a trained model into an inference model, and then introduce text detection, text recognition, and the chaining of the two, all based on the inference engine.
## Training model to inference model
### Detection model to inference model
Download the ultra-lightweight Chinese detection model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
```
The above model is a DB model trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command:
When converting to an inference model, use the same configuration file that was used during training. In addition, you also need to set the Global.checkpoints and Global.save_inference_dir parameters in the configuration file.
Global.checkpoints points to the model parameter file saved in training, and Global.save_inference_dir is the directory where the generated inference model is to be saved.
After the conversion is successful, there are two files in the `save_inference_dir` directory:
```
inference/det_db/
    └─  model     program file of the detection inference model
    └─  params    parameter file of the detection inference model
```
### Recognition model to inference model
Download the ultra-lightweight Chinese recognition model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/
```
The recognition model is converted to an inference model in the same way as the detection model, as follows:
If your model was trained on your own dataset and you changed the Chinese character dictionary file, make sure that `character_dict_path` in the configuration file points to the required dictionary file.
After the conversion is successful, there are two files in the directory:
```
/inference/rec_crnn/
    └─  model     program file of the recognition inference model
    └─  params    parameter file of the recognition inference model
```
The following sections introduce inference with the ultra-lightweight Chinese detection model, the DB text detection model, and the EAST text detection model. The default configuration is based on the inference settings of the DB text detection model. Because the EAST and DB algorithms are very different, EAST inference requires passing in the corresponding parameters to adapt to the EAST text detection algorithm.
### 1. Ultra-lightweight Chinese detection model inference
To run inference with the ultra-lightweight Chinese detection model, execute the following command:
The visualized text detection results are saved to the ./inference_results folder by default, and the names of the result files are prefixed with 'det_res'. Examples of results are as follows:
The parameter det_max_side_len sets the maximum side length to which images are normalized in the detection algorithm. When both the height and the width of the image are less than det_max_side_len, the original image is used for prediction. Otherwise, the image is scaled so that its longer side equals this maximum value. The default is det_max_side_len=960. If the resolution of the input image is relatively large and you want to use a larger resolution for prediction, you can execute the following command:
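
The resizing rule described above can be sketched in a few lines of Python. The function name `limit_side_len` is hypothetical, and any extra rounding that the real pre-processing may apply (for example to a multiple of the network stride) is deliberately left out.

```python
def limit_side_len(width, height, det_max_side_len=960):
    """Return the (width, height) that would be used for detection."""
    longest = max(width, height)
    if longest <= det_max_side_len:
        return width, height  # small enough: keep the original size
    scale = det_max_side_len / longest
    # Scale both sides by the same factor so the aspect ratio is preserved
    return max(1, round(width * scale)), max(1, round(height * scale))

small = limit_side_len(640, 480)
large = limit_side_len(1920, 1080)
large_relaxed = limit_side_len(1920, 1080, det_max_side_len=1200)
print(small, large, large_relaxed)
```

With the default limit, a 1920x1080 input is halved before detection. Raising the limit keeps more detail at the cost of more computation.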
First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert:
- After `-c`, set the yml configuration file of the training algorithm.
- The `Global.checkpoints` parameter sets the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt, or .pdparams.
- The `Global.save_inference_dir` parameter sets the path where the converted model will be saved.
The visualized text detection results are saved to the ./inference_results folder by default, and the names of the result files are prefixed with 'det_res'. Examples of results are as follows:
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly of English scenes, the above model performs very poorly on Chinese text images.
First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English data set as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert:
- After `-c`, set the yml configuration file of the training algorithm.
- The `Global.checkpoints` parameter sets the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt, or .pdparams.
- The `Global.save_inference_dir` parameter sets the path where the converted model will be saved.
For EAST text detection model inference, you need to set the parameter det_algorithm to specify EAST as the detection algorithm type. You can execute the following command:
The visualized text detection results are saved to the ./inference_results folder by default, and the names of the result files are prefixed with 'det_res'. Examples of results are as follows:
**Note**: This codebase uses a Python implementation of NMS in EAST post-processing, so prediction is slow. Using a C++ implementation would give a significant speedup.
The following sections introduce inference with the ultra-lightweight Chinese recognition model and with recognition models based on CTC loss. **Inference for recognition models based on Attention loss is still being debugged**. For Chinese text recognition, it is recommended to prefer recognition models based on CTC loss. In practice, models based on Attention loss have also been found to perform worse than those based on CTC loss.
### 1. Ultra-lightweight Chinese recognition model inference
To run inference with the ultra-lightweight Chinese recognition model, execute the following command:
Taking STAR-Net as an example, we introduce inference with a recognition model based on CTC loss. CRNN and Rosetta are used in a similar way, without setting the recognition algorithm parameter rec_algorithm.
First, convert the model saved during STAR-Net text recognition training into an inference model. Taking the model based on the Resnet34_vd backbone network and trained on the two English synthetic text recognition datasets MJSynth and SynthText as an example ([model download address](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar)), you can use the following command to convert:
- After `-c`, set the yml configuration file of the training algorithm.
- The `Global.checkpoints` parameter sets the path of the trained model to be converted, without the file suffix .pdmodel, .pdopt, or .pdparams.
- The `Global.save_inference_dir` parameter sets the path where the converted model will be saved.
**Note**: Since the above model follows the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it differs from the training of the ultra-lightweight Chinese recognition model in two aspects:
- The image resolution used in training is different. The above model is trained with an image resolution of [3, 32, 100], while the Chinese model is trained with [3, 32, 320] to ensure good recognition of long text. The default shape parameter of the inference program is the resolution used for training the Chinese model, that is [3, 32, 320]. Therefore, when running inference with the above English model, you need to set the shape of the recognition image through the parameter rec_image_shape.
- The character list is different. The experiments in the DTRB paper cover only the 26 lowercase English letters and 10 digits, 36 characters in total. All uppercase characters are converted to lowercase, and characters not in the above list are ignored and treated as spaces. Therefore, no character dictionary file is passed in here; the dictionary is generated instead. For this reason, the parameter rec_char_type needs to be set during inference and is specified as "en" for English.
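
The 36-character convention can be sketched as follows. This is an illustration under stated assumptions: the ordering (digits first, then letters) and the helper names are hypothetical, and out-of-list characters are simply dropped here, which is one reading of "ignored".

```python
import string

# 10 digits + 26 lowercase letters = 36 characters (ordering is an assumption)
CHARSET = string.digits + string.ascii_lowercase
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}

def normalize_label(text):
    """Lowercase the label and drop characters outside the 36-character set."""
    return "".join(ch for ch in text.lower() if ch in CHAR_TO_INDEX)

def encode_label(text):
    """Map a label to the list of dictionary indices a model would be trained on."""
    return [CHAR_TO_INDEX[ch] for ch in normalize_label(text)]

normalized = normalize_label("Hello, World-2020!")
encoded = encode_label("Ab1")
print(len(CHARSET), normalized, encoded)
```

Generating the dictionary in code like this is why no dictionary file has to be supplied for the English model.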
When running prediction, specify the path of a single image or a collection of images with the parameter image_dir. The parameter det_model_dir specifies the path to the detection inference model, and the parameter rec_model_dir specifies the path to the recognition inference model. The visualized recognition results are saved to the ./inference_results folder by default.
If you want to try other detection or recognition algorithms, refer to the text detection model inference and text recognition model inference sections above and update the corresponding configuration and model. The following gives the commands for running EAST text detection with STAR-Net text recognition:
Note: If docker pull is too slow, you can manually download and load the docker image with the following steps. The example uses the CUDA9 docker; to use the CUDA10 docker, simply replace cuda9 with cuda10.
```
# Download the CUDA9 docker compressed file and unzip it
```
For more version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
**Note**: The copy of the code hosted on Gitee ("Code Cloud") may not synchronize with updates to this GitHub project in real time; there is a delay of 3 to 5 days. Please use the recommended method first.
The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory:
If you do not have a dataset locally, you can download it from the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb format dataset required by the benchmark.
* Use your own dataset:
If you want to use your own data for training, please organize your data as described below.
Similar to the training set, the test set also needs to provide a folder containing all pictures (test) and a rec_gt_test.txt. The structure of the test set is as follows:
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
Therefore, the dictionary needs to contain all the characters that you want to be recognized correctly. {word_dict_name}.txt needs to be written in the following format and saved in the `utf-8` encoding format:
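
As a hedged illustration of how such a dictionary maps characters to indices, the Python sketch below assumes a layout of one character per line. That layout, the file name `word_dict.txt`, and the helper names are assumptions made for this example. It also shows a simple check for characters that appear in the labels but are missing from the dictionary.

```python
import os
import tempfile

def write_dict_file(path, characters):
    """Write one character per line, UTF-8 encoded (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in characters:
            f.write(ch + "\n")

def load_dict_file(path):
    """Map every character in the file to its line index, skipping empty lines."""
    with open(path, encoding="utf-8") as f:
        characters = [line.rstrip("\n") for line in f if line.rstrip("\n")]
    return {ch: index for index, ch in enumerate(characters)}

def missing_characters(labels, char_to_index):
    """Characters used in the labels that the dictionary cannot represent."""
    return sorted({ch for label in labels for ch in label} - set(char_to_index))

dict_path = os.path.join(tempfile.mkdtemp(), "word_dict.txt")
write_dict_file(dict_path, ["l", "d", "a", "r", "n"])
char_to_index = load_dict_file(dict_path)
missing = missing_characters(["brand", "land"], char_to_index)
print(char_to_index, missing)
```

Running a check like `missing_characters` over the training labels before training is a cheap way to confirm that the dictionary covers every character you expect the model to recognize.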
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
First download the pretrained model. You can download a trained model and fine-tune it on the icdar2015 data:
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, the model is evaluated every 500 iterations, and the model with the best accuracy seen during evaluation is saved as `output/rec_CRNN/best_accuracy`.
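
The interplay between `eval_batch_step` and the best-accuracy checkpoint can be sketched with a toy loop. Everything here is a stand-in: `train_with_periodic_eval`, the lambda training step, and the accuracy values are hypothetical, and nothing is actually trained or saved.

```python
def train_with_periodic_eval(num_iters, eval_batch_step, train_step, evaluate):
    """Train, evaluate every `eval_batch_step` iterations, and track the best accuracy."""
    best_acc, best_iter = -1.0, None
    for it in range(1, num_iters + 1):
        train_step(it)
        if it % eval_batch_step == 0:
            acc = evaluate(it)
            if acc > best_acc:
                best_acc, best_iter = acc, it
                # A real trainer would save the model here,
                # for example as output/rec_CRNN/best_accuracy.
    return best_acc, best_iter

# Toy accuracies: they rise and then drop, so the best model is not the last one.
fake_accuracy = {500: 0.61, 1000: 0.74, 1500: 0.70}
best = train_with_periodic_eval(
    num_iters=1500,
    eval_batch_step=500,
    train_step=lambda it: None,
    evaluate=lambda it: fake_accuracy[it],
)
print(best)
```

A larger `eval_batch_step` means fewer evaluation passes, which trades less frequent best-model updates for shorter total training time.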
If the validation set is large, evaluation will be time-consuming. It is recommended to reduce the number of evaluations, or to evaluate only after training.
**Tip**: You can use the `-c` parameter to select any of the model configurations under the `configs/rec/` path for training. The recognition algorithms supported by PaddleOCR are:
| Configuration file | Algorithm name | backbone | trans | seq | pred |
For training Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try the effect of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
Take `rec_mv3_none_none_ctc.yml` as an example:
```
Global:
  ...
  # Modify image_shape to fit long text
  image_shape: [3, 32, 320]
  ...
  # Modify character type
  character_type: ch
  # Add a custom dictionary; to change the dictionary, point the path to the new dictionary
```
The configuration file used for prediction must be consistent with the one used for training. For example, if you trained the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to run prediction with the Chinese model: