The picture above is the result of our Ultra-lightweight Chinese OCR model. For more testing results, please see the end of the article [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR效果展示) and [General Chinese OCR results](#通用中文OCR效果展示).
#### 1. Environment configuration
Please see [Quick installation](./doc/installation.md) to set up the PaddleOCR runtime environment first.
#### 2. Download inference models
#### (1) Download ultra-lightweight Chinese OCR models
```
# Create the inference directory if it does not exist yet, then enter it
mkdir -p inference && cd inference
# Download the detection model of the ultra-lightweight Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition model of the ultra-lightweight Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
#### (2) Download general Chinese OCR models
```
# Create the inference directory if it does not exist yet, then enter it
mkdir -p inference && cd inference
# Download the detection model of the general Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar && tar xf ch_det_r50_vd_db_infer.tar
# Download the recognition model of the general Chinese OCR model and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && tar xf ch_rec_r34_vd_crnn_infer.tar
cd ..
```
The following code runs text detection and recognition inference in sequence. When performing prediction, specify the path of a single image or an image folder through the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, and the parameter `rec_model_dir` specifies the path to the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
```
# Set the PYTHONPATH environment variable
export PYTHONPATH=.
# Predict a single image by passing its path to image_dir
```
To run inference with the general Chinese OCR model, follow the steps above to download the corresponding models and update the relevant parameters. Examples are as follows:
```
# Predict a single image by passing its path to image_dir
```
*Note: For training and evaluating the DB model above, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. When training on different datasets or with different models, these two parameters can be tuned for better results.
For the training guide and usage of the PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md).
## Text recognition algorithm
PaddleOCR open-source text recognition algorithm list:
Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition algorithms above are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
For the training guide and usage of the PaddleOCR text recognition algorithms, please refer to the document [Text recognition model training/evaluation/prediction](./doc/recognition.md).
| -c | ALL | Specify configuration file | None | **Please refer to the parameter introduction for configuration file usage** |
| -o | ALL | Set parameters in the configuration file | None | Settings passed with -o take priority over the configuration file selected with -c, e.g. `-o Global.use_gpu=false` |
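
To illustrate why a value passed with `-o` wins over the file selected with `-c`, here is a minimal sketch that applies dotted-key overrides on top of an already loaded configuration dictionary. The helper names `apply_overrides` and `parse_value` and the value-parsing rules are hypothetical and are not taken from PaddleOCR's own code; only the `Global.use_gpu=false` syntax and the default values come from this document.

```python
import ast


def parse_value(raw):
    """Turn a command-line string into a bool, number, or plain string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal (for example a file path): keep it as a string.
        return raw


def apply_overrides(config, overrides):
    """Apply 'Section.key=value' strings on top of a nested config dict."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        keys = dotted_key.split(".")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # The override is written last, so it replaces the value from the file.
        node[keys[-1]] = parse_value(raw_value)
    return config


# Stand-in for the content of the file selected with -c.
config_from_file = {"Global": {"use_gpu": True, "epoch_num": 3000}}
merged = apply_overrides(config_from_file, ["Global.use_gpu=false", "Global.epoch_num=10"])
print(merged)
```

Because the overrides are applied after the file is loaded, any key they name replaces the file's value, while keys they do not mention keep the value from the configuration file.
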
## Introduction to Global Parameters of Configuration File
| save_inference_dir | Path for saving the inference model | None | Used to save the inference model |
## Introduction to Reader Parameters of Configuration File
Take `rec_chinese_reader.yml` as an example
| Parameter | Use | Default | Note |
| :---: | :---: | :---: | :---: |
| algorithm | Select algorithm to use | Synchronize with configuration file | For selecting model, please refer to the supported model [list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README.md) |
| use_gpu | Set using GPU or not | true | \ |
| epoch_num | Maximum training epoch number | 3000 | \ |
| reader_function | Select data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Support two data reading methods: SimpleReader / LMDBReader |
| num_workers | Set the number of data reading threads | 8 | \ |
# How to make your own ultra-lightweight models
Making a customized ultra-lightweight model can be divided into three steps: training a text detection model, training a text recognition model, and running prediction with the two trained models chained together.
PaddleOCR provides two text detection algorithms: EAST and DB. Both support the MobileNetV3 and ResNet50_vd backbone networks. Select the corresponding configuration file as needed and start training. For example, to train a DB detection model with MobileNetV3 as the backbone network:
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection.md).
PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. All of them support two backbone networks, MobileNetV3 and ResNet34_vd. Select the corresponding configuration file as needed and start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition.md).
PaddleOCR provides a tool for chaining detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages (text detection, detection box rectification, text recognition, and score filtering) to output the text positions and recognition results. The results can optionally be visualized.
When performing prediction, specify the path of a single image or an image folder through the parameter `image_dir`. The parameter `det_model_dir` specifies the path to the detection inference model, and the parameter `rec_model_dir` specifies the path to the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
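
The four stages described above can be pictured with the following minimal sketch. It is not PaddleOCR's implementation: the function `run_two_stage_ocr`, its callable arguments, and the `min_score` threshold of 0.5 are hypothetical names and values chosen only to show how detection, rectification, recognition, and score filtering chain together.

```python
def run_two_stage_ocr(image, detect, rectify, recognize, min_score=0.5):
    """Chain a detector and a recognizer into one two-stage pipeline.

    detect(image)       -> list of boxes
    rectify(image, box) -> cropped and straightened text region
    recognize(crop)     -> (text, score)
    """
    results = []
    for box in detect(image):                 # stage 1: text detection
        crop = rectify(image, box)            # stage 2: detection box rectification
        text, score = recognize(crop)         # stage 3: text recognition
        if score >= min_score:                # stage 4: score filtering
            results.append((box, text, score))
    return results


# Stand-in components so the sketch runs without any model files.
def fake_detect(image):
    return [
        [(0, 0), (10, 0), (10, 5), (0, 5)],
        [(0, 6), (10, 6), (10, 11), (0, 11)],
    ]


def fake_rectify(image, box):
    return box  # a real implementation would crop and warp the image here


def fake_recognize(crop):
    return ("hello", 0.9) if crop[0] == (0, 0) else ("noise", 0.2)


results = run_two_stage_ocr(None, fake_detect, fake_rectify, fake_recognize)
print(results)
```

Passing the detector and recognizer in as callables mirrors the idea that any trained detection model can be combined with any recognition model.
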
Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR consolidates the scattered annotation files into standalone annotation files, which you can download with wget:
The image annotation information before json.dumps encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point in the upper left corner.
`transcription` represents the text of the current text box, and this information is not needed in the text detection task.
If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format.
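
As a minimal sketch of the format described above, the snippet below builds one annotation line from a list of text boxes. The helper name `make_annotation_line`, the example image path, and the use of a tab between the image path and the JSON string are assumptions made for illustration; check them against the annotation files you downloaded.

```python
import json


def make_annotation_line(image_path, boxes):
    """Build one annotation line for a single image.

    boxes is a list of (points, transcription) pairs, where points holds the
    four (x, y) corners listed clockwise starting from the upper left corner.
    """
    annotation = [
        {"transcription": text, "points": [list(point) for point in points]}
        for points, text in boxes
    ]
    # Assumed layout: image path, a tab, then the json.dumps-encoded list.
    return image_path + "\t" + json.dumps(annotation, ensure_ascii=False)


line = make_annotation_line(
    "imgs/example.jpg",  # hypothetical path
    [([(10, 20), (110, 20), (110, 50), (10, 50)], "PaddleOCR")],
)
print(line)

# Reading the line back recovers the list of dictionaries.
path, payload = line.split("\t", 1)
decoded = json.loads(payload)
```
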
First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
You can also use the `-o` parameter to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001:
Run the following code to calculate the evaluation metrics based on the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml`:
When evaluating, set the post-processing parameters `box_thresh=0.6` and `unclip_ratio=1.5`. When training with different datasets or different models, these two parameters can be tuned for better results.
The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating metrics, set `Global.checkpoints` to point to the saved parameter file.
Note: If docker pull is too slow, you can download and load the docker image manually with the following steps. The example uses the CUDA9 docker; for the CUDA10 docker, just replace cuda9 with cuda10.
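
To give a feel for what the two parameters control, here is a minimal sketch based on the offset rule from the DB paper, where a shrunk text polygon is expanded by a distance of area × `unclip_ratio` / perimeter, and on the assumption that `box_thresh` is a minimum score for keeping a detected box. The function names are hypothetical and the code is not PaddleOCR's post-processing itself.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points, unclip_ratio=1.5):
    """Offset by which the detected polygon would be expanded."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(score, box_thresh=0.6):
    """Assumed rule: drop boxes whose score falls below box_thresh."""
    return score >= box_thresh


box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(box))          # 2000 * 1.5 / 240 = 12.5
print(keep_box(0.55), keep_box(0.8))
```

Under these assumptions, a larger `unclip_ratio` produces looser boxes around the text, and a higher `box_thresh` discards more low-confidence detections.
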
```
# Download the CUDA9 docker compressed file and decompress it
```
For more version requirements, please refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
# Note: The copy hosted on Gitee (Code Cloud) may not sync with this GitHub project in real time and can lag by 3 to 5 days, so try the recommended method first.
The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on disk, just create a soft link to the dataset directory:
If you do not have a dataset locally, you can download it from the official [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) website. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format datasets required by the benchmark.
*Use your own dataset:
If you want to use your own data for training, organize it as described below.
Similar to the training set, the test set also needs to provide a folder containing all pictures (test) and a rec_gt_test.txt. The structure of the test set is as follows:
Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.
Therefore, the dictionary needs to contain all the characters that you want to be recognized correctly. {word_dict_name}.txt needs to be written in the following format and saved in the `utf-8` encoding format:
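
As a minimal sketch, the snippet below builds such a dictionary from a set of training labels and loads it back as a character-to-index mapping. It assumes the dictionary holds one character per line; the helper names and the file name `my_dict.txt` are hypothetical.

```python
import os
import tempfile


def build_dictionary(labels):
    """Collect every distinct character that appears in the labels."""
    return sorted({ch for label in labels for ch in label})


def write_dictionary(chars, path):
    # Assumed layout: one character per line, saved as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")


def load_char_to_index(path):
    """Map each character to its line index in the dictionary file."""
    with open(path, encoding="utf-8") as f:
        chars = [line.rstrip("\n") for line in f if line.rstrip("\n")]
    return {ch: index for index, ch in enumerate(chars)}


labels = ["简单", "simple"]
dict_path = os.path.join(tempfile.mkdtemp(), "my_dict.txt")
write_dictionary(build_dictionary(labels), dict_path)
char_to_index = load_char_to_index(dict_path)

# Any character listed here would be missing from the dictionary.
missing = [ch for label in labels for ch in label if ch not in char_to_index]
print(len(char_to_index), missing)
```

Running a check like `missing` over your own label files before training is a quick way to confirm that every character you expect to recognize is covered.
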
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
First download the pretrained model. You can download a trained model and fine-tune it on the icdar2015 data.
PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, the model is evaluated every 500 iterations, and the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` during evaluation.
If the validation set is large, evaluation will be time-consuming. It is recommended to evaluate less often, or to evaluate after training.
*Tip: You can use the `-c` parameter to select multiple model configurations under the `configs/rec/` path for training. The recognition algorithms supported by PaddleOCR are:
| Configuration file | Algorithm name | backbone | trans | seq | pred |
For training Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try the effect of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:
Take `rec_mv3_none_none_ctc.yml` as an example:
```
Global:
  ...
  # Modify image_shape to fit long text
  image_shape: [3, 32, 320]
  ...
  # Modify character type
  character_type: ch
  # Add a custom dictionary; to change the dictionary, point the path to the new dictionary
```
The configuration file used for prediction must be the same one used for training. For example, if you trained the Chinese model with `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, you can use the following command to run prediction with the Chinese model.