Commit f3172214, authored by 张欣-男

Merge remote-tracking branch 'upstream/develop' into zxdev

English | [简体中文](README.md)
## Introduction
## INTRODUCTION
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and put them into practice.
**Recent updates**
- 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Provide lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training supported on Windows system
- [more](./doc/doc_en/update_en.md)
## Features
- Ultra-lightweight Chinese OCR model, total model size is only 8.6M
## FEATURES
- Lightweight Chinese OCR model, total model size is only 8.6M
- Single model supports Chinese and English numbers combination recognition, vertical text recognition, long text recognition
- Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Various text detection algorithms: EAST, DB
......@@ -22,34 +22,34 @@ PaddleOCR aims to create a rich, leading, and practical OCR tools that help user
|Model Name|Description |Detection Model link|Recognition Model link|
|-|-|-|-|
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
To try our Chinese OCR online, visit https://www.paddlepaddle.org.cn/hub/scene/ocr
**You can also quickly experience the Ultra-lightweight Chinese OCR and General Chinese OCR models as follows:**
**You can also quickly experience the lightweight Chinese OCR and General Chinese OCR models as follows:**
## **Ultra-lightweight Chinese OCR and General Chinese OCR inference**
## **LIGHTWEIGHT CHINESE OCR AND GENERAL CHINESE OCR INFERENCE**
![](doc/imgs_results/11.jpg)
The picture above is the result of our Ultra-lightweight Chinese OCR model. For more testing results, please see the end of the article [Ultra-lightweight Chinese OCR results](#Ultra-lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results).
The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results).
#### 1. Environment configuration
#### 1. ENVIRONMENT CONFIGURATION
Please see [Quick installation](./doc/doc_en/installation_en.md)
#### 2. Download inference models
#### 2. DOWNLOAD INFERENCE MODELS
#### (1) Download Ultra-lightweight Chinese OCR models
#### (1) Download lightweight Chinese OCR models
*If wget is not installed on Windows, you can copy the link into a browser to download the model. After the model is downloaded, unzip it and place it in the corresponding directory.*
```
mkdir inference && cd inference
# Download the detection part of the Ultra-lightweight Chinese OCR and decompress it
# Download the detection part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition part of the Ultra-lightweight Chinese OCR and decompress it
# Download the recognition part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
......@@ -63,7 +63,7 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && t
cd ..
```
#### 3. Single image and batch image prediction
#### 3. SINGLE IMAGE AND BATCH PREDICTION
The following code runs text detection and recognition inference in tandem. When performing prediction, specify the path of a single image or an image folder through the parameter `image_dir`; the parameter `det_model_dir` specifies the path to the detection model, and the parameter `rec_model_dir` specifies the path to the recognition model. The visualized prediction results are saved to the `./inference_results` folder by default.
......@@ -87,14 +87,14 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_mode
For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md)
## Documentation
## DOCUMENTATION
- [Quick installation](./doc/doc_en/installation_en.md)
- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
- [Inference](./doc/doc_en/inference_en.md)
- [Dataset](./doc/doc_en/datasets_en.md)
## Text detection algorithm
## TEXT DETECTION ALGORITHM
PaddleOCR open source text detection algorithms list:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
......@@ -113,14 +113,14 @@ On the ICDAR2015 dataset, the text detection result is as follows:
Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset, which provides 30,000 training samples, the related configuration and pre-trained models for the Chinese detection task are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|
* Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result.
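To give a feel for what these two parameters control, here is a minimal sketch in plain Python. It assumes the expansion rule from the DB paper, where a shrunk text region is grown by an offset of `area * unclip_ratio / perimeter`, and it assumes that a candidate box is dropped when its mean score falls below `box_thresh`. The helper names are made up for this example and are not PaddleOCR functions.

```python
import math


def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def unclip_offset(points, unclip_ratio=1.5):
    """Distance by which the shrunk region is expanded (assumed DB rule)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(mean_score, box_thresh=0.6):
    """A candidate box is kept only if its mean score reaches box_thresh."""
    return mean_score >= box_thresh


# A hypothetical 100 x 20 text region and two hypothetical box scores.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_offset(box), keep_box(0.72), keep_box(0.41))
```

A larger `unclip_ratio` produces looser boxes around the text, while a larger `box_thresh` discards more low-confidence boxes, which is why both may need tuning on other datasets.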
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
## Text recognition algorithm
## TEXT RECOGNITION ALGORITHM
PaddleOCR open-source text recognition algorithms list:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
......@@ -145,16 +145,16 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and crop out 300,000 training samples from the original photos using the position ground truth, with some calibration where needed. In addition, based on the LSVT corpus, 5 million synthetic samples are generated to train the Chinese model. The related configuration and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|
Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
## End-to-end OCR algorithm
## END-TO-END OCR ALGORITHM
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, coming soon)
<a name="Ultra-lightweight Chinese OCR results"></a>
## Ultra-lightweight Chinese OCR results
<a name="lightweight Chinese OCR results"></a>
## LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
......@@ -189,11 +189,12 @@ Please refer to the document for training guide and use of PaddleOCR text recogn
[more](./doc/doc_en/FAQ_en.md)
## Welcome to the PaddleOCR technical exchange group
WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the group~
## WELCOME TO THE PaddleOCR TECHNICAL EXCHANGE GROUP
WeChat: paddlehelp. Add the remark "OCR" and our assistant will invite you to the group~
<img src="./doc/paddlehelp.jpg" width = "200" height = "200" />
## References
## REFERENCES
```
1. EAST:
@inproceedings{zhou2017east,
......@@ -248,10 +249,10 @@ WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the grou
}
```
## License
## LICENSE
This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>
## Contribution
## CONTRIBUTION
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) for contributing the English documentation.
......
......@@ -10,7 +10,7 @@ PaddleOCR supports two data formats: `lmdb` for training on public data, debugging algo
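The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory: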
```
ln -sf <path/to/dataset> <path/to/paddle_detection>/train_data/dataset
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
```
......
......@@ -47,3 +47,7 @@ At present, the open source model, dataset and magnitude are as follows:
10. **Error in using the model with TPS module for prediction**
Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100)
Solution:TPS does not support variable shape. Please set --rec_image_shape='3,32,100' and --rec_char_type='en'
11. **A custom dictionary was used during training, but the recognition results contain characters that are not in the dictionary**
The custom dictionary path was not set when making the prediction. The solution is to set the parameter `rec_char_dict_path` to the corresponding dictionary file.
\ No newline at end of file
# Optional parameters list
# OPTIONAL PARAMETERS LIST
The following list can be viewed via `--help`
......@@ -8,7 +8,7 @@ The following list can be viewed via `--help`
| -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false` |
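
The priority rule for `-o` can be pictured as a merge of `Section.key=value` strings on top of the values loaded from the file given with `-c`. The sketch below is only an illustration of that idea in plain Python; `apply_overrides` and the sample values are hypothetical and this is not PaddleOCR's own parsing code.

```python
def apply_overrides(config, overrides):
    """Apply `Section.key=value` strings on top of a nested config dict."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        # Interpret simple literals; anything else stays a string.
        lowered = raw_value.lower()
        if lowered in ("true", "false"):
            value = lowered == "true"
        else:
            try:
                value = int(raw_value)
            except ValueError:
                try:
                    value = float(raw_value)
                except ValueError:
                    value = raw_value
        # Walk down to the parent section, creating it if needed.
        node = config
        *parents, leaf = dotted_key.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Values as they might be loaded from the file passed with -c (hypothetical).
file_config = {"Global": {"use_gpu": True, "epoch_num": 300}}
merged = apply_overrides(file_config, ["Global.use_gpu=false"])
print(merged)
```

Only the keys named on the command line change; every other value from the configuration file is kept.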
## Introduction to Global Parameters of Configuration File
## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE
Take `rec_chinese_lite_train.yml` as an example
......@@ -35,7 +35,7 @@ Take `rec_chinese_lite_train.yml` as an example
| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
| save_inference_dir | path to save model for inference | None | Use to save inference model |
## Introduction to Reader parameters of Configuration file
## INTRODUCTION TO READER PARAMETERS OF CONFIGURATION FILE
Take `rec_chinese_reader.yml` as an example:
......@@ -47,7 +47,7 @@ Take `rec_chinese_reader.yml` as an example:
| label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ |
| infer_img | Result folder path | ./infer_img | \|
## Introduction to Optimizer parameters of Configuration file
## INTRODUCTION TO OPTIMIZER PARAMETERS OF CONFIGURATION FILE
Take `rec_icdar15_train.yml` as an example:
......
# How to make your own ultra-lightweight OCR models?
# HOW TO MAKE YOUR OWN LIGHTWEIGHT OCR MODEL?
The process of making a customized ultra-lightweight OCR model can be divided into three steps: training a text detection model, training a text recognition model, and concatenating the predictions from the previous steps.
## step1: Train text detection model
## STEP1: TRAIN TEXT DETECTION MODEL
PaddleOCR provides two text detection algorithms: EAST and DB. Both support the MobileNetV3 and ResNet50_vd backbone networks. Select the corresponding configuration file as needed and start training. For example, to train a DB detection model with MobileNetV3 as the backbone network:
```
......@@ -10,7 +10,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml
```
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md)
## step2: Train text recognition model
## STEP2: TRAIN TEXT RECOGNITION MODEL
PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks: MobileNetV3 and ResNet34_vd. Select the corresponding configuration file as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
```
......@@ -18,7 +18,7 @@ python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md)
## step3: Concatenate predictions
## STEP3: CONCATENATE PREDICTIONS
PaddleOCR provides a concatenation tool for detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages: text detection, text rectification, text recognition, and score filtering to output the text position and recognition results, and at the same time, you can choose to visualize the results.
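The four stages described above can be sketched as a small function that chains them. This is a minimal illustration with stand-in stages so that it runs without any model; `run_ocr_pipeline`, the `fake_*` helpers, and the `min_score` default are all hypothetical and do not mirror the actual code in `tools/infer/predict_system.py`.

```python
def run_ocr_pipeline(image, detect, rectify, recognize, min_score=0.5):
    """Chain detection -> rectification -> recognition -> score filtering.

    detect(image)        -> list of boxes
    rectify(image, box)  -> cropped, straightened text image
    recognize(crop)      -> (text, score)
    """
    results = []
    for box in detect(image):
        crop = rectify(image, box)
        text, score = recognize(crop)
        if score >= min_score:  # score filtering
            results.append((box, text, score))
    return results


# Stand-in stages: the "image" is a grid of characters, one text line per row.
def fake_detect(image):
    return [(0, 0, 4, 1), (0, 1, 4, 2)]


def fake_rectify(image, box):
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def fake_recognize(crop):
    text = "".join(crop[0])
    return text, (0.9 if text.isalpha() else 0.2)


image = [list("text"), list("?#!1")]
results = run_ocr_pipeline(image, fake_detect, fake_rectify, fake_recognize)
print(results)
```

Because each stage is passed in as a function, any detection model can be paired with any recognition model, which is the point of the two-stage design.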
......
## Dataset
## DATASET
This is a collection of commonly used Chinese datasets, which is being updated continuously. You are welcome to contribute to this list~
- [ICDAR2019-LSVT](#ICDAR2019-LSVT)
- [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17)
......@@ -13,8 +13,11 @@ In addition to opensource data, users can also use synthesis tools to synthesize
- **Data sources**:https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Introduction**: A total of 450,000 Chinese street view images, including 50,000 (20,000 test + 30,000 training) fully labeled images (text coordinates + text content) and 400,000 weakly labeled images (text content only), as shown in the following figure:
![](../datasets/LSVT_1.jpg)
(a) Fully labeled data
![](../datasets/LSVT_2.jpg)
(b) Weakly labeled data
- **Download link**:https://ai.baidu.com/broad/download?dataset=lsvt
......
# Text detection
# TEXT DETECTION
This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
## Data preparation
## DATA PREPARATION
The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget:
......@@ -27,13 +27,13 @@ The provided annotation file format is as follow:
" Image file name Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
The image annotation information before json.dumps encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
The image annotation before json.dumps() encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
`transcription` represents the text of the current text box, and this information is not needed in the text detection task.
If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format.
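As a rough illustration of the format above, the sketch below builds and parses one annotation line in Python. It assumes the image file name and the JSON string are separated by a tab (the separator is not visible in the example above), and the helper name `parse_annotation_line` is made up for this example.

```python
import json


def parse_annotation_line(line, sep="\t"):
    """Split one annotation line into (image_path, list_of_boxes).

    Assumption: the image path and the JSON string are separated by `sep`.
    """
    image_path, encoded = line.rstrip("\n").split(sep, 1)
    boxes = json.loads(encoded)  # a list of dicts, one per text box
    return image_path, boxes


# Build a sample line the way the annotation file is described:
# the per-image list of dictionaries is encoded with json.dumps.
annotation = [
    {
        "transcription": "MASA",
        "points": [[310, 104], [416, 141], [418, 216], [312, 179]],
    }
]
sample_line = "ch4_test_images/img_61.jpg" + "\t" + json.dumps(annotation)

image_path, boxes = parse_annotation_line(sample_line)
for box in boxes:
    # Detection training only needs `points`; `transcription` is not used.
    print(image_path, box["points"])
```

Writing your own annotation file is the reverse operation: one line per image, with the list of boxes passed through `json.dumps`.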
## Quickstart training
## TRAINING
First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
```
......@@ -56,7 +56,7 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models
```
**Start training**
**START TRAINING**
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```
......@@ -80,7 +80,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
**Note**:The priority of Global.checkpoints is higher than the priority of Global.pretrain_weights, that is, when two parameters are specified at the same time, the model specified by Global.checkpoints will be loaded first. If the model path specified by Global.checkpoints is wrong, the one specified by Global.pretrain_weights will be loaded.
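The priority described in this note can be pictured with a few lines of Python. This is only a sketch of the decision order, not the loading code used by `tools/train.py`; `choose_weights` is a hypothetical helper, and a simple "does this path exist" check stands in for whatever validation the framework really performs.

```python
import os


def choose_weights(checkpoints=None, pretrain_weights=None, exists=os.path.exists):
    """Return (source_name, path) following the priority described above."""
    if checkpoints and exists(checkpoints):
        return "checkpoints", checkpoints
    if pretrain_weights and exists(pretrain_weights):
        return "pretrain_weights", pretrain_weights
    return None, None


# Pretend only the pretrained weights are present on disk (hypothetical paths).
available = {"./pretrain_models/MobileNetV3_large_x0_5_pretrained"}
source, path = choose_weights(
    checkpoints="./output/missing_dir/latest",
    pretrain_weights="./pretrain_models/MobileNetV3_large_x0_5_pretrained",
    exists=available.__contains__,
)
print(source, path)
```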
## Evaluation Indicator
## EVALUATION
PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean.
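For reference, the three indicators relate to each other as shown in the sketch below: Hmean is the harmonic mean of Precision and Recall. How a detected box is matched to a ground-truth box is outside this sketch, and the function name and the counts in the example are hypothetical.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Return (precision, recall, hmean) for a detection run.

    num_matched:      detected boxes that match a ground-truth box
    num_detected:     all boxes the model produced
    num_ground_truth: all annotated boxes
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    hmean = 2 * precision * recall / (precision + recall)
    return precision, recall, hmean


precision, recall, hmean = detection_metrics(80, 100, 160)
print(precision, recall, hmean)
```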
......@@ -100,7 +100,7 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou
* Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.
## Test detection result
## TEST DETECTION RESULT
Test the detection result on a single image:
```
......
# Prediction from inference model
# PREDICTION FROM INFERENCE MODEL
The inference model (the model saved by fluid.io.save_inference_model) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).
Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).
Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model.
## Training model to inference model
### Detection model to inference model
## CONVERT TRAINING MODEL TO INFERENCE MODEL
### Convert detection model to inference model
Download the ultra-lightweight Chinese detection model:
Download the lightweight Chinese detection model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
```
......@@ -29,9 +29,9 @@ inference/det_db/
└─ params Parameter file of the detection inference model
```
### Recognition model to inference model
### Convert recognition model to inference model
Download the ultra-lightweight Chinese recognition model:
Download the lightweight Chinese recognition model:
```
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/
```
......@@ -51,13 +51,13 @@ After the conversion is successful, there are two files in the directory:
└─ params Parameter file of the recognition inference model
```
## Text detection model inference
## TEXT DETECTION MODEL INFERENCE
The following will introduce the ultra-lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because EAST and DB algorithms are very different, when inference, it is necessary to adapt the EAST text detection algorithm by passing in corresponding parameters.
The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because the EAST and DB algorithms are very different, at inference time it is necessary to adapt to the EAST text detection algorithm by passing in the corresponding parameters.
### 1.Ultra-lightweight Chinese detection model inference
### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE
For ultra-lightweight Chinese detection model inference, you can execute the following commands:
For lightweight Chinese detection model inference, you can execute the following commands:
```
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
......@@ -78,7 +78,7 @@ If you want to use the CPU for prediction, execute the command as follows
python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
```
### 2.DB text detection model inference
### 2. DB TEXT DETECTION MODEL INFERENCE
First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert:
......@@ -102,7 +102,7 @@ The visualized text detection results are saved to the `./inference_results` fol
**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images.
### 3.EAST text detection model inference
### 3. EAST TEXT DETECTION MODEL INFERENCE
First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert:
......@@ -126,14 +126,14 @@ The visualized text detection results are saved to the `./inference_results` fol
**Note**: This codebase uses the Python version of NMS in EAST post-processing, so the prediction speed is quite slow. If you use the C++ version, there will be a significant speedup.
## Text recognition model inference
## TEXT RECOGNITION MODEL INFERENCE
The following will introduce the ultra-lightweight Chinese recognition model inference and CTC loss-based recognition model inference. **The recognition model inference based on Attention loss is still being debugged**. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss.
The following introduces inference with the lightweight Chinese recognition model and with other CTC-based and Attention-based text recognition models. For Chinese text recognition, it is recommended to choose a recognition model based on CTC loss; in practice, models based on Attention loss have given worse results than those based on CTC loss. In addition, if the character dictionary is modified during training, make sure that you use the same character set during inference. Please check below for details.
### 1. Ultra-lightweight Chinese recognition model inference
### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL INFERENCE
For ultra-lightweight Chinese recognition model inference, you can execute the following commands:
For lightweight Chinese recognition model inference, you can execute the following commands:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/"
......@@ -146,7 +146,7 @@ After executing the command, the prediction results (recognized text and score)
Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695]
### 2. Recognition model inference based on CTC loss
### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE
Taking STAR-Net as an example, we introduce the recognition model inference based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`.
......@@ -165,13 +165,15 @@ For STAR-Net text recognition model inference, execute the following commands:
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
```
### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE
![](../imgs_words_en/word_336.png)
After executing the command, the recognition result of the above image is as follows:
Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555]
**Note**:Since the above model refers to [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of ultra-lightweight Chinese recognition model in two aspects:
**Note**:Since the above model refers to [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of lightweight Chinese recognition model in two aspects:
- The image resolution used in training is different: the image resolution used in training the above model is [3,32,100], while during our Chinese model training, in order to ensure the recognition effect of long text, the image resolution used in training is [3, 32, 320]. The default shape parameter of the inference stage is the image resolution used in training phase, that is [3, 32, 320]. Therefore, when running inference of the above English model here, you need to set the shape of the recognition image through the parameter `rec_image_shape`.
......@@ -182,18 +184,18 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
```
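To show why the character dictionary has to match between training and inference, here is a minimal sketch of greedy CTC decoding over the 36-character English dictionary shown above. It is an illustration only: placing the blank at index 0 is an assumption made for this example and may differ from PaddleOCR's own convention, and `ctc_greedy_decode` is a hypothetical helper.

```python
character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(character_str)

BLANK = 0  # assumption: index 0 is reserved for the CTC blank
index_to_char = {i + 1: ch for i, ch in enumerate(dict_character)}
char_to_index = {ch: i for i, ch in index_to_char.items()}


def ctc_greedy_decode(indices):
    """Collapse repeated indices, then drop blanks."""
    chars = []
    previous = None
    for idx in indices:
        if idx != previous and idx != BLANK:
            chars.append(index_to_char[idx])
        previous = idx
    return "".join(chars)


# Hypothetical per-frame predictions: repeated frames and one blank.
frames = (
    [char_to_index[c] for c in "ssu"]
    + [BLANK]
    + [char_to_index[c] for c in "pperr"]
)
decoded = ctc_greedy_decode(frames)
print(decoded)
```

The model only outputs indices, so decoding with a different dictionary than the one used in training maps the same indices to different characters.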
### 4.Recognition model inference using custom text dictionary file
If the text dictionary is replaced during training, you need to specify the text dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict.
### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY
If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict.
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path"
```
## Text detection and recognition inference concatenation
## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION
### 1. Ultra-lightweight Chinese OCR model inference
### 1. LIGHTWEIGHT CHINESE MODEL
When performing prediction, you need to specify the path of a single image or a collection of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual recognition results are saved to the `./inference_results` folder by default.
When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`; the parameter `det_model_dir` specifies the path to the detection inference model, and the parameter `rec_model_dir` specifies the path to the recognition inference model. The visualized recognition results are saved to the `./inference_results` folder by default.
```
python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/"
......@@ -203,7 +205,7 @@ After executing the command, the recognition result image is as follows:
![](../imgs_results/2.jpg)
### 2. Other model inference
### 2. OTHER MODELS
If you want to try other detection algorithms or recognition algorithms, please refer to the above text detection model inference and text recognition model inference, update the corresponding configuration and model, the following command uses the combination of the EAST text detection and STAR-Net text recognition:
......
## Quick installation
## QUICK INSTALLATION
After testing, PaddleOCR can run on glibc 2.23. You can also test other glibc versions or install glibc 2.23 for the best compatibility.
......@@ -60,7 +60,7 @@ python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsin
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
3. Clone PaddleOCR repo code
3. Clone PaddleOCR repo
```
# Recommend
git clone https://github.com/PaddlePaddle/PaddleOCR
......
## Text recognition
## TEXT RECOGNITION
### Data preparation
### DATA PREPARATION
PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data:
......@@ -10,7 +10,7 @@ Please organize the dataset as follows:
The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory:
```
ln -sf <path/to/dataset> <path/to/paddle_detection>/train_data/dataset
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
```
......@@ -96,7 +96,7 @@ You can use them if needed.
To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`.
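If you need to create a dictionary for your own data, one possible approach is to collect the distinct characters that appear in your labels. The sketch below assumes a dictionary file with one character per line in UTF-8; check this against the dictionary files shipped with PaddleOCR before relying on it. The helper names, the sample labels, and the output file name are hypothetical.

```python
import os
import tempfile


def build_char_dict(transcriptions):
    """Collect the distinct characters used in the labels, in sorted order."""
    return sorted({ch for text in transcriptions for ch in text})


def write_char_dict(chars, path):
    # Assumption: one character per line, UTF-8 encoded.
    with open(path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")


labels = ["实力活力", "super"]
chars = build_char_dict(labels)
dict_path = os.path.join(tempfile.mkdtemp(), "custom_dict.txt")
write_char_dict(chars, dict_path)
print(len(chars), dict_path)
```

The resulting path is what `character_dict_path` would point to during training.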
### Start training
### TRAINING
PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
......@@ -166,7 +166,7 @@ Global:
### Evaluation
### EVALUATION
The evaluation dataset can be changed by setting `label_file_path` under EvalReader in `configs/rec/rec_icdar15_reader.yml`.
......@@ -176,7 +176,7 @@ export CUDA_VISIBLE_DEVICES=0
python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
### Prediction
### PREDICTION
* Training engine prediction
......
# Recent updates
# RECENT UPDATES
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Provide Lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training supported on Windows system
- 2020.5.30 Open source general Chinese OCR model
- 2020.5.14 Release [PaddleOCR Open Class](https://www.bilibili.com/video/BV1nf4y1U7RX?p=4)
- 2020.5.14 Release [PaddleOCR Practice Notebook](https://aistudio.baidu.com/aistudio/projectdetail/467229)
- 2020.5.14 Open source 8.6M ultra-lightweight Chinese OCR model
- 2020.5.14 Open source 8.6M lightweight Chinese OCR model
......@@ -96,7 +96,6 @@ class EvalTestReader(object):
img = cv2.imread(img_path)
if img is None:
    logger.info("{} does not exist!".format(img_path))
    continue
elif len(list(img.shape)) == 2 or img.shape[2] == 1:
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
outs = process_function(img)
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
......@@ -65,6 +65,7 @@ def cal_det_res(exe, config, eval_info_dict):
err = "concatenate error usually caused by different input image shapes in evaluation or testing.\n \
Please set \"test_batch_size_per_card\" in main yml as 1\n \
or add \"test_image_shape: [h, w]\" in reader yml for EvalReader."
raise Exception(err)
outs = exe.run(eval_info_dict['program'], \
feed={'image': img_list}, \
......@@ -113,7 +114,7 @@ def cal_det_metrics(gt_label_path, save_res_path):
gt_label_path(string): The groundtruth detection label file path
save_res_path(string): The saved predicted detection label path
return:
claculated metrics including Hmeanprecision and recall
calculated metrics including Hmean, precision and recall
"""
evaluator = DetectionIoUEvaluator()
gt_label_infor = load_label_infor(gt_label_path, do_ignore=True)
......
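For reference, the three metrics named in the docstring above relate as follows: Hmean is the harmonic mean of precision and recall. The sketch below computes them from raw match counts; `detection_metrics` and its parameters are hypothetical names and do not reflect how `DetectionIoUEvaluator` is implemented.

```python
def detection_metrics(num_matched, num_predicted, num_ground_truth):
    """Return precision, recall and their harmonic mean (Hmean).

    Zero denominators yield 0.0 instead of raising ZeroDivisionError.
    """
    precision = num_matched / num_predicted if num_predicted else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    hmean = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "hmean": hmean}


# Illustrative counts: 8 matched boxes out of 10 predictions and 16 labels.
metrics = detection_metrics(8, 10, 16)
print(metrics)
```

With these counts precision is 0.8 and recall is 0.5, so Hmean falls between the two and sits closer to the smaller value.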
......@@ -83,7 +83,7 @@ def eval_rec_run(exe, config, eval_info_dict, mode):
def test_rec_benchmark(exe, config, eval_info_dict):
" 评估lmdb 数据"
" Evaluate lmdb dataset "
eval_data_list = ['IIIT5k_3000', 'SVT', 'IC03_860', 'IC03_867', \
'IC13_857', 'IC13_1015', 'IC15_1811', 'IC15_2077', 'SVTP', 'CUTE80']
eval_data_dir = config['TestReader']['lmdb_sets_dir']
......