diff --git a/README_en.md b/README_en.md index 219029968a0e950513cf9135004b2de5203d7117..610c25b7eb1ac7c043e39420f777bb3346c3bb08 100644 --- a/README_en.md +++ b/README_en.md @@ -1,18 +1,18 @@ English | [简体中文](README.md) -## Introduction +## INTRODUCTION PaddleOCR aims to create a rich, leading, and practical OCR tools that help users train better models and apply them into practice. **Recent updates** - 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating - 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support separate prediction and recognition, output result score -- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience +- 2020.5.30 Provide lightweight Chinese OCR online experience - 2020.5.30 Model prediction and training supported on Windows system - [more](./doc/doc_en/update_en.md) -## Features -- Ultra-lightweight Chinese OCR model, total model size is only 8.6M +## FEATURES +- Lightweight Chinese OCR model, total model size is only 8.6M - Single model supports Chinese and English numbers combination recognition, vertical text recognition, long text recognition - Detection model DB (4.1M) + recognition model CRNN (4.5M) - Various text detection algorithms: EAST, DB @@ -22,34 +22,34 @@ PaddleOCR aims to create a rich, leading, and practical OCR tools that help user |Model Name|Description |Detection Model link|Recognition Model link| |-|-|-|-| -|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| +|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained 
model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| |chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| For testing our Chinese OCR online:https://www.paddlepaddle.org.cn/hub/scene/ocr -**You can also quickly experience the Ultra-lightweight Chinese OCR and General Chinese OCR models as follows:** +**You can also quickly experience the lightweight Chinese OCR and General Chinese OCR models as follows:** -## **Ultra-lightweight Chinese OCR and General Chinese OCR inference** +## **LIGHTWEIGHT CHINESE OCR AND GENERAL CHINESE OCR INFERENCE** ![](doc/imgs_results/11.jpg) -The picture above is the result of our Ultra-lightweight Chinese OCR model. For more testing results, please see the end of the article [Ultra-lightweight Chinese OCR results](#Ultra-lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results). +The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results). -#### 1. Environment configuration +#### 1. ENVIRONMENT CONFIGURATION Please see [Quick installation](./doc/doc_en/installation_en.md) -#### 2. Download inference models +#### 2. 
DOWNLOAD INFERENCE MODELS -#### (1) Download Ultra-lightweight Chinese OCR models +#### (1) Download lightweight Chinese OCR models *If wget is not installed in the windows system, you can copy the link to the browser to download the model. After model downloaded, unzip it and place it in the corresponding directory* ``` mkdir inference && cd inference -# Download the detection part of the Ultra-lightweight Chinese OCR and decompress it +# Download the detection part of the lightweight Chinese OCR and decompress it wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar -# Download the recognition part of the Ultra-lightweight Chinese OCR and decompress it +# Download the recognition part of the lightweight Chinese OCR and decompress it wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar cd .. ``` @@ -63,7 +63,7 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && t cd .. ``` -#### 3. Single image and batch image prediction +#### 3. SINGLE IMAGE AND BATCH PREDICTION The following code implements text detection and recognition inference tandemly. When performing prediction, you need to specify the path of a single image or image folder through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detection model, and the parameter `rec_model_dir` specifies the path to the recognition model. The visual prediction results are saved to the `./inference_results` folder by default. 
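The three parameters named in the paragraph above can also be assembled programmatically. This is a minimal sketch, not part of PaddleOCR: the helper name `build_predict_command` is hypothetical, and the model directories are the example paths used in `inference_en.md` in this patch, so adjust them to wherever the models were unpacked.

```python
import shlex


def build_predict_command(image_dir, det_model_dir, rec_model_dir):
    """Return the argv list for a detection + recognition run.

    Only the three flags named in the README text are used; the helper
    itself is illustrative and not part of PaddleOCR.
    """
    return [
        "python3",
        "tools/infer/predict_system.py",
        "--image_dir={}".format(image_dir),
        "--det_model_dir={}".format(det_model_dir),
        "--rec_model_dir={}".format(rec_model_dir),
    ]


cmd = build_predict_command(
    "./doc/imgs/11.jpg", "./inference/det_db/", "./inference/rec_crnn/"
)
# Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Building an argv list rather than a single string avoids quoting problems when an image folder path contains spaces.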
@@ -87,14 +87,14 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_mode For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md) -## Documentation +## DOCUMENTATION - [Quick installation](./doc/doc_en/installation_en.md) - [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) - [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) - [Inference](./doc/doc_en/inference_en.md) - [Dataset](./doc/doc_en/datasets_en.md) -## Text detection algorithm +## TEXT DETECTION ALGORITHM PaddleOCR open source text detection algorithms list: - [x] EAST([paper](https://arxiv.org/abs/1704.03155)) @@ -113,14 +113,14 @@ On the ICDAR2015 dataset, the text detection result is as follows: For use of [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset with a total of 3w training data,the related configuration and pre-trained models for Chinese detection task are as follows: |Model|Backbone|Configuration file|Pre-trained model| |-|-|-|-| -|Ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| +|lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)| |General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)| * Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result. 
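To make the note on `box_thresh` and `unclip_ratio` concrete: in the DB paper, a detected (shrunk) text region is expanded by an offset D' = A' × r' / L', where A' is the region's area, L' its perimeter, and r' the unclip ratio, and low-scoring boxes are discarded. The following is a minimal pure-Python sketch of that arithmetic, assuming a simple polygon given as (x, y) vertices. The function names are hypothetical and this is not PaddleOCR's post-processing code.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(
        math.dist(points[i], points[(i + 1) % len(points)])
        for i in range(len(points))
    )


def unclip_offset(points, unclip_ratio=1.5):
    """Expansion distance D' = A' * r' / L' from the DB paper."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(score, box_thresh=0.6):
    """A box survives only if its score reaches the threshold."""
    return score >= box_thresh


box = [(0, 0), (100, 0), (100, 20), (0, 20)]  # a 100 x 20 text region
print(unclip_offset(box))  # 2000 * 1.5 / 240 = 12.5
print(keep_box(0.55))      # False
```

A larger `unclip_ratio` grows every box outward by a proportionally larger margin, while a larger `box_thresh` drops more low-confidence boxes, which is why both are worth tuning per dataset.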
For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md) -## Text recognition algorithm +## TEXT RECOGNITION ALGORITHM PaddleOCR open-source text recognition algorithms list: - [x] CRNN([paper](https://arxiv.org/abs/1507.05717)) @@ -145,16 +145,16 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r We use [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and cropout 30w traning data from original photos by using position groundtruth and make some calibration needed. In addition, based on the LSVT corpus, 500w synthetic data is generated to train the Chinese model. The related configuration and pre-trained models are as follows: |Model|Backbone|Configuration file|Pre-trained model| |-|-|-|-| -|Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| +|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)| |General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)| Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md) -## End-to-end OCR algorithm +## END-TO-END OCR ALGORITHM - [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, comming soon) - -## Ultra-lightweight Chinese OCR results + +## LIGHTWEIGHT CHINESE OCR RESULTS ![](doc/imgs_results/1.jpg) ![](doc/imgs_results/7.jpg) ![](doc/imgs_results/12.jpg) @@ -189,11 +189,12 @@ Please refer to the document for training guide and use of PaddleOCR text recogn 
[more](./doc/doc_en/FAQ_en.md) -## Welcome to the PaddleOCR technical exchange group -WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the group~ +## WELCOME TO THE PaddleOCR TECHNICAL EXCHANGE GROUP +WeChat: paddlehelp. Mention OCR in your request, and our assistant will invite you to the group~ + -## References +## REFERENCES ``` 1. EAST: @inproceedings{zhou2017east, @@ -248,10 +249,10 @@ WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the grou } ``` -## License +## LICENSE This project is released under Apache 2.0 license -## Contribution +## CONTRIBUTION We welcome all the contributions to PaddleOCR and appreciate for your feedback very much. - Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) for contributing the English documentation. diff --git a/doc/doc_en/FAQ_en.md b/doc/doc_en/FAQ_en.md index f4a4499e71ffb730c8c77931f2871f4b1ea24f4a..04feb363777801088efa0425195afd9e065a5b1e 100644 --- a/doc/doc_en/FAQ_en.md +++ b/doc/doc_en/FAQ_en.md @@ -47,3 +47,7 @@ At present, the open source model, dataset and magnitude are as follows: 10. **Error in using the model with TPS module for prediction** Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100) Solution:TPS does not support variable shape. Please set --rec_image_shape='3,32,100' and --rec_char_type='en' + +11. **A custom dictionary was used during training, but the recognition results contain characters that are not in the dictionary** + +The custom dictionary path was not set when running prediction. To fix this, set the parameter `rec_char_dict_path` to the corresponding dictionary file.
\ No newline at end of file diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index 41c2bb86c57146b57f451484b1d9397c4d83fbff..b9ad03947c545a4760331f835a9cc85be6ff67a7 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -1,4 +1,4 @@ -# Optional parameters list +# OPTIONAL PARAMETERS LIST The following list can be viewed via `--help` @@ -8,7 +8,7 @@ The following list can be viewed via `--help` | -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false` | -## Introduction to Global Parameters of Configuration File +## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE Take `rec_chinese_lite_train.yml` as an example @@ -35,7 +35,7 @@ Take `rec_chinese_lite_train.yml` as an example | checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption | | save_inference_dir | path to save model for inference | None | Use to save inference model | -## Introduction to Reader parameters of Configuration file +## INTRODUCTION TO READER PARAMETERS OF CONFIGURATION FILE Take `rec_chinese_reader.yml` as an example: @@ -47,7 +47,7 @@ Take `rec_chinese_reader.yml` as an example: | label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ | | infer_img | Result folder path | ./infer_img | \| -## Introduction to Optimizer parameters of Configuration file +## INTRODUCTION TO OPTIMIZER PARAMETERS OF CONFIGURATION FILE Take `rec_icdar15_train.yml` as an example: diff --git a/doc/doc_en/customize_en.md b/doc/doc_en/customize_en.md index d3a61ef29fbdbf8c775280a051fe1d9524442ac1..b63de67c6226abbb5b4fb8d0ed57c19142307203 100644 --- a/doc/doc_en/customize_en.md +++ b/doc/doc_en/customize_en.md @@ -1,8 +1,8 @@ -# How to make your own ultra-lightweight OCR models? +# HOW TO MAKE YOUR OWN LIGHTWEIGHT OCR MODEL? 
The process of making a customized ultra-lightweight OCR models can be divided into three steps: training text detection model, training text recognition model, and concatenate the predictions from previous steps. -## step1: Train text detection model +## STEP1: TRAIN TEXT DETECTION MODEL PaddleOCR provides two text detection algorithms: EAST and DB. Both support MobileNetV3 and ResNet50_vd backbone networks, select the corresponding configuration file as needed and start training. For example, to train with MobileNetV3 as the backbone network for DB detection model : ``` @@ -10,7 +10,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml ``` For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md) -## step2: Train text recognition model +## STEP2: TRAIN TEXT RECOGNITION MODEL PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks: MobileNetV3 and ResNet34_vd, select the corresponding configuration files as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network: ``` @@ -18,7 +18,7 @@ python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml ``` For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md) -## step3: Concatenate predictions +## STEP3: CONCATENATE PREDICTIONS PaddleOCR provides a concatenation tool for detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages: text detection, text rectification, text recognition, and score filtering to output the text position and recognition results, and at the same time, you can choose to visualize the results. 
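The four stages named above can be sketched as a plain Python function. This is an illustrative outline only: `detect`, `rectify`, and `recognize` are hypothetical callables standing in for the trained models, and `drop_score` is an assumed name for the score-filter threshold.

```python
def run_two_stage_ocr(image, detect, rectify, recognize, drop_score=0.5):
    """Chain detection, rectification, recognition, and score filtering.

    detect(image)        -> list of text boxes
    rectify(image, box)  -> cropped, straightened text image
    recognize(crop)      -> (text, score)
    All three callables are placeholders supplied by the caller.
    """
    results = []
    for box in detect(image):              # 1. text detection
        crop = rectify(image, box)         # 2. text rectification
        text, score = recognize(crop)      # 3. text recognition
        if score >= drop_score:            # 4. score filtering
            results.append((box, text, score))
    return results


# Usage with stand-in functions instead of real models.
fake_boxes = [
    [(0, 0), (10, 0), (10, 5), (0, 5)],
    [(0, 8), (10, 8), (10, 12), (0, 12)],
]
fake_scores = iter([("hello", 0.93), ("??", 0.12)])
out = run_two_stage_ocr(
    image=None,
    detect=lambda img: fake_boxes,
    rectify=lambda img, box: box,
    recognize=lambda crop: next(fake_scores),
)
print(out)  # only the "hello" result passes the 0.5 threshold
```

Passing the stages in as callables mirrors the point made in the text: any trained detection model can be paired with any recognition model.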
diff --git a/doc/doc_en/datasets_en.md b/doc/doc_en/datasets_en.md index ed8580523030cc79ceacd79b28c09bb62dda2e63..61d2033b4fe8f0077ad66fb9ae2cd559ce29fd65 100644 --- a/doc/doc_en/datasets_en.md +++ b/doc/doc_en/datasets_en.md @@ -1,4 +1,4 @@ -## Dataset +## DATASET This is a collection of commonly used Chinese datasets, which is being updated continuously. You are welcome to contribute to this list~ - [ICDAR2019-LSVT](#ICDAR2019-LSVT) - [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17) @@ -13,8 +13,11 @@ In addition to opensource data, users can also use synthesis tools to synthesize - **Data sources**:https://ai.baidu.com/broad/introduction?dataset=lsvt - **Introduction**: A total of 45w Chinese street view images, including 5w (2w test + 3w training) fully labeled data (text coordinates + text content), 40w weakly labeled data (text content only), as shown in the following figure: ![](../datasets/LSVT_1.jpg) + (a) Fully labeled data + ![](../datasets/LSVT_2.jpg) + (b) Weakly labeled data - **Download link**:https://ai.baidu.com/broad/download?dataset=lsvt diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md index 7dc1fa200e24194a704cdd330b74bc38a5fe180e..6bb496c91c32702eb408ea5afcfded33301d7535 100644 --- a/doc/doc_en/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -1,8 +1,8 @@ -# Text detection +# TEXT DETECTION This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR. -## Data preparation +## DATA PREPARATION The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading. Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. 
In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget: @@ -27,13 +27,13 @@ The provided annotation file format is as follow: " Image file name Image annotation information encoded by json.dumps" ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}] ``` -The image annotation information before json.dumps encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner. +The image annotation before json.dumps() encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner. `transcription` represents the text of the current text box, and this information is not needed in the text detection task. If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format. -## Quickstart training +## TRAINING First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
``` @@ -56,7 +56,7 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models ``` -**Start training** +**START TRAINING** ``` python3 tools/train.py -c configs/det/det_mv3_db.yml ``` @@ -80,7 +80,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you **Note**:The priority of Global.checkpoints is higher than the priority of Global.pretrain_weights, that is, when two parameters are specified at the same time, the model specified by Global.checkpoints will be loaded first. If the model path specified by Global.checkpoints is wrong, the one specified by Global.pretrain_weights will be loaded. -## Evaluation Indicator +## EVALUATION PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean. @@ -100,7 +100,7 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou * Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. -## Test detection result +## TEST DETECTION RESULT Test the detection result on a single image: ``` diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md index 8932eafc64232584842a97571c5a7b0672a2ad55..0fd7a3725df653b29703a23618675fbc72a6e342 100644 --- a/doc/doc_en/inference_en.md +++ b/doc/doc_en/inference_en.md @@ -1,18 +1,18 @@ -# Prediction from inference model +# PREDICTION FROM INFERENCE MODEL The inference model (the model saved by fluid.io.save_inference_model) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment. The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training. -Compared with the checkpoints model, the inference model will additionally save the structural information of the model. 
It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). +Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model. -## Training model to inference model -### Detection model to inference model +## CONVERT TRAINING MODEL TO INFERENCE MODEL +### Convert detection model to inference model -Download the ultra-lightweight Chinese detection model: +Download the lightweight Chinese detection model: ``` wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/ ``` @@ -29,9 +29,9 @@ inference/det_db/ └─ params Check the parameter file of the inference model ``` -### Recognition model to inference model +### Convert recognition model to inference model -Download the ultra-lightweight Chinese recognition model: +Download the lightweight Chinese recognition model: ``` wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/ ``` @@ -51,13 +51,13 @@ After the conversion is successful, there are two files in the directory: └─ params Identify the parameter files of the inference model ``` 
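After a conversion, it can be useful to confirm that the export directory is complete before running inference. This is a minimal sketch, assuming the two exported files are named `model` and `params`. Only `params` is visible in the listing above, so `model` is an assumption, as is the helper name `missing_inference_files`.

```python
from pathlib import Path


def missing_inference_files(model_dir, expected=("model", "params")):
    """Return the expected file names that are absent from model_dir.

    The default names are an assumption about the export layout;
    pass a different tuple if your export differs.
    """
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_inference_files("./inference/rec_crnn/")
if missing:
    print("incomplete export, missing:", ", ".join(missing))
else:
    print("inference model directory looks complete")
```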
-## Text detection model inference +## TEXT DETECTION MODEL INFERENCE -The following will introduce the ultra-lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because EAST and DB algorithms are very different, when inference, it is necessary to adapt the EAST text detection algorithm by passing in corresponding parameters. +The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because EAST and DB algorithms are very different, when inference, it is necessary to adapt the EAST text detection algorithm by passing in corresponding parameters. -### 1.Ultra-lightweight Chinese detection model inference +### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE -For ultra-lightweight Chinese detection model inference, you can execute the following commands: +For lightweight Chinese detection model inference, you can execute the following commands: ``` python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" @@ -78,7 +78,7 @@ If you want to use the CPU for prediction, execute the command as follows python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False ``` -### 2.DB text detection model inference +### 2. DB TEXT DETECTION MODEL INFERENCE First, convert the model saved in the DB text detection training process into an inference model. 
Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert: @@ -102,7 +102,7 @@ The visualized text detection results are saved to the `./inference_results` fol **Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images. -### 3.EAST text detection model inference +### 3. EAST TEXT DETECTION MODEL INFERENCE First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert: @@ -126,14 +126,14 @@ The visualized text detection results are saved to the `./inference_results` fol **Note**: The Python version of NMS in EAST post-processing used in this codebase so the prediction speed is quite slow. If you use the C++ version, there will be a significant speedup. -## Text recognition model inference +## TEXT RECOGNITION MODEL INFERENCE -The following will introduce the ultra-lightweight Chinese recognition model inference and CTC loss-based recognition model inference. **The recognition model inference based on Attention loss is still being debugged**. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. +The following will introduce the lightweight Chinese recognition model inference, other CTC-based and Attention-based text recognition models inference. 
For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. In addition, if the character dictionary is modified during training, make sure that you use the same character set during inference. Please check below for details. -### 1. Ultra-lightweight Chinese recognition model inference +### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL INFERENCE -For ultra-lightweight Chinese recognition model inference, you can execute the following commands: +For lightweight Chinese recognition model inference, you can execute the following commands: ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/" @@ -146,7 +146,7 @@ After executing the command, the prediction results (recognized text and score) Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695] -### 2. Recognition model inference based on CTC loss +### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE Taking STAR-Net as an example, we introduce the recognition model inference based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`. @@ -165,13 +165,15 @@ For STAR-Net text recognition model inference, execute the following commands: ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` + +### 3.
ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE ![](../imgs_words_en/word_336.png) After executing the command, the recognition result of the above image is as follows: Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555] -**Note**:Since the above model refers to [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of ultra-lightweight Chinese recognition model in two aspects: +**Note**:Since the above model refers to [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of lightweight Chinese recognition model in two aspects: - The image resolution used in training is different: the image resolution used in training the above model is [3,32,100], while during our Chinese model training, in order to ensure the recognition effect of long text, the image resolution used in training is [3, 32, 320]. The default shape parameter of the inference stage is the image resolution used in training phase, that is [3, 32, 320]. Therefore, when running inference of the above English model here, you need to set the shape of the recognition image through the parameter `rec_image_shape`. @@ -182,18 +184,18 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) ``` -### 4.Recognition model inference using custom text dictionary file -If the text dictionary is replaced during training, you need to specify the text dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict. +### 4. TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY +If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict. 
``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path" ``` -## Text detection and recognition inference concatenation +## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION -### 1. Ultra-lightweight Chinese OCR model inference +### 1. LIGHTWEIGHT CHINESE MODEL -When performing prediction, you need to specify the path of a single image or a collection of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual recognition results are saved to the `./inference_results` folder by default. +When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visualized recognition results are saved to the `./inference_results` folder by default. ``` python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" @@ -203,7 +205,7 @@ After executing the command, the recognition result image is as follows: ![](../imgs_results/2.jpg) -### 2. Other model inference +### 2. 
OTHER MODELS If you want to try other detection algorithms or recognition algorithms, please refer to the above text detection model inference and text recognition model inference, update the corresponding configuration and model, the following command uses the combination of the EAST text detection and STAR-Net text recognition: diff --git a/doc/doc_en/installation_en.md b/doc/doc_en/installation_en.md index 05471c0c9dcc4b96d625a2757de3392163fc405d..cc3bfb52f5dc592cf9fe1dc1a9a6fd8f1d3bb5cf 100644 --- a/doc/doc_en/installation_en.md +++ b/doc/doc_en/installation_en.md @@ -1,4 +1,4 @@ -## Quick installation +## QUICK INSTALLATION After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility. @@ -60,7 +60,7 @@ python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsin For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. -3. Clone PaddleOCR repo code +3. Clone PaddleOCR repo ``` # Recommend git clone https://github.com/PaddlePaddle/PaddleOCR diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md index 49643fd6453a745675641cf0373cae6eebc4db53..9a862c7a67c6d2277bc6472b304534788e06921d 100644 --- a/doc/doc_en/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -1,6 +1,6 @@ -## Text recognition +## TEXT RECOGNITION -### Data preparation +### DATA PREPARATION PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data: @@ -96,7 +96,7 @@ You can use them if needed. To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. -### Start training +### TRAINING PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. 
In this section, the CRNN recognition model will be used as an example: @@ -166,7 +166,7 @@ Global: -### Evaluation +### EVALUATION The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader. @@ -176,7 +176,7 @@ export CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy ``` -### Prediction +### PREDICTION * Training engine prediction diff --git a/doc/doc_en/update_en.md b/doc/doc_en/update_en.md index e5b908f358759d204877c9664e2d7e4f987ac8c9..c3e868b7421b5add6e1ee7f8aca8f7af0fecc999 100644 --- a/doc/doc_en/update_en.md +++ b/doc/doc_en/update_en.md @@ -1,10 +1,10 @@ -# Recent updates +# RECENT UPDATES - 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support separate prediction and recognition, output result score -- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience +- 2020.5.30 Provide Lightweight Chinese OCR online experience - 2020.5.30 Model prediction and training support on Windows system - 2020.5.30 Open source general Chinese OCR model - 2020.5.14 Release [PaddleOCR Open Class](https://www.bilibili.com/video/BV1nf4y1U7RX?p=4) - 2020.5.14 Release [PaddleOCR Practice Notebook](https://aistudio.baidu.com/aistudio/projectdetail/467229) -- 2020.5.14 Open source 8.6M ultra-lightweight Chinese OCR model +- 2020.5.14 Open source 8.6M lightweight Chinese OCR model diff --git a/ppocr/data/det/dataset_traversal.py b/ppocr/data/det/dataset_traversal.py index 737cbe2e90dc259c1266b554ae0c6735ea3a52d2..5331ec191666f797d1dba50f11beb958a43b71e9 100644 --- a/ppocr/data/det/dataset_traversal.py +++ b/ppocr/data/det/dataset_traversal.py @@ -96,7 +96,6 @@ class EvalTestReader(object): img = cv2.imread(img_path) if img is None: logger.info("{} does not exist!".format(img_path)) - continue elif len(list(img.shape)) == 2 or img.shape[2] == 1: img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) 
outs = process_function(img)
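With the `continue` removed in this hunk, an image that fails to load appears to fall through to `process_function(img)` with `img` set to `None`. If the intent is still to skip unreadable files, a loop along the following lines keeps that behavior. This is a minimal sketch, with `load_image` and `process_function` passed in as stand-ins (hypothetical parameters, in place of `cv2.imread` and the reader's own function).

```python
import logging

logger = logging.getLogger(__name__)


def iter_processed(img_paths, load_image, process_function):
    """Yield (path, output) for every image that loads successfully.

    load_image stands in for cv2.imread, which returns None on failure;
    unreadable paths are logged and skipped rather than processed.
    """
    for img_path in img_paths:
        img = load_image(img_path)
        if img is None:
            logger.info("%s does not exist!", img_path)
            continue  # never hand None to process_function
        yield img_path, process_function(img)


# Usage with stand-ins: "missing.jpg" simulates a failed read.
fake_disk = {"a.jpg": [[0, 1], [2, 3]]}
results = list(iter_processed(["a.jpg", "missing.jpg"], fake_disk.get, len))
print(results)  # [('a.jpg', 2)]
```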