From 9bb1d4ecd3cbb6afd860968386bf44b9c73b28d5 Mon Sep 17 00:00:00 2001 From: xxxpsyduck Date: Wed, 24 Jun 2020 17:13:03 +0700 Subject: [PATCH] update docs --- README_en.md | 3 +- doc/doc_en/FAQ_en.md | 4 +++ doc/doc_en/config_en.md | 8 ++--- doc/doc_en/customize_en.md | 8 ++--- doc/doc_en/datasets_en.md | 2 +- doc/doc_en/detection_en.md | 14 ++++----- doc/doc_en/inference_en.md | 46 +++++++++++++++-------------- doc/doc_en/installation_en.md | 4 +-- doc/doc_en/recognition_en.md | 10 +++---- doc/doc_en/update_en.md | 2 +- ppocr/data/det/dataset_traversal.py | 1 - 11 files changed, 54 insertions(+), 48 deletions(-) diff --git a/README_en.md b/README_en.md index 21902996..85a62444 100644 --- a/README_en.md +++ b/README_en.md @@ -190,8 +190,9 @@ Please refer to the document for training guide and use of PaddleOCR text recogn [more](./doc/doc_en/FAQ_en.md) ## Welcome to the PaddleOCR technical exchange group -WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the group~ +WeChat: paddlehelp, note OCR, our assistant will get you into the group~ + ## References ``` diff --git a/doc/doc_en/FAQ_en.md b/doc/doc_en/FAQ_en.md index cdbc6bf7..06832b30 100644 --- a/doc/doc_en/FAQ_en.md +++ b/doc/doc_en/FAQ_en.md @@ -49,3 +49,7 @@ At present, the open source model, dataset and magnitude are as follows: Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100) Solution:TPS does not support variable shape. Please set --rec_image_shape='3,32,100' and --rec_char_type='en' + +11. **Custom dictionary used during training, the recognition results show that words do not appear in the dictionary** + +The used custom dictionary path is not set when making prediction. The solution is setting parameter `rec_char_dict_path` to the corresponding dictionary file. 
\ No newline at end of file diff --git a/doc/doc_en/config_en.md b/doc/doc_en/config_en.md index 41c2bb86..b9ad0394 100644 --- a/doc/doc_en/config_en.md +++ b/doc/doc_en/config_en.md @@ -1,4 +1,4 @@ -# Optional parameters list +# OPTIONAL PARAMETERS LIST The following list can be viewed via `--help` @@ -8,7 +8,7 @@ The following list can be viewed via `--help` | -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false` | -## Introduction to Global Parameters of Configuration File +## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE Take `rec_chinese_lite_train.yml` as an example @@ -35,7 +35,7 @@ Take `rec_chinese_lite_train.yml` as an example | checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption | | save_inference_dir | path to save model for inference | None | Use to save inference model | -## Introduction to Reader parameters of Configuration file +## INTRODUCTION TO READER PARAMETERS OF CONFIGURATION FILE Take `rec_chinese_reader.yml` as an example: @@ -47,7 +47,7 @@ Take `rec_chinese_reader.yml` as an example: | label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ | | infer_img | Result folder path | ./infer_img | \| -## Introduction to Optimizer parameters of Configuration file +## INTRODUCTION TO OPTIMIZER PARAMETERS OF CONFIGURATION FILE Take `rec_icdar15_train.yml` as an example: diff --git a/doc/doc_en/customize_en.md b/doc/doc_en/customize_en.md index d3a61ef2..b63de67c 100644 --- a/doc/doc_en/customize_en.md +++ b/doc/doc_en/customize_en.md @@ -1,8 +1,8 @@ -# How to make your own ultra-lightweight OCR models? +# HOW TO MAKE YOUR OWN LIGHTWEIGHT OCR MODEL? 
The process of making a customized ultra-lightweight OCR models can be divided into three steps: training text detection model, training text recognition model, and concatenate the predictions from previous steps. -## step1: Train text detection model +## STEP1: TRAIN TEXT DETECTION MODEL PaddleOCR provides two text detection algorithms: EAST and DB. Both support MobileNetV3 and ResNet50_vd backbone networks, select the corresponding configuration file as needed and start training. For example, to train with MobileNetV3 as the backbone network for DB detection model : ``` @@ -10,7 +10,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml ``` For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md) -## step2: Train text recognition model +## STEP2: TRAIN TEXT RECOGNITION MODEL PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks: MobileNetV3 and ResNet34_vd, select the corresponding configuration files as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network: ``` @@ -18,7 +18,7 @@ python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml ``` For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md) -## step3: Concatenate predictions +## STEP3: CONCATENATE PREDICTIONS PaddleOCR provides a concatenation tool for detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages: text detection, text rectification, text recognition, and score filtering to output the text position and recognition results, and at the same time, you can choose to visualize the results. 
diff --git a/doc/doc_en/datasets_en.md b/doc/doc_en/datasets_en.md index ed858052..47e6ec48 100644 --- a/doc/doc_en/datasets_en.md +++ b/doc/doc_en/datasets_en.md @@ -1,4 +1,4 @@ -## Dataset +## DATASET This is a collection of commonly used Chinese datasets, which is being updated continuously. You are welcome to contribute to this list~ - [ICDAR2019-LSVT](#ICDAR2019-LSVT) - [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17) diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md index 7dc1fa20..6e4aede3 100644 --- a/doc/doc_en/detection_en.md +++ b/doc/doc_en/detection_en.md @@ -1,8 +1,8 @@ -# Text detection +# TEXT DETECTION This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR. -## Data preparation +## DATA PREPARATION The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading. Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget: @@ -27,13 +27,13 @@ The provided annotation file format is as follow: " Image file name Image annotation information encoded by json.dumps" ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}] ``` -The image annotation information before json.dumps encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner. +The image annotation after json.dumps() encoding is a list containing multiple dictionaries. 
The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner. `transcription` represents the text of the current text box, and this information is not needed in the text detection task. If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format. -## Quickstart training +## QUICKSTART First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs. ``` @@ -56,7 +56,7 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models ``` -**Start training** +**START TRAINING** ``` python3 tools/train.py -c configs/det/det_mv3_db.yml ``` @@ -80,7 +80,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you **Note**:The priority of Global.checkpoints is higher than the priority of Global.pretrain_weights, that is, when two parameters are specified at the same time, the model specified by Global.checkpoints will be loaded first. If the model path specified by Global.checkpoints is wrong, the one specified by Global.pretrain_weights will be loaded. -## Evaluation Indicator +## EVALUATION PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean. @@ -100,7 +100,7 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou * Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. 
-## Test detection result +## TEST DETECTION RESULT Test the detection result on a single image: ``` diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md index 8932eafc..95a17f2d 100644 --- a/doc/doc_en/inference_en.md +++ b/doc/doc_en/inference_en.md @@ -5,14 +5,14 @@ The inference model (the model saved by fluid.io.save_inference_model) is genera The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training. -Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). +Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting in deployment and accelerating inferencing, is flexible and convenient, and is suitable for integration with actual systems. For more details, please refer to the document [Classification Framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html). Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on inference model. 
-## Training model to inference model -### Detection model to inference model +## Convert training model to inference model +### Convert detection model to inference model -Download the ultra-lightweight Chinese detection model: +Download the lightweight Chinese detection model: ``` wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/ ``` @@ -29,9 +29,9 @@ inference/det_db/ └─ params Check the parameter file of the inference model ``` -### Recognition model to inference model +### Convert recognition model to inference model -Download the ultra-lightweight Chinese recognition model: +Download the lightweight Chinese recognition model: ``` wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/ ``` @@ -53,11 +53,11 @@ After the conversion is successful, there are two files in the directory: ## Text detection model inference -The following will introduce the ultra-lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because EAST and DB algorithms are very different, when inference, it is necessary to adapt the EAST text detection algorithm by passing in corresponding parameters. +The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because EAST and DB algorithms are very different, when inference, it is necessary to adapt the EAST text detection algorithm by passing in corresponding parameters. -### 1.Ultra-lightweight Chinese detection model inference +### 1. 
lightweight Chinese detection model inference -For ultra-lightweight Chinese detection model inference, you can execute the following commands: +For lightweight Chinese detection model inference, you can execute the following commands: ``` python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" @@ -78,7 +78,7 @@ If you want to use the CPU for prediction, execute the command as follows python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False ``` -### 2.DB text detection model inference +### 2. DB text detection model inference First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert: @@ -102,7 +102,7 @@ The visualized text detection results are saved to the `./inference_results` fol **Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images. -### 3.EAST text detection model inference +### 3. EAST text detection model inference First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert: @@ -128,12 +128,12 @@ The visualized text detection results are saved to the `./inference_results` fol ## Text recognition model inference -The following will introduce the ultra-lightweight Chinese recognition model inference and CTC loss-based recognition model inference. 
**The recognition model inference based on Attention loss is still being debugged**. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. +The following will introduce the lightweight Chinese recognition model inference, other CTC-based and Attention-based text recognition models inference. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it is also found that the result of the model based on Attention loss is not as good as the one based on CTC loss. In addition, if the characters dictionary is modified during training, make sure that you use the same characters set during inferencing. Please check below for details. -### 1. Ultra-lightweight Chinese recognition model inference +### 1. LIGHTWEIGHT CHINESE TEXT RECOGNITION MODEL REFERENCE -For ultra-lightweight Chinese recognition model inference, you can execute the following commands: +For lightweight Chinese recognition model inference, you can execute the following commands: ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/" @@ -146,7 +146,7 @@ After executing the command, the prediction results (recognized text and score) Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695] -### 2. Recognition model inference based on CTC loss +### 2. CTC-BASED TEXT RECOGNITION MODEL INFERENCE Taking STAR-Net as an example, we introduce the recognition model inference based on CTC loss. CRNN and Rosetta are used in a similar way, by setting the recognition algorithm parameter `rec_algorithm`. 
@@ -165,13 +165,15 @@ For STAR-Net text recognition model inference, execute the following commands: ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" ``` + +### 3. ATTENTION-BASED TEXT RECOGNITION MODEL INFERENCE ![](../imgs_words_en/word_336.png) After executing the command, the recognition result of the above image is as follows: Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555] -**Note**:Since the above model refers to [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of ultra-lightweight Chinese recognition model in two aspects: +**Note**:Since the above model refers to [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of lightweight Chinese recognition model in two aspects: - The image resolution used in training is different: the image resolution used in training the above model is [3,32,100], while during our Chinese model training, in order to ensure the recognition effect of long text, the image resolution used in training is [3, 32, 320]. The default shape parameter of the inference stage is the image resolution used in training phase, that is [3, 32, 320]. Therefore, when running inference of the above English model here, you need to set the shape of the recognition image through the parameter `rec_image_shape`. @@ -182,18 +184,18 @@ self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" dict_character = list(self.character_str) ``` -### 4.Recognition model inference using custom text dictionary file -If the text dictionary is replaced during training, you need to specify the text dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict. +### 4. 
TEXT RECOGNITION MODEL INFERENCE USING CUSTOM CHARACTERS DICTIONARY +If the chars dictionary is modified during training, you need to specify the new dictionary path by setting the parameter `rec_char_dict_path` when using your inference model to predict. ``` python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_type="en" --rec_char_dict_path="your text dict path" ``` -## Text detection and recognition inference concatenation +## TEXT DETECTION AND RECOGNITION INFERENCE CONCATENATION -### 1. Ultra-lightweight Chinese OCR model inference +### 1. LIGHTWEIGHT CHINESE MODEL -When performing prediction, you need to specify the path of a single image or a collection of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual recognition results are saved to the `./inference_results` folder by default. +When performing prediction, you need to specify the path of a single image or a folder of images through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visualized recognition results are saved to the `./inference_results` folder by default. ``` python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" @@ -203,7 +205,7 @@ After executing the command, the recognition result image is as follows: ![](../imgs_results/2.jpg) -### 2. Other model inference +### 2. 
OTHER MODELS If you want to try other detection algorithms or recognition algorithms, please refer to the above text detection model inference and text recognition model inference, update the corresponding configuration and model, the following command uses the combination of the EAST text detection and STAR-Net text recognition: diff --git a/doc/doc_en/installation_en.md b/doc/doc_en/installation_en.md index 05471c0c..cc3bfb52 100644 --- a/doc/doc_en/installation_en.md +++ b/doc/doc_en/installation_en.md @@ -1,4 +1,4 @@ -## Quick installation +## QUICK INSTALLATION After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility. @@ -60,7 +60,7 @@ python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsin For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. -3. Clone PaddleOCR repo code +3. Clone PaddleOCR repo ``` # Recommend git clone https://github.com/PaddlePaddle/PaddleOCR diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md index 097bcc6d..01877bd1 100644 --- a/doc/doc_en/recognition_en.md +++ b/doc/doc_en/recognition_en.md @@ -1,6 +1,6 @@ -## Text recognition +## TEXT RECOGNITION -### Data preparation +### DATA PREPARATION PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data: @@ -96,7 +96,7 @@ You can use them if needed. To customize the dict file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. -### Start training +### TRAINING PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. 
In this section, the CRNN recognition model will be used as an example: @@ -166,7 +166,7 @@ Global: -### Evaluation +### EVALUATION The evaluation data set can be modified via `configs/rec/rec_icdar15_reader.yml` setting of `label_file_path` in EvalReader. @@ -176,7 +176,7 @@ export CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy ``` -### Prediction +### PREDICTION * Training engine prediction diff --git a/doc/doc_en/update_en.md b/doc/doc_en/update_en.md index e5b908f3..de0502ff 100644 --- a/doc/doc_en/update_en.md +++ b/doc/doc_en/update_en.md @@ -1,4 +1,4 @@ -# Recent updates +# RECENT UPDATES - 2020.6.5 Support exporting `attention` model to `inference_model` - 2020.6.5 Support separate prediction and recognition, output result score diff --git a/ppocr/data/det/dataset_traversal.py b/ppocr/data/det/dataset_traversal.py index 737cbe2e..5331ec19 100644 --- a/ppocr/data/det/dataset_traversal.py +++ b/ppocr/data/det/dataset_traversal.py @@ -96,7 +96,6 @@ class EvalTestReader(object): img = cv2.imread(img_path) if img is None: logger.info("{} does not exist!".format(img_path)) - continue elif len(list(img.shape)) == 2 or img.shape[2] == 1: img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) outs = process_function(img) -- GitLab
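
Note on the last hunk (`ppocr/data/det/dataset_traversal.py`): with `continue` removed, an image that fails to load appears to fall through to `process_function(img)` with `img` set to `None`. Indentation is lost in this copy of the patch, so that reading is an assumption. Below is a minimal sketch of the guard pattern, not the PaddleOCR implementation: `iter_processed`, `load_image` and the dictionary-backed loader in the usage lines are hypothetical names used only for illustration, and `load_image` stands in for `cv2.imread`, which returns `None` when a file cannot be read.

```python
import logging
from typing import Callable, Iterable, Iterator, Optional, Tuple

logger = logging.getLogger(__name__)


def iter_processed(
    img_paths: Iterable[str],
    load_image: Callable[[str], Optional[object]],
    process_function: Callable[[object], object],
) -> Iterator[Tuple[str, object]]:
    """Yield (path, processed image) pairs, skipping images that fail to load."""
    for img_path in img_paths:
        # Hypothetical loader; like cv2.imread it signals failure by returning None.
        img = load_image(img_path)
        if img is None:
            logger.info("%s does not exist!", img_path)
            # Skip this entry so that None never reaches process_function.
            continue
        yield img_path, process_function(img)


# Usage with an in-memory stand-in for the file system (illustrative only).
fake_store = {"a.jpg": [1, 2, 3]}
results = list(iter_processed(["a.jpg", "missing.jpg"], fake_store.get, len))
```

In this sketch the missing file is logged and skipped, so `results` holds only the entry for `a.jpg`. If the intent of the hunk is to surface missing files instead of skipping them, raising an explicit error at the `None` check would be the alternative design.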