diff --git a/README.md b/README.md
index a494aeae19f5ea6dc2039188c2231af33e9f41bf..a5c450ea21614830811bc6f00ce975b307dd58d1 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 ## Introduction
-PaddleOCR aims to create a rich, leading, and practical OCR tool library to help users train better models and apply them.
+PaddleOCR aims to create a rich, leading, and practical OCR tools that help users train better models and apply them into practice.
 
 **Recent updates**
 - 2020.5.30,Model prediction and training support Windows systems, and the display of recognition results is optimized
@@ -23,13 +23,13 @@ PaddleOCR aims to create a rich, leading, and practical OCR tool library to help
 
 For testing our Chinese OCR online:https://www.paddlepaddle.org.cn/hub/scene/ocr
 
-**You can also quickly experience the Ultra-lightweight Chinese OCR and general Chinese OCR models as follows:**
+**You can also quickly experience the Ultra-lightweight Chinese OCR and General Chinese OCR models as follows:**
 
 ## **Ultra-lightweight Chinese OCR and General Chinese OCR inference**
 
 ![](doc/imgs_results/11.jpg)
 
-The picture above is the result of our Ultra-lightweight Chinese OCR model. For more testing results, please see the end of the article [Ultra-lightweight Chinese OCR results](#超轻量级中文OCR效果展示) and [General Chinese OCR results](#通用中文OCR效果展示).
+The picture above is the result of our Ultra-lightweight Chinese OCR model. For more testing results, please see the end of the article [Ultra-lightweight Chinese OCR results](#Ultra-lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results).
 
 #### 1. Environment configuration
 
@@ -58,25 +58,25 @@ cd ..
 
 #### 3. Single image and batch image prediction
 
-The following code implements text detection and recognition inference tandemly. When performing prediction, you need to specify the path of a single image or image folder through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual recognition results are saved to the `./inference_results` folder by default.
+The following code implements text detection and recognition inference tandemly. When performing prediction, you need to specify the path of a single image or image folder through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detection model, and the parameter `rec_model_dir` specifies the path to the recognition model. The visual prediction results are saved to the `./inference_results` folder by default.
 
 ```
 # Set PYTHONPATH environment variable
 export PYTHONPATH=.
 
-# Predict a single image by specifying image path to image_dir
+# Prediction on a single image by specifying image path to image_dir
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
 
-# Predict a batch of images by specifying image folder path to image_dir
+# Prediction on a batch of images by specifying image folder path to image_dir
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/"
 
-# If you want to use the CPU for prediction, you need to set the use_gpu parameter to False
+# If you want to use CPU for prediction, you need to set the use_gpu parameter to False
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_mv3_db/" --rec_model_dir="./inference/ch_rec_mv3_crnn/" --use_gpu=False
 ```
 
 To run inference of the Generic Chinese OCR model, follow these steps above to download the corresponding models and update the relevant parameters. Examples are as follows:
 ```
-# Predict a single image by specifying image path to image_dir
+# Prediction on a single image by specifying image path to image_dir
 python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_det_r50_vd_db/" --rec_model_dir="./inference/ch_rec_r34_vd_crnn/"
 ```
 
@@ -90,12 +90,12 @@ For more text detection and recognition models, please refer to the document [In
 
 ## Text detection algorithm
 
-PaddleOCR open source text detection algorithm list:
+PaddleOCR open source text detection algorithms list:
 - [x] EAST([paper](https://arxiv.org/abs/1704.03155))
 - [x] DB([paper](https://arxiv.org/abs/1911.08947))
 - [ ] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research, comming soon)
 
-On the ICDAR2015 text detection public dataset, the detection result is as follows:
+On the ICDAR2015 dataset, the text detection result is as follows:
 
 |Model|Backbone|precision|recall|Hmean|Download link|
 |-|-|-|-|-|-|
@@ -106,11 +106,11 @@ On the ICDAR2015 text detection public dataset, the detection result is as follo
 
 * Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result.
 
-For the training guide and use of PaddleOCR text detection algorithm, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md)
+For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/detection.md)
 
 ## Text recognition algorithm
 
-PaddleOCR open-source text recognition algorithm list:
+PaddleOCR open-source text recognition algorithms list:
 - [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
 - [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
 - [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
@@ -130,13 +130,13 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
 |RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
 |RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
 
-Please refer to the document for training guide and use of PaddleOCR text recognition algorithm [Text recognition model training/evaluation/prediction](./doc/recognition.md)
+Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/recognition.md)
 
 ## End-to-end OCR algorithm
 - [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, comming soon)
 
-
-## Ultra-lightweight Chinese OCR result
+
+## Ultra-lightweight Chinese OCR results
 ![](doc/imgs_results/1.jpg)
 ![](doc/imgs_results/7.jpg)
 ![](doc/imgs_results/12.jpg)
@@ -146,35 +146,35 @@ Please refer to the document for training guide and use of PaddleOCR text recogn
 ![](doc/imgs_results/16.png)
 ![](doc/imgs_results/22.jpg)
 
-
-## 通用中文OCR效果展示
+
+## General Chinese OCR results
 ![](doc/imgs_results/chinese_db_crnn_server/11.jpg)
 ![](doc/imgs_results/chinese_db_crnn_server/2.jpg)
 ![](doc/imgs_results/chinese_db_crnn_server/8.jpg)
 
 ## FAQ
-1. 预测报错:got an unexpected keyword argument 'gradient_clip'
+1. Prediction error:got an unexpected keyword argument 'gradient_clip'
 
-    The installed paddle version is not correct. At present, this project only supports paddle1.7, which will be adapted to 1.8 in the near future.。
+    The installed paddle version is not correct. At present, this project only supports paddle1.7, which will be adapted to 1.8 in the near future.
 
-2. 转换attention识别模型时报错:KeyError: 'predict'
+2. Error when using attention-based recognition model: KeyError: 'predict'
 
-    基于Attention损失的识别模型推理还在调试中。对于中文文本识别,建议优先选择基于CTC损失的识别模型,实践中也发现基于Attention损失的效果不如基于CTC损失的识别模型。
+    The inference of recognition model based on attention loss is still being debugged. For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss first. In practice, it is also found that the recognition model based on attention loss is not as effective as the one based on CTC loss.
 
-3. 关于推理速度
+3. About inference speed
 
-    图片中的文字较多时,预测时间会增,可以使用--rec_batch_num设置更小预测batch num,默认值为30,可以改为10或其他数值。
+    When there are a lot of texts in the picture, the prediction time will increase. You can use `--rec_batch_num` to set a smaller prediction batch size. The default value is 30, which can be changed to 10 or other values.
 
-4. 服务部署与移动端部署
+4. Service deployment and mobile deployment
 
-    预计6月中下旬会先后发布基于Serving的服务部署方案和基于Paddle Lite的移动端部署方案,欢迎持续关注。
+    It is expected that the service deployment based on Serving and the mobile deployment based on Paddle Lite will be released successively in mid-to-late June. Stay tuned for more updates.
 
-5. 自研算法发布时间
+5. Release time of self-developed algorithm
 
-    自研算法SAST、SRN、End2End-PSL都将在6-7月陆续发布,敬请期待。
+    Baidu Self-developed algorithms such as SAST, SRN and end2end PSL will be released in June or July. Please be patient.
 
 ## Welcome to the PaddleOCR technical exchange group
-加微信:paddlehelp,备注OCR,小助手拉你进群~
+Add Wechat: paddlehelp, remark OCR, small assistant will pull you into the group ~
 
 ## References
 
@@ -236,4 +236,4 @@ Please refer to the document for training guide and use of PaddleOCR text recogn
 This project is released under Apache 2.0 license
 
 ## Contribution
-We welcome your contribution to PaddleOCR and thank you for your feedback.
+We welcome all the contributions to PaddleOCR and appreciate for your feedback very much.
diff --git a/doc/config.md b/doc/config.md
deleted file mode 100644
index 3ea85d0d5f95225b447ddc2e2e6efd454b39d7f4..0000000000000000000000000000000000000000
--- a/doc/config.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Optional parameters list
-
-The following list can be viewed via `--help`
-
-| FLAG | Supported script | Use | Defaults | Note |
-| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
-| -c | ALL | Specify configuration file | None | **Please refer to the parameter introduction for configuration file usage** |
-| -o | ALL | Set the parameter in the configuration file | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false` |
-
-
-## Introduction to Global Parameters of Configuration File
-
-Take `rec_chinese_lite_train.yml` as an example
-
-
-| Parameter | Use | Default | Note |
-| :----------------------: | :---------------------: | :--------------: | :--------------------: |
-| algorithm | Select algorithm to use | Synchronize with configuration file | For selecting model, please refer to the supported model [list](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/README.md) |
-| use_gpu | Set using GPU or not | true | \ |
-| epoch_num | Maximum training epoch number | 3000 | \ |
-| log_smooth_window | Sliding window size | 20 | \ |
-| print_batch_step | Set print log interval | 10 | \ |
-| save_model_dir | Set model save path | output/{model_name} | \ |
-| save_epoch_step | Set model save interval | 3 | \ |
-| eval_batch_step | Set the model evaluation interval | 2000 | \ |
-|train_batch_size_per_card | Set the batch size during training | 256 | \ |
-| test_batch_size_per_card | Set the batch size during testing | 256 | \ |
-| image_shape | Set input image size | [3, 32, 100] | \ |
-| max_text_length | Set the maximum text length | 25 | \ |
-| character_type | Set character type | ch | en/ch, the default dict will be used for en, and the custom dict will be used for ch|
-| character_dict_path | Set dictionary path | ./ppocr/utils/ic15_dict.txt | \ |
-| loss_type | Set loss type | ctc | Supports two types of loss: ctc / attention |
-| reader_yml | Set the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
-| pretrain_weights | Load pre-trained model path | ./pretrain_models/CRNN/best_accuracy | \ |
-| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
-| save_inference_dir | path to save model for inference | None | Use to save inference model |
-
-## Introduction Reader parameters of Configuration file
-
-Take `rec_chinese_reader.yml` as an example:
-
-| Parameter | Use | Default | Note |
-| :----------------------: | :---------------------: | :--------------: | :--------------------: |
-| reader_function | Select data reading method | ppocr.data.rec.dataset_traversal,SimpleReader | Support two data reading methods: SimpleReader / LMDBReader |
-| num_workers | Set the number of data reading threads | 8 | \ |
-| img_set_dir | Image folder path | ./train_data | \ |
-| label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ |
-| infer_img | Result folder path | ./infer_img | \|
-
diff --git a/doc/customize.md b/doc/customize.md
deleted file mode 100644
index 472e54b484f659f146331a9ac7cb8a83d9dfcd8b..0000000000000000000000000000000000000000
--- a/doc/customize.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# How to make your own custom ultra-lightweight models?
-
-The process of making a customized ultra-lightweight models can be divided into three steps: training text detection model, training text recognition model, and make prediction with trained models.
-
-## step1: Train text detection model
-
-PaddleOCR provides two text detection algorithms: EAST and DB. Both support MobileNetV3 and ResNet50_vd backbone networks. Select the corresponding configuration file as needed and start training. For example, to train with MobileNetV3 as the backbone network for DB detection model :
-```
-python3 tools/train.py -c configs/det/det_mv3_db.yml
-```
-For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection.md)
-
-## step2: Train text recognition model
-
-PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks, MobileNetV3 and ResNet34_vd, and select the corresponding configuration files as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
-```
-python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
-```
-For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition.md)
-
-## step3: Make prediction
-
-PaddleOCR provides a concatenation tool for detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages of text detection, detection frame correction, text recognition, and score filtering to output the text position and recognition results, and at the same time, the results can be selected for visualization.
-
-When performing prediction, you need to specify the path of a single image or a image folder through the parameter `image_dir`, the parameter `det_model_dir` specifies the path to detect the inference model, and the parameter `rec_model_dir` specifies the path to identify the inference model. The visual recognition results are saved to the `./inference_results` folder by default.
-
-```
-python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/det/" --rec_model_dir="./inference/rec/"
-```
-For more text detection and recognition concatenation, please refer to the document [Inference](./inference.md)
diff --git a/doc/detection.md b/doc/detection.md
deleted file mode 100644
index 1353d11e67f0166c36d2afff7fca1f563bd0be05..0000000000000000000000000000000000000000
--- a/doc/detection.md
+++ /dev/null
@@ -1,96 +0,0 @@
-# Text detection
-
-This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.
-
-## Data preparation
-The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.
-
-Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes scattered annotation files into separate annotation files. You can download by wget:
-```
-# Under the PaddleOCR path
-cd PaddleOCR/
-wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_label.txt
-wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
-```
-
-After decompressing the data set and downloading the annotation file, PaddleOCR/train_data/ has two folders and two files, which are:
-```
-/PaddleOCR/train_data/icdar2015/text_localization/
-  └─ icdar_c4_train_imgs/         Training data of icdar dataset
-  └─ ch4_test_images/             Testing data of icdar dataset
-  └─ train_icdar2015_label.txt    Training annotation of icdar dataset
-  └─ test_icdar2015_label.txt     Test annotation of icdar dataset
-```
-
-The label file format provided is:
-```
-" Image file name Image annotation information encoded by json.dumps"
-ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
-```
-The image annotation information before json.dumps encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point in the upper left corner.
-
-`transcription` represents the text of the current text box, and this information is not needed in the text detection task.
-If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format。
-
-
-## Quickstart training
-
-First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
-```
-cd PaddleOCR/
-# Download the pre-trained model of MobileNetV3
-wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
-# Download the pre-trained model of ResNet50
-wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
-```
-
-**Start training**
-```
-python3 tools/train.py -c configs/det/det_mv3_db.yml
-```
-
-In the above instruction, use -c to select the training to use the configs/det/det_db_mv3.yml configuration file.
-For a detailed explanation of the configuration file, please refer to [link](./doc/config.md).
-
-You can also use the -o parameter to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
-```
-python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
-```
-
-## Index evaluation
-
-PaddleOCR calculates three indicators related to OCR detection: Precision, Recall, and Hmean.
-
-Run the following code to calculate the evaluation index based on the test result file specified by save_res_path in the configuration file det_db_mv3.yml
-
-When evaluating, set post-processing parameters box_thresh=0.6, unclip_ratio=1.5, use different data sets, different models for training, these two parameters can be adjusted for optimization.
-
-```
-python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
-```
-The model parameters during training are saved in the Global.save_model_dir directory by default. When evaluating indicators, you need to set Global.checkpoints to point to the saved parameter file.
-
-Such as:
-```
-python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
-```
-
-* Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.
-
-## Test detection result
-
-Test the detection result on a single image:
-```
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
-```
-
-When testing the DB model, adjust the post-processing threshold:
-```
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
-```
-
-
-Test the detection effect of all images in the folder:
-```
-python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
-```
diff --git a/doc/inference.md b/doc/inference.md
deleted file mode 100644
index 1a6d882a7a717ae44630d24c9dccbcbfbe715f5a..0000000000000000000000000000000000000000
--- a/doc/inference.md
+++ /dev/null
@@ -1,209 +0,0 @@
-
-# Inference based on prediction engine
-
-inference model (model saved by fluid.io.save_inference_model)
-It is generally the solidified model saved after the model training is completed, which is mostly used to predict deployment.
-The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
-Compared with the checkpoints model, the inference model will additionally save the structural information of the model. It has superior performance in predicting deployment and accelerating reasoning, is flexible and convenient, and is suitable for integration with actual systems. For more detailed introduction, please refer to the document [Classification prediction framework](https://paddleclas.readthedocs.io/zh_CN/latest/extension/paddle_inference.html).
-
-Next, we first introduce how to convert the trained model into an inference model, and then we will introduce text detection, text recognition, and the connection of the two based on prediction engine inference.
-
-## Training model to inference model
-### Detection model to inference model
-
-Download the super lightweight Chinese detection model:
-```
-wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar && tar xf ./ch_lite/ch_det_mv3_db.tar -C ./ch_lite/
-```
-The above model is a DB algorithm trained with MobileNetV3 as the backbone. To convert the trained model into an inference model, just run the following command:
-```
-python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/det_mv3_db/best_accuracy Global.save_inference_dir=./inference/det_db/
-```
-When transferring an inference model, the configuration file used is the same as the configuration file used during training. In addition, you also need to set the Global.checkpoints and Global.save_inference_dir parameters in the configuration file.
-Global.checkpoints points to the model parameter file saved in training, and Global.save_inference_dir is the directory where the generated inference model is to be saved.
-After the conversion is successful, there are two files in the `save_inference_dir` directory:
-```
-inference/det_db/
-  └─ model     Check the program file of inference model
-  └─ params    Check the parameter file of the inference model
-```
-
-### Recognition model to inference model
-
-Download the ultra-lightweight Chinese recognition model:
-```
-wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar && tar xf ./ch_lite/ch_rec_mv3_crnn.tar -C ./ch_lite/
-```
-
-The identification model is converted to the inference model in the same way as the detection, as follows:
-```
-python3 tools/export_model.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints=./ch_lite/rec_mv3_crnn/best_accuracy \
-        Global.save_inference_dir=./inference/rec_crnn/
-```
-
-If you are a model trained on your own data set and you have adjusted the dictionary file of Chinese characters, please pay attention to whether the character_dict_path in the configuration file is the required dictionary file.
-
-After the conversion is successful, there are two files in the directory:
-```
-/inference/rec_crnn/
-  └─ model     Identify the program file of the inference model
-  └─ params    Identify the parameter file of the inference model
-```
-
-## Text detection model inference
-
-The following will introduce the ultra-lightweight Chinese detection model reasoning, DB text detection model reasoning and EAST text detection model reasoning. The default configuration is based on the inference setting of the DB text detection model. Because EAST and DB algorithms are very different, when inference, it is necessary to adapt the EAST text detection algorithm by passing in corresponding parameters.
-
-### 1.Ultra-lightweight Chinese detection model inference
-
-Super lightweight Chinese detection model inference, you can execute the following commands:
-
-```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/"
-```
-
-The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows:
-
-![](imgs_results/det_res_2.jpg)
-
-By setting the size of the parameter det_max_side_len, the maximum value of picture normalization in the detection algorithm is changed. When the length and width of the picture are less than det_max_side_len, the original picture is used for prediction, otherwise the picture is scaled to the maximum value for prediction. This parameter is set to det_max_side_len=960 by default. If the resolution of the input picture is relatively large and you want to use a larger resolution for prediction, you can execute the following command:
-
-```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --det_max_side_len=1200
-```
-
-If you want to use the CPU for prediction, execute the command as follows
-```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
-```
-
-### 2.DB text detection model inference
-
-First, convert the model saved in the DB text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert:
-
-```
-# Set the yml configuration file of the training algorithm after -c
-# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
-# The Global.save_inference_dir parameter sets the address where the converted model will be saved.
-
-python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.checkpoints="./models/det_r50_vd_db/best_accuracy" Global.save_inference_dir="./inference/det_db"
-```
-
-DB text detection model inference, you can execute the following command:
-
-```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"
-```
-
-The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows:
-
-![](imgs_results/det_res_img_10_db.jpg)
-
-**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection effect on Chinese text images.
-
-### 3.EAST text detection model inference
-
-First, convert the model saved in the EAST text detection training process into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English data set as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert:
-
-```
-# Set the yml configuration file of the training algorithm after -c
-# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
-# The Global.save_inference_dir parameter sets the address where the converted model will be saved.
-
-python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.checkpoints="./models/det_r50_vd_east/best_accuracy" Global.save_inference_dir="./inference/det_east"
-```
-
-EAST text detection model inference, you need to set the parameter det_algorithm, specify the detection algorithm type as EAST, you can execute the following command:
-
-```
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST"
-```
-The visual text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with'det_res'. Examples of results are as follows:
-
-![](imgs_results/det_res_img_10_east.jpg)
-
-**Note**: The Python version of NMS used in EAST post-processing in this codebase, so the prediction speed is time-consuming. If you use the C++ version, there will be a significant speedup.
-
-
-## Text recognition model inference
-
-The following will introduce the ultra-lightweight Chinese recognition model reasoning and CTC loss-based recognition model reasoning. **The recognition model reasoning based on Attention loss is still being debugged**. For Chinese text recognition, it is recommended to prefer the recognition model based on CTC loss. In practice, it is also found that the effect based on Attention loss is not as good as the recognition model based on CTC loss.
-
-
-### 1.Ultra-lightweight Chinese recognition model inference
-
-Super lightweight Chinese recognition model inference, you can execute the following commands:
-
-```
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./inference/rec_crnn/"
-```
-
-![](imgs_words/ch/word_4.jpg)
-
-After executing the command, the prediction results (recognized text and score) of the above image will be printed on the screen.
-
-Predicts of ./doc/imgs_words/ch/word_4.jpg:['实力活力', 0.89552695]
-
-
-### 2.Identification model reasoning based on CTC loss
-
-Taking STAR-Net as an example, we introduce the identification model reasoning based on CTC loss. CRNN and Rosetta are used in a similar way, without setting the recognition algorithm parameter rec_algorithm.
-
-First, convert the model saved in the STAR-Net text recognition training process into an inference model. Based on Resnet34_vd backbone network, using MJSynth and SynthText two English text recognition synthetic data set training
-The example of the model ([model download address](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_ctc.tar))
-
-```
-# Set the yml configuration file of the training algorithm after -c
-# The Global.checkpoints parameter sets the address of the training model to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
-# The Global.save_inference_dir parameter sets the address where the converted model will be saved.
-
-python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.checkpoints="./models/rec_r34_vd_tps_bilstm_ctc/best_accuracy" Global.save_inference_dir="./inference/starnet"
-```
-
-STAR-Net text recognition model inference can execute the following commands:
-
-```
-python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en"
-```
-![](imgs_words_en/word_336.png)
-
-After executing the command, the recognition result of the above image is as follows:
-
-Predicts of ./doc/imgs_words_en/word_336.png:['super', 0.9999555]
-
-**Note**:Since the above model refers to [DTRB] (https://arxiv.org/abs/1904.01906) text recognition training and evaluation process, it is different from the training of ultra-lightweight Chinese recognition model in two aspects:
-
-- The image resolution used in training is different, and the image resolution used in training the above model is [3,32,100], While the Chinese model training, in order to ensure the recognition effect of long text, the image resolution used in training is [3, 32, 320]. The default shape parameter of the predictive inference program is the image resolution used in training Chinese, that is [3, 32, 320]. Therefore, when reasoning the above English model here, you need to set the shape of the recognition image through the parameter rec_image_shape.
-
-- Character list, the experiment in the DTRB paper is only for 26 lowercase English mothers and 10 numbers, a total of 36 characters. All upper and lower case characters are converted to lower case characters, and characters not in the above list are ignored and considered as spaces. Therefore, no character dictionary is entered here, but a dictionary is generated by the following command. Therefore, the parameter rec_char_type needs to be set during inference, which is specified as "en" in English.
- -``` -self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" -dict_character = list(self.character_str) -``` - -## Text detection and recognition tandem inference - -### 1.Ultra-lightweight Chinese OCR model inference - -When performing prediction, specify the path of a single image or an image folder through the parameter image_dir; the parameter det_model_dir specifies the path to the detection inference model, and the parameter rec_model_dir specifies the path to the recognition inference model. The visualized recognition results are saved to the ./inference_results folder by default. - -``` -python3 tools/infer/predict_system.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --rec_model_dir="./inference/rec_crnn/" -``` - -After executing the command, the recognition result image is as follows: - -![](imgs_results/2.jpg) - -### 2.Other model inference - -If you want to try other detection or recognition algorithms, refer to the text detection model inference and text recognition model inference sections above and update the corresponding configuration and model. The following command runs EAST text detection together with STAR-Net text recognition: - -``` -python3 tools/infer/predict_system.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_east/" --det_algorithm="EAST" --rec_model_dir="./inference/starnet/" --rec_image_shape="3, 32, 100" --rec_char_type="en" -``` - -After executing the command, the recognition result image is as follows: - -![](imgs_results/img_10.jpg) diff --git a/doc/installation.md b/doc/installation.md deleted file mode 100644 index bb3b2c297ce6cc73bc45efdbc735a9c2f5e68100..0000000000000000000000000000000000000000 --- a/doc/installation.md +++ /dev/null @@ -1,78 +0,0 @@ -## Quick installation - -PaddleOCR has been tested on glibc 2.23. You can also try other glibc versions or install glibc 2.23 -PaddleOCR working environment -- PaddlePaddle1.7 -- python3 -- glibc
2.23 - -It is recommended to run PaddleOCR in the docker image we provide. For how to use docker, please refer to this [link](https://docs.docker.com/get-started/). - -1. (Recommended) Prepare a docker environment. The first time you use this image, it will be downloaded automatically. Please be patient. -``` -# Switch to the working directory -cd /home/Projects -# You need to create a docker container on the first run; you do not need to run this command again on later runs -# Create a docker container named ppocr and map the current directory to the /paddle directory of the container - -# If you want to use docker in a CPU environment, use docker instead of nvidia-docker to create the container -sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash -``` -If CUDA9 is installed on your machine, please run the following command to create a container -``` -sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash -``` -If CUDA10 is installed on your machine, please run the following command to create a container -``` -sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash -``` -You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that fits your machine. -``` -# ctrl+P+Q exits docker; re-enter it using the following command -sudo docker container exec -it ppocr /bin/bash -``` - -Note: if docker pull is too slow, you can manually download and load the docker image according to the following steps.
The steps use the cuda9 docker image as an example; for the cuda10 docker image, just change cuda9 to cuda10 -``` -# Download the CUDA9 docker compressed file and unzip it -wget https://paddleocr.bj.bcebos.com/docker/docker_pdocr_cuda9.tar.gz -# To reduce download time, the uploaded docker image is compressed and needs to be decompressed -tar zxf docker_pdocr_cuda9.tar.gz -# Create image -docker load < docker_pdocr_cuda9.tar -# After completing the above steps, check whether the downloaded image is loaded through docker images -docker images -# If you have the following output after executing docker images, you can follow step 1 to create a docker environment. -hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829 -``` - -2. Install PaddlePaddle Fluid v1.7 (higher versions are not supported yet; the adaptation work is in progress) -``` -pip3 install --upgrade pip - -# If CUDA9 is installed on your machine, please run the following command to install -python3 -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple - -# If CUDA10 is installed on your machine, please run the following command to install -python3 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple -``` -For more version requirements, please refer to the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick). - - -3. Clone PaddleOCR repo code -``` -# Recommended -git clone https://github.com/PaddlePaddle/PaddleOCR - -# If you cannot pull because of network problems, you can also use the mirror hosted on Gitee: - -git clone https://gitee.com/paddlepaddle/PaddleOCR - -# Note: The Gitee mirror may not synchronize with this GitHub project in real time; there is a delay of 3~5 days, so please prefer the recommended method. -``` - -4.
Install third-party libraries -``` -cd PaddleOCR -pip3 install -r requirments.txt -``` diff --git a/doc/recognition.md b/doc/recognition.md deleted file mode 100644 index 2f4be5bfe0e8f47bb40b9cd8ca8fb85d0a49587f..0000000000000000000000000000000000000000 --- a/doc/recognition.md +++ /dev/null @@ -1,222 +0,0 @@ -## Text recognition - -### Data preparation - - -PaddleOCR supports two data formats: `lmdb`, used to train on public data and debug algorithms, and `General Data`, used to train on your own data: - -Please set the dataset as follows: - -The default storage path for training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft link to the dataset directory: - -``` -ln -sf /train_data/dataset -``` - - -* Data download - -If you do not have a dataset locally, you can download it from the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb format dataset required by the benchmark - -* Use your own dataset: - -If you want to use your own data for training, please refer to the following to organize your data. - -- Training set - -First put the training pictures in the same folder (train_images), and use a txt file (rec_gt_train.txt) to record the picture path and label.
- -* Note: by default, split the image path and image label with \t. Using any other separator will cause a training error - -``` -" Image file name Image annotation " - -train_data/train_0001.jpg 简单可依赖 -train_data/train_0002.jpg 用科技让复杂的世界更简单 -``` -PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows: - -``` -# Training set label -wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt -# Test set label -wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt -``` - -The final training set should have the following file structure: - -``` -|-train_data - |-ic15_data - |- rec_gt_train.txt - |- train - |- word_001.png - |- word_002.jpg - |- word_003.jpg - | ... -``` - -- Test set - -Similar to the training set, the test set also needs a folder containing all pictures (test) and a rec_gt_test.txt. The structure of the test set is as follows: - -``` -|-train_data - |-ic15_data - |- rec_gt_test.txt - |- test - |- word_001.jpg - |- word_002.jpg - |- word_003.jpg - | ... -``` - -- Dictionary - -Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index. - -Therefore, the dictionary needs to contain all the characters that you want to be recognized correctly. {word_dict_name}.txt needs to be written in the following format and saved in the `utf-8` encoding format: - -``` -l -d -a -d -r -n -``` - -In word_dict.txt there is a single character on each line, and the line order maps characters to numeric indexes. For example, "and" will be mapped to [2 5 1] - -`ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters, - -`ppocr/utils/ic15_dict.txt` is an English dictionary with 36 characters, - -You can use them as needed.
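
The character-to-index mapping described for the dictionary file can be sketched in a few lines of Python. This is only an illustration, not PaddleOCR's own loader: `build_char_index` and `encode` are hypothetical helper names, indexes are assumed to be 0-based line numbers, and a character listed twice is assumed to keep its first index, which is what reproduces the `[2 5 1]` example for "and".

```python
def build_char_index(dict_lines):
    """Map each character to the 0-based index of its first line in the dictionary."""
    char_to_index = {}
    for index, line in enumerate(dict_lines):
        char = line.rstrip("\n")
        # setdefault keeps the first index when a character is listed twice
        char_to_index.setdefault(char, index)
    return char_to_index


def encode(text, char_to_index):
    """Convert a string into a list of dictionary indexes."""
    return [char_to_index[char] for char in text]


# The six-line sample dictionary from the section above
sample_dict = ["l", "d", "a", "d", "r", "n"]
char_to_index = build_char_index(sample_dict)
print(encode("and", char_to_index))  # [2, 5, 1]
```

A character missing from the dictionary raises `KeyError` in this sketch, which mirrors the requirement that the dictionary contain every character you want recognized.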
- -To customize the dictionary file, please modify the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and set `character_type` to `ch`. - -### Start training - -PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example: - -First download the pretrained model. You can download a trained model and fine-tune it on the icdar2015 data - -``` -cd PaddleOCR/ -# Download the pre-trained model of MobileNetV3 -wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/rec_mv3_none_bilstm_ctc.tar -# Decompress model parameters -cd pretrain_models -tar -xf rec_mv3_none_bilstm_ctc.tar && rm -rf rec_mv3_none_bilstm_ctc.tar -``` - -Start training: - -``` -# Set PYTHONPATH path -export PYTHONPATH=$PYTHONPATH:. -# GPU training supports single-card and multi-card training; specify the card numbers through CUDA_VISIBLE_DEVICES -export CUDA_VISIBLE_DEVICES=0,1,2,3 -# Training icdar15 English data -python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -``` - -PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, the model is evaluated every 500 iterations, and the model with the best accuracy is saved as `output/rec_CRNN/best_accuracy` during the evaluation process. - -If the validation set is large, evaluation will be time-consuming. It is recommended to reduce the number of evaluations, or evaluate after training. - -* Tip: You can use the `-c` parameter to select any of the model configurations under the `configs/rec/` path for training.
The recognition algorithms supported by PaddleOCR are: - - -| Configuration file | Algorithm name | backbone | trans | seq | pred | -| :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | -| rec_chinese_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | -| rec_icdar15_train.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | -| rec_mv3_none_bilstm_ctc.yml | CRNN | Mobilenet_v3 large 0.5 | None | BiLSTM | ctc | -| rec_mv3_none_none_ctc.yml | Rosetta | Mobilenet_v3 large 0.5 | None | None | ctc | -| rec_mv3_tps_bilstm_ctc.yml | STARNet | Mobilenet_v3 large 0.5 | tps | BiLSTM | ctc | -| rec_mv3_tps_bilstm_attn.yml | RARE | Mobilenet_v3 large 0.5 | tps | BiLSTM | attention | -| rec_r34_vd_none_bilstm_ctc.yml | CRNN | Resnet34_vd | None | BiLSTM | ctc | -| rec_r34_vd_none_none_ctc.yml | Rosetta | Resnet34_vd | None | None | ctc | -| rec_r34_vd_tps_bilstm_attn.yml | RARE | Resnet34_vd | tps | BiLSTM | attention | -| rec_r34_vd_tps_bilstm_ctc.yml | STARNet | Resnet34_vd | tps | BiLSTM | ctc | - -For training Chinese data, it is recommended to use `rec_chinese_lite_train.yml`. If you want to try the effect of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file: - -Take `rec_mv3_none_none_ctc.yml` as an example: -``` -Global: - ... - # Modify image_shape to fit long text - image_shape: [3, 32, 320] - ... - # Modify character type - character_type: ch - # Set the dictionary path; if you use a modified dictionary, point the path to the new dictionary - character_dict_path: ./ppocr/utils/ppocr_keys_v1.txt - ... - # Modify reader type - reader_yml: ./configs/rec/rec_chinese_reader.yml - ... - -... -``` -**Note that the configuration file for prediction/evaluation must be consistent with the training.** - - - -### Evaluation - -The evaluation dataset can be changed by setting `label_file_path` under EvalReader in `configs/rec/rec_icdar15_reader.yml`.
- -``` -export CUDA_VISIBLE_DEVICES=0 -# GPU evaluation, Global.checkpoints is the weight to be tested -python3 tools/eval.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy -``` - -### Prediction - -* Training engine prediction - -A model trained with PaddleOCR can be used for quick prediction with the following script. - -The path of the image to predict is set in `infer_img`, and the weights are specified via `-o Global.checkpoints`: - -``` -# Predict English results -python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/en/word_1.jpg -``` - -Input image: - -![](./imgs_words/en/word_1.png) - -Get the prediction result of the input image: - -``` -infer_img: doc/imgs_words/en/word_1.png - index: [19 24 18 23 29] - word : joint -``` - -The configuration file used for prediction must be consistent with the one used for training. For example, if you trained the Chinese model through `python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml`, -you can use the following command to run prediction with the Chinese model. - -``` -# Predict Chinese results -python3 tools/infer_rec.py -c configs/rec/rec_chinese_lite_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy TestReader.infer_img=doc/imgs_words/ch/word_1.jpg -``` - -Input image: - -![](./imgs_words/ch/word_1.jpg) - -Get the prediction result of the input image: - -``` -infer_img: doc/imgs_words/ch/word_1.jpg - index: [2092 177 312 2503] - word : 韩国小馆 -```
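
The `index` and `word` fields in these outputs are related through the dictionary. As a rough sanity check on the English example, the sketch below assumes the English dictionary is ordered like the 36-character string `0123456789abcdefghijklmnopqrstuvwxyz` quoted in the inference notes, with 0-based indexes. `decode` is a hypothetical helper for illustration only, not the decoder PaddleOCR uses internally.

```python
# Assumption: digits first, then lowercase letters, indexed from 0
CHARACTER_STR = "0123456789abcdefghijklmnopqrstuvwxyz"


def decode(indexes, characters=CHARACTER_STR):
    """Look up each predicted index in the character list and join the result."""
    return "".join(characters[i] for i in indexes)


# Index list taken from the English prediction output above
print(decode([19, 24, 18, 23, 29]))  # joint
```

The same lookup applies to the Chinese result, except that the character list would come from the Chinese dictionary file instead of this string.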