diff --git a/doc/doc_en/datasets_en.md b/doc/doc_en/datasets_en.md
index 47e6ec48b0cc9935893a9aecaa8a74455b400b63..61d2033b4fe8f0077ad66fb9ae2cd559ce29fd65 100644
--- a/doc/doc_en/datasets_en.md
+++ b/doc/doc_en/datasets_en.md
@@ -13,8 +13,11 @@ In addition to open-source data, users can also use synthesis tools to synthesize
 - **Data sources**: https://ai.baidu.com/broad/introduction?dataset=lsvt
 - **Introduction**: A total of 45w Chinese street view images, including 5w (2w test + 3w training) fully labeled data (text coordinates + text content) and 40w weakly labeled data (text content only), as shown in the following figure:
   ![](../datasets/LSVT_1.jpg)
+  (a) Fully labeled data
+  ![](../datasets/LSVT_2.jpg)
+  (b) Weakly labeled data
 - **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt
diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md
index 6e4aede31938434fefba8bc205a967d092e32249..6bb496c91c32702eb408ea5afcfded33301d7535 100644
--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
@@ -33,7 +33,7 @@ The image annotation after json.dumps() encoding is a list containing multiple d
 If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format.
 
-## QUICKSTART
+## TRAINING
 
 First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the models in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace the backbone according to your needs.
 ```
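The detection hunk above refers to an annotation format in which each label line is an image path followed by a `json.dumps()`-encoded list of dictionaries. As a minimal sketch of building such a line (the field names `transcription` and `points`, and the tab separator, are assumptions based on the dictionary layout the doc describes, not verified against the codebase):

```python
import json

def make_label_line(image_path, boxes):
    # boxes: list of (text, four-corner-points) pairs for one image.
    # Each entry becomes one dictionary in the json.dumps()-encoded list.
    anno = [{"transcription": text, "points": points} for text, points in boxes]
    return image_path + "\t" + json.dumps(anno, ensure_ascii=False)

line = make_label_line(
    "img_1.jpg",
    [("PaddleOCR", [[10, 10], [120, 10], [120, 40], [10, 40]])],
)
print(line)
```

`ensure_ascii=False` keeps Chinese transcriptions readable in the label file rather than escaping them to `\uXXXX` sequences.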
diff --git a/doc/doc_en/inference_en.md b/doc/doc_en/inference_en.md
index 95a17f2d119c2178fd92bce7020b93cc8444fed5..0fd7a3725df653b29703a23618675fbc72a6e342 100644
--- a/doc/doc_en/inference_en.md
+++ b/doc/doc_en/inference_en.md
@@ -1,5 +1,5 @@
-# Prediction from inference model
+# PREDICTION FROM INFERENCE MODEL
 
 The inference model (the model saved by fluid.io.save_inference_model) is generally a solidified model saved after the model training is completed, and is mostly used to give predictions in deployment.
@@ -9,7 +9,7 @@ Compared with the checkpoints model, the inference model will additionally save
 Next, we first introduce how to convert a trained model into an inference model, and then we will introduce text detection, text recognition, and the concatenation of them based on the inference model.
 
-## Convert training model to inference model
+## CONVERT TRAINING MODEL TO INFERENCE MODEL
 
 ### Convert detection model to inference model
 Download the lightweight Chinese detection model:
@@ -51,11 +51,11 @@ After the conversion is successful, there are two files in the directory:
 └─ params    Identify the parameter files of the inference model
 ```
 
-## Text detection model inference
+## TEXT DETECTION MODEL INFERENCE
 
 The following will introduce the lightweight Chinese detection model inference, DB text detection model inference and EAST text detection model inference. The default configuration is based on the inference setting of the DB text detection model. Because the EAST and DB algorithms are very different, it is necessary at inference time to adapt the EAST text detection algorithm by passing in the corresponding parameters.
 
-### 1. lightweight Chinese detection model inference
+### 1. LIGHTWEIGHT CHINESE DETECTION MODEL INFERENCE
 
 For lightweight Chinese detection model inference, you can execute the following commands:
@@ -78,7 +78,7 @@ If you want to use the CPU for prediction, execute the command as follows
 python3 tools/infer/predict_det.py --image_dir="./doc/imgs/2.jpg" --det_model_dir="./inference/det_db/" --use_gpu=False
 ```
 
-### 2. DB text detection model inference
+### 2. DB TEXT DETECTION MODEL INFERENCE
 
 First, convert the model saved during DB text detection training into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)), you can use the following command to convert:
@@ -102,7 +102,7 @@ The visualized text detection results are saved to the `./inference_results` fol
 **Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection results on Chinese text images.
 
-### 3. EAST text detection model inference
+### 3. EAST TEXT DETECTION MODEL INFERENCE
 
 First, convert the model saved during EAST text detection training into an inference model. Taking the model based on the Resnet50_vd backbone network and trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/det_r50_vd_east.tar)), you can use the following command to convert:
@@ -126,7 +126,7 @@ The visualized text detection results are saved to the `./inference_results` fol
 **Note**: The Python version of NMS is used in EAST post-processing in this codebase, so the prediction speed is quite slow. If you use the C++ version instead, there will be a significant speedup.
 
-## Text recognition model inference
+## TEXT RECOGNITION MODEL INFERENCE
 
 The following will introduce the lightweight Chinese recognition model inference, as well as other CTC-based and Attention-based text recognition model inference.
 
 For Chinese text recognition, it is recommended to choose the recognition model based on CTC loss. In practice, it has also been found that the results of models based on Attention loss are not as good as those based on CTC loss. In addition, if the character dictionary is modified during training, make sure that you use the same character set during inference. Please check below for details.