diff --git a/doc/doc_en/detection_en.md b/doc/doc_en/detection_en.md
index 618e20fb5e2a9a7afd67bb7d15646971b88365ee..06b14440377bb90d5ef4e6e69e96628cc9f2bc9c 100644
--- a/doc/doc_en/detection_en.md
+++ b/doc/doc_en/detection_en.md
@@ -9,12 +9,15 @@ This section uses the icdar2015 dataset as an example to introduce the training,
   * [2.1 Start Training](#21-start-training)
   * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
   * [2.3 Training with New Backbone](#23-training-with-new-backbone)
-  * [2.4 Training with knowledge distillation](#24)
+  * [2.4 Mixed Precision Training](#24-amp-training)
+  * [2.5 Distributed Training](#25-distributed-training)
+  * [2.6 Training with knowledge distillation](#26)
+  * [2.7 Training on other platforms (Windows/macOS/Linux DCU)](#27)
 - [3. Evaluation and Test](#3-evaluation-and-test)
   * [3.1 Evaluation](#31-evaluation)
   * [3.2 Test](#32-test)
 - [4. Inference](#4-inference)
-- [5. FAQ](#2-faq)
+- [5. FAQ](#5-faq)
 
 ## 1. Data and Weights Preparation
@@ -175,11 +178,44 @@ After adding the four-part modules of the network, you only need to configure th
 **NOTE**: More details about replace Backbone and other mudule can be found in [doc](add_new_algorithm_en.md).
+### 2.4 Mixed Precision Training
-### 2.4 Training with knowledge distillation
+If you want to speed up training further, you can use [Automatic Mixed Precision (AMP) Training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking a single machine with a single GPU as an example, the command is as follows:
+
+```shell
+python3 tools/train.py -c configs/det/det_mv3_db.yml \
+     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
+     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
+```
+
+### 2.5 Distributed Training
+
+For multi-machine, multi-GPU training, use the `--ips` parameter to set the IP addresses of the machines and the `--gpus` parameter to set the IDs of the GPUs to use:
+
+```bash
+python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
+     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
+```
+
+**Note:** For multi-machine, multi-GPU training, replace the `--ips` value in the command above with the IP addresses of your machines, and make sure the machines can ping each other. The training command must be launched separately on each machine. You can look up a machine's IP address with `ifconfig`.
+
+### 2.6 Training with knowledge distillation
 
 Knowledge distillation is supported in PaddleOCR for text detection training process. For more details, please refer to [doc](./knowledge_distillation_en.md).
+### 2.7 Training on other platforms (Windows/macOS/Linux DCU)
+
+- Windows GPU/CPU
+  The Windows platform differs slightly from Linux:
+  Windows supports only single-GPU training and inference; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
+  On Windows, the DataLoader supports only single-process mode, so you need to set `num_workers` to 0.
+
+- macOS
+  GPU mode is not supported, so set `use_gpu` to `False` in the configuration file. All other training, evaluation, and prediction commands are the same as on Linux GPU.
+
+- Linux DCU
+  Running on DCU devices requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`. All other training, evaluation, and prediction commands are the same as on Linux GPU.
+
 ## 3. Evaluation and Test
 
 ### 3.1 Evaluation
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
index c3700070b9d01c89cf8189a7af5f13d877114fb2..383835b38f9524df17d5958bee09e58887e6d822 100644
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -1,21 +1,25 @@
 # Text Recognition
 
 - [1. Data Preparation](#DATA_PREPARATION)
-  - [1.1 Costom Dataset](#Costom_Dataset)
-  - [1.2 Dataset Download](#Dataset_download)
-  - [1.3 Dictionary](#Dictionary)
-  - [1.4 Add Space Category](#Add_space_category)
-
+  * [1.1 Custom Dataset](#Costom_Dataset)
+  * [1.2 Dataset Download](#Dataset_download)
+  * [1.3 Dictionary](#Dictionary)
+  * [1.4 Add Space Category](#Add_space_category)
+  * [1.5 Data Augmentation](#Data_Augmentation)
 - [2. Training](#TRAINING)
-  - [2.1 Data Augmentation](#Data_Augmentation)
-  - [2.2 General Training](#Training)
-  - [2.3 Multi-language Training](#Multi_language)
-  - [2.4 Training with Knowledge Distillation](#kd)
-
-- [3. Evaluation](#EVALUATION)
-
-- [4. Prediction](#PREDICTION)
-- [5. Convert to Inference Model](#Inference)
+  * [2.1 Start Training](#21-start-training)
+  * [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
+  * [2.3 Training with New Backbone](#23-training-with-new-backbone)
+  * [2.4 Mixed Precision Training](#24-amp-training)
+  * [2.5 Distributed Training](#25-distributed-training)
+  * [2.6 Training with Knowledge Distillation](#kd)
+  * [2.7 Multi-language Training](#Multi_language)
+  * [2.8 Training on other platforms (Windows/macOS/Linux DCU)](#28)
+- [3. Evaluation and Test](#3-evaluation-and-test)
+  * [3.1 Evaluation](#31-evaluation)
+  * [3.2 Test](#32-test)
+- [4. Inference](#4-inference)
+- [5. FAQ](#5-faq)
 
 ## 1. Data Preparation
@@ -173,11 +177,8 @@ If you need to customize dic file, please add character_dict_path field in confi
 If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.
 
-
-## 2.Training
-
-### 2.1 Data Augmentation
+### 1.5 Data Augmentation
 
 PaddleOCR provides a variety of data augmentation methods. All the augmentation methods are enabled by default.
 
@@ -185,11 +186,14 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
 Each disturbance method is selected with a 40% probability during the training process. For specific code implementation, please refer to: [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
 
-
-### 2.2 General Training
+
+## 2. Training
 
 PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:
+
+### 2.1 Start Training
+
 First download the pretrain model, you can download the trained model to finetune on the icdar2015 data:
 ```
@@ -305,8 +309,99 @@ Eval:
 ```
 **Note that the configuration file for prediction/evaluation must be consistent with the training.**
+
+### 2.2 Load Trained Model and Continue Training
+
+To load a trained model and continue training, set the `Global.checkpoints` parameter to the path of the model to be loaded.
+
+For example:
+```shell
+python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./your/trained/model
+```
+
+**Note**: `Global.checkpoints` takes priority over `Global.pretrained_model`: when both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the path specified by `Global.checkpoints` is invalid, the model specified by `Global.pretrained_model` is loaded instead.
+
+
+### 2.3 Training with New Backbone
+
+PaddleOCR builds the network from four parts, which are located under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in sequence (transforms->backbones->necks->heads).
+
+```bash
+├── architectures # Code for building network
+├── transforms    # Image Transformation Module
+├── backbones     # Feature extraction module
+├── necks         # Feature enhancement module
+└── heads         # Output module
+```
+
+If the Backbone you want to use already has an implementation in PaddleOCR, you can directly modify the parameters in the `Backbone` section of the configuration yml file.
+
+To use a new Backbone instead, follow these steps:
+
+1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, such as my_backbone.py.
+2. Add code to the my_backbone.py file. Sample code is as follows:
+
+```python
+import paddle
+import paddle.nn as nn
+
+
+class MyBackbone(nn.Layer):
+    def __init__(self, in_channels=3, out_channels=64, **kwargs):
+        super(MyBackbone, self).__init__()
+        # your init code; a single convolution is used here as a placeholder
+        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, padding=1)
+        # PaddleOCR reads `out_channels` to set the input channels of the neck
+        self.out_channels = out_channels
+
+    def forward(self, inputs):
+        # your network forward
+        y = self.conv(inputs)
+        return y
+```
+
+3. Import the added module in the [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py) file.
+
+After adding the new module, you only need to reference it in the configuration file, for example:
+
+```yaml
+  Backbone:
+    name: MyBackbone
+    args1: args1
+```
+
+**NOTE**: More details about replacing the Backbone and other modules can be found in [doc](add_new_algorithm_en.md).
+
+
+### 2.4 Mixed Precision Training
+
+If you want to speed up training further, you can use [Automatic Mixed Precision (AMP) Training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking a single machine with a single GPU as an example, the command is as follows:
+
+```shell
+python3 tools/train.py -c configs/rec/rec_icdar15_train.yml \
+     -o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train \
+     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
+```
+
+
+### 2.5 Distributed Training
+
+For multi-machine, multi-GPU training, use the `--ips` parameter to set the IP addresses of the machines and the `--gpus` parameter to set the IDs of the GPUs to use:
+
+```bash
+python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml \
+     -o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train
+```
+
+**Note:** For multi-machine, multi-GPU training, replace the `--ips` value in the command above with the IP addresses of your machines, and make sure the machines can ping each other. The training command must be launched separately on each machine. You can look up a machine's IP address with `ifconfig`.
+
+
+### 2.6 Training with Knowledge Distillation
+
+PaddleOCR supports knowledge distillation for training text recognition models. For more details, please refer to [doc](./knowledge_distillation_en.md).
+
-### 2.3 Multi-language Training
+### 2.7 Multi-language Training
 
 Currently, the multi-language algorithms supported by PaddleOCR are:
@@ -362,25 +457,35 @@ Eval:
 ...
 ```
-
+
+### 2.8 Training on other platforms (Windows/macOS/Linux DCU)
-### 2.4 Training with Knowledge Distillation
+- Windows GPU/CPU
+  The Windows platform differs slightly from Linux:
+  Windows supports only single-GPU training and inference; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
+  On Windows, the DataLoader supports only single-process mode, so you need to set `num_workers` to 0.
-Knowledge distillation is supported in PaddleOCR for text recognition training process. For more details, please refer to [doc](./knowledge_distillation_en.md).
+- macOS
+  GPU mode is not supported, so set `use_gpu` to `False` in the configuration file. All other training, evaluation, and prediction commands are the same as on Linux GPU.
-
+- Linux DCU
+  Running on DCU devices requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`. All other training, evaluation, and prediction commands are the same as on Linux GPU.
-## 3. Evalution
+
+## 3. Evaluation and Test
-The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
+
+### 3.1 Evaluation
+
+By default, the model parameters saved during training are stored in the `Global.save_model_dir` directory. When evaluating metrics, set `Global.checkpoints` to point to the saved parameter file. The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
 ```
 # GPU evaluation, Global.checkpoints is the weight to be tested
 python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
 ```
-
-## 4. Prediction
+
+### 3.2 Test
 Using the model trained by paddleocr, you can quickly get prediction through the following script.
@@ -442,9 +547,14 @@ infer_img: doc/imgs_words/ch/word_1.jpg
 result: ('韩国小馆', 0.997218)
 ```
 
-
+
+## 4. Inference
+
+The inference model (the model saved by `paddle.jit.save`) is generally a frozen model saved after training is complete, and is mainly used for prediction in deployment.
+
+The model saved during training is the checkpoint model, which stores only the model parameters and is mainly used to resume training.
 
-## 5. Convert to Inference Model
+Compared with the checkpoint model, the inference model additionally saves the model structure. Because both the model structure and the parameters are frozen into the inference model files, it is easier to deploy and better suited for integration into production systems.
 
 The recognition model is converted to the inference model in the same way as the detection, as follows:
 
@@ -462,7 +572,7 @@ If you have a model trained on your own dataset with a different dictionary file
 After the conversion is successful, there are three files in the model save directory:
 
 ```
-inference/det_db/
+inference/rec_crnn/
 ├── inference.pdiparams # The parameter file of recognition inference model
 ├── inference.pdiparams.info # The parameter information of recognition inference model, which can be ignored
 └── inference.pdmodel # The program file of recognition model
@@ -475,3 +585,10 @@ inference/det_db/
 ```
 python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path"
 ```
+
+
+## 5. FAQ
+
+Q1: Why are the prediction results inconsistent after the trained model is converted to an inference model?
+
+**A**: This is a common issue. It is usually caused by preprocessing or postprocessing parameters that differ between prediction with the trained model and prediction with the inference model. Compare the preprocessing, postprocessing, and prediction settings in the configuration file used for training with those used for inference.
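The recognition guide states that each default augmentation is chosen with a 40% probability during training (see `rec_img_aug.py`). Here is a minimal, framework-free sketch of that selection rule. It assumes each method is drawn independently per sample. The function name `apply_random_augmentations` and the toy string "images" are hypothetical stand-ins, not PaddleOCR's code.

```python
import random
from typing import Any, Callable, List, Optional, Sequence, Tuple

# An augmentation is any function that maps an image to a new image.
Augment = Callable[[Any], Any]


def apply_random_augmentations(
    img: Any,
    augments: Sequence[Tuple[str, Augment]],
    prob: float = 0.4,
    rng: Optional[random.Random] = None,
) -> Tuple[Any, List[str]]:
    """Apply each augmentation independently with probability `prob`.

    Returns the augmented image and the names of the methods that fired.
    """
    rng = rng or random.Random()
    applied: List[str] = []
    for name, fn in augments:
        if rng.random() < prob:  # one independent draw per method
            img = fn(img)
            applied.append(name)
    return img, applied
```

Because the draws are independent, a sample can receive zero, one, or several augmentations. Passing a seeded `random.Random` makes a run reproducible.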
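The note under "2.2 Load Trained Model and Continue Training" describes a precedence rule between `Global.checkpoints` and `Global.pretrained_model`. The helper below sketches that rule as the note describes it. It is hypothetical and is not PaddleOCR's loading code. It assumes weights are stored as `<prefix>.pdparams`, which is consistent with passing `.../best_accuracy` to `Global.checkpoints` in the evaluation command. It also treats a prefix without such a file as an invalid path.

```python
import os
from typing import Optional, Tuple


def _has_params(prefix: Optional[str]) -> bool:
    """True if '<prefix>.pdparams' exists on disk (assumed file layout)."""
    return bool(prefix) and os.path.isfile(prefix + ".pdparams")


def resolve_init_weights(
    checkpoints: Optional[str], pretrained_model: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Decide which weights to load, following the documented priority."""
    if _has_params(checkpoints):
        return "checkpoints", checkpoints  # resume training: highest priority
    if _has_params(pretrained_model):
        return "pretrained_model", pretrained_model  # fall back to pretrained weights
    return "none", None  # nothing valid to load: train from scratch
```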
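Section "2.3 Training with New Backbone" asks you to import the new module in `ppocr/modeling/backbones/__init__.py` and then refer to it by `name` in the YAML `Backbone` section. The sketch below illustrates, without Paddle, the name-based lookup this relies on. The `name` key selects a class, and every remaining key (such as `args1`) is passed to its constructor as a keyword argument.

The registry dict, `build_backbone_from_config`, and the stub classes are hypothetical. In recent PaddleOCR releases, `build_backbone` in that `__init__.py` appears to check the name against a `support_dict` list instead, so a new backbone likely has to be added there as well as imported. Check your version.

```python
from typing import Any, Callable, Dict


class MobileNetV3Stub:
    """Stand-in for a backbone that already ships with the library (hypothetical)."""

    def __init__(self, scale: float = 0.5, **kwargs: Any) -> None:
        self.scale = scale


class MyBackboneStub:
    """Stand-in for the class defined in my_backbone.py (hypothetical)."""

    def __init__(self, args1: str = "", **kwargs: Any) -> None:
        self.args1 = args1


# Only names listed here can be used as `Backbone.name` in the YAML file;
# registering a class is the analogue of importing it in __init__.py.
BACKBONE_REGISTRY: Dict[str, Callable[..., Any]] = {
    "MobileNetV3": MobileNetV3Stub,
    "MyBackbone": MyBackboneStub,
}


def build_backbone_from_config(config: Dict[str, Any]) -> Any:
    """Create the backbone named by config["name"], passing other keys as kwargs."""
    config = dict(config)  # copy so the caller's config is left untouched
    name = config.pop("name")
    if name not in BACKBONE_REGISTRY:
        raise ValueError(
            f"backbone {name!r} is not registered; supported: {sorted(BACKBONE_REGISTRY)}"
        )
    return BACKBONE_REGISTRY[name](**config)
```

With the YAML fragment from that section, `build_backbone_from_config({"name": "MyBackbone", "args1": "args1"})` returns a `MyBackboneStub` whose `args1` is `"args1"`.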
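The mixed precision commands set `Global.use_amp=True`, `Global.scale_loss=1024.0`, and `Global.use_dynamic_loss_scaling=True`. Loss scaling multiplies the loss, and therefore the gradients, by a factor so that small fp16 gradients do not underflow. Dynamic scaling adjusts that factor during training.

The framework-free sketch below illustrates the usual rule, assuming `Global.scale_loss` acts as the initial scale:

- On an overflow (`inf`/`nan` gradients), the update is skipped and the scale is halved.
- After a run of finite steps, the scale is doubled.

The class and its default ratios and step counts are illustrative assumptions. Paddle's `paddle.amp.GradScaler` implements this internally with its own defaults and counting rules.

```python
import math
from typing import List, Optional


class DynamicLossScaler:
    """Illustrative dynamic loss scaling; not Paddle's implementation."""

    def __init__(self, init_scale: float = 1024.0, incr_ratio: float = 2.0,
                 decr_ratio: float = 0.5, incr_every_n_steps: int = 1000) -> None:
        self.scale = init_scale
        self.incr_ratio = incr_ratio
        self.decr_ratio = decr_ratio
        self.incr_every_n_steps = incr_every_n_steps
        self.good_steps = 0  # consecutive steps with finite gradients

    def scale_loss(self, loss: float) -> float:
        """Multiply the loss so that small fp16 gradients do not underflow."""
        return loss * self.scale

    def update(self, scaled_grads: List[float]) -> Optional[List[float]]:
        """Return unscaled gradients to apply, or None if the step must be skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow: skip this update and retry with a smaller scale.
            self.scale *= self.decr_ratio
            self.good_steps = 0
            return None
        # Unscale with the factor that was actually used for this step.
        grads = [g / self.scale for g in scaled_grads]
        self.good_steps += 1
        if self.good_steps >= self.incr_every_n_steps:
            # A long run of finite gradients: try a larger scale.
            self.scale *= self.incr_ratio
            self.good_steps = 0
        return grads
```

In a real training loop the framework computes the gradients of `scale_loss(loss)`. The sketch simply takes those scaled gradients as plain floats.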
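The "Training on other platforms" sections list per-platform settings but do not show how to pass them. The training commands in this guide already use `-o Key=Value` overrides, so these settings can be supplied on the command line instead of by editing the YAML file. The helper below is a hypothetical sketch that collects the environment variables and overrides for each platform. The platform labels are made up. It assumes `num_workers` lives under `Train.loader` and `Eval.loader` and `use_gpu` under `Global`; check your configuration file.

```python
import shlex
from typing import Dict, List, Tuple


def platform_settings(platform: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment variables, -o overrides) for a target platform."""
    if platform == "windows":
        # Single GPU only, and the DataLoader must run in single-process mode.
        return {"CUDA_VISIBLE_DEVICES": "0"}, [
            "Train.loader.num_workers=0",
            "Eval.loader.num_workers=0",
        ]
    if platform == "macos":
        # No GPU support: train and evaluate on CPU.
        return {}, ["Global.use_gpu=False"]
    if platform == "linux-dcu":
        # DCU devices are selected through HIP instead of CUDA.
        return {"HIP_VISIBLE_DEVICES": "0,1,2,3"}, []
    if platform == "linux-gpu":
        return {}, []
    raise ValueError(f"unknown platform: {platform!r}")


def train_command(config: str, platform: str) -> str:
    """Build a tools/train.py command line with the platform overrides appended."""
    _, overrides = platform_settings(platform)
    args = ["python3", "tools/train.py", "-c", config]
    if overrides:
        args += ["-o", *overrides]
    return shlex.join(args)
```

Set the returned environment variables in your shell before running the printed command: use `set` on Windows and `export` on Linux.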
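The FAQ attributes mismatched predictions mainly to preprocessing settings that differ between training and inference. One concrete setting is the recognition input shape. The inference command passes `--rec_image_shape="3, 32, 100"`, and this should agree with the image shape used by the resize step in the training configuration.

The sketch below parses that string, compares it with a training-time `[C, H, W]` list, and computes the width a word image would be resized to. It assumes an aspect-ratio-preserving resize to the target height, capped at the target width; check the resize logic in `tools/infer/predict_rec.py` for your version. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def parse_image_shape(shape: str) -> Tuple[int, int, int]:
    """Parse a string like "3, 32, 100" into (channels, height, width)."""
    parts = [int(p) for p in shape.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected C,H,W but got {shape!r}")
    return parts[0], parts[1], parts[2]


def shapes_match(train_shape: List[int], infer_shape: str) -> bool:
    """True if the training-time [C, H, W] equals the --rec_image_shape string."""
    return tuple(train_shape) == parse_image_shape(infer_shape)


def resized_width(img_h: int, img_w: int, target_h: int, target_w: int) -> int:
    """Width after scaling to target_h with the aspect ratio kept, capped at target_w."""
    ratio = img_w / float(img_h)
    return min(target_w, math.ceil(target_h * ratio))
```

If `shapes_match` returns `False` for your training config and inference arguments, the two pipelines feed the network differently sized inputs. That is one likely source of the inconsistency described in the FAQ.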