- [3. Data and vertical scenes](#2-data-and-vertical-scenes)
  * [3.1 Training data](#21-training-data)
  * [3.2 Vertical scene](#22-vertical-scene)
  * [3.3 Build your own data set](#23-build-your-own-data-set)
- [4. FAQ](#3-faq)
This article introduces the basic concepts you need to master for model training and the tuning methods used during training.
It also briefly describes the composition of the PaddleOCR training data and how to prepare data for fine-tuning a model in a vertical scene.
<a name="1-Yml-Configuration"></a>
## 1. Yml configuration
PaddleOCR uses configuration files to manage network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and the pre- and post-processing parameters. PaddleOCR reads these parameters from the configuration file and then builds the complete training pipeline. To tune a model, you only need to modify the parameters in the configuration file, which keeps tuning simple.
For the complete configuration file description, please refer to [Configuration File](./config_en.md).
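
As a minimal sketch of the idea that tuning means changing values in the configuration, the snippet below represents a parsed configuration as a nested Python dictionary and overrides one parameter by a dotted key. The `override` helper and most key names (`lr`, `learning_rate`, `regularizer`, `name`) are hypothetical and chosen for illustration; this is not PaddleOCR's own loader.

```python
import copy

# Hypothetical configuration, shaped like a parsed YAML file.
# Only "Optimizer", "factor" and "warmup_epoch" appear in this document;
# the remaining keys are assumptions made for the example.
config = {
    "Optimizer": {
        "lr": {"learning_rate": 0.001, "warmup_epoch": 5},
        "regularizer": {"name": "L2", "factor": 2.0e-05},
    },
}


def override(cfg, dotted_key, value):
    """Return a copy of cfg with the value at dotted_key replaced."""
    new_cfg = copy.deepcopy(cfg)
    node = new_cfg
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node[key]  # a KeyError here means the section does not exist
    if leaf not in node:
        raise KeyError(f"unknown parameter: {dotted_key}")
    node[leaf] = value
    return new_cfg


# Halve the learning rate without touching the original configuration.
tuned = override(config, "Optimizer.lr.learning_rate", 0.0005)
```

Returning a copy rather than mutating in place makes it easy to compare several tuning runs against the same baseline configuration.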
<a name="1-basic-concepts"></a>
# 1. Basic concepts
OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task.
It usually consists of two subtasks: text detection and text recognition.
## 2. Basic concepts
Pay attention to the following parameters when tuning the model:
During model training, some hyperparameters need to be adjusted manually to help the model reach the best metrics at the lowest loss. Different data volumes may require different hyperparameters. When you want to fine-tune on your own data or tune model performance, the following parameter adjustment strategies can serve as a reference:
<a name="11-learning-rate"></a>
## 1.1 Learning rate
### 2.1 Learning rate
The learning rate is one of the most important hyperparameters for training neural networks. It sets the step size by which the parameters move along the gradient toward a minimum of the loss function in each iteration.
PaddleOCR provides a variety of learning rate update strategies, which can be selected and adjusted in the configuration file, for example:
...
...
@@ -45,7 +52,7 @@ and the learning rate is the same in each stage.
`warmup_epoch` means that during the first 5 epochs, the learning rate gradually increases from 0 to `base_lr`. For all strategies, please refer to the code [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
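
The warmup behavior can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in `learning_rate.py`: the ramp is assumed to be linear, the schedule after warmup is assumed to be a cosine decay, and the function name `lr_at_epoch` and its default values are hypothetical.

```python
import math


def lr_at_epoch(epoch, base_lr=0.001, warmup_epoch=5, total_epochs=100):
    """Learning rate at a given (0-based, possibly fractional) epoch."""
    if epoch < warmup_epoch:
        # Assumed linear warmup: 0 at epoch 0, base_lr at epoch == warmup_epoch.
        return base_lr * epoch / warmup_epoch
    # Assumed cosine decay from base_lr toward 0 over the remaining epochs.
    progress = (epoch - warmup_epoch) / max(1, total_epochs - warmup_epoch)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at the start, during warmup, and after warmup.
schedule = [lr_at_epoch(e) for e in (0, 1, 2.5, 5, 50, 100)]
```

Starting from a small learning rate gives the optimizer a few epochs to stabilize before the full step size is applied.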
<a name="12-regularization"></a>
## 1.2 Regularization
### 2.2 Regularization
Regularization helps prevent overfitting. PaddleOCR provides L1 and L2 regularization, the two most commonly used methods.
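
To make the difference between the two methods concrete, here is a small sketch of the penalty each one adds to the loss. The function names are hypothetical, and the 0.5 scaling in the L2 term is one common convention; the exact scaling used by a given framework may differ.

```python
def l1_penalty(weights, factor):
    """L1 penalty: factor times the sum of absolute weight values."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """L2 penalty: 0.5 * factor times the sum of squared weight values.

    The 0.5 scaling is an assumed convention for this example.
    """
    return 0.5 * factor * sum(w * w for w in weights)


weights = [1.0, -2.0, 3.0]
factor = 2.0e-05  # same order of magnitude as the factor in the config example

# The regularized loss is the data loss plus one of these penalties.
l1 = l1_penalty(weights, factor)
l2 = l2_penalty(weights, factor)
```

L1 penalizes all weights in proportion to their magnitude and tends to drive some of them to exactly zero, while L2 penalizes large weights more strongly and shrinks all weights smoothly.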
...
...
@@ -61,7 +68,7 @@ Optimizer:
factor: 2.0e-05
```
<a name="13-evaluation-indicators-"></a>
## 1.3 Evaluation indicators
### 2.3 Evaluation indicators
(1) Detection stage: each detection box is first compared with the labeled boxes by IoU. If the IoU is greater than a given threshold, the detection is counted as correct. Unlike the boxes used in general object detection, the detection boxes and labeled boxes here are represented by polygons. Detection precision: the percentage of correct detection boxes among all detection boxes, which mainly reflects false detections. Detection recall: the percentage of correct detection boxes among all labeled boxes, which mainly reflects missed detections.
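
The detection-stage metrics can be sketched as follows. This is a simplified illustration, not PaddleOCR's evaluation code: it uses axis-aligned rectangles `(x1, y1, x2, y2)` instead of polygons, a greedy one-to-one matching, and an assumed IoU threshold of 0.5. All function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(detections, labels, iou_threshold=0.5):
    """Return (precision, recall) using greedy one-to-one matching."""
    matched_labels = set()
    correct = 0
    for det in detections:
        for idx, gt in enumerate(labels):
            # Each labeled box can be matched by at most one detection.
            if idx not in matched_labels and box_iou(det, gt) > iou_threshold:
                matched_labels.add(idx)
                correct += 1
                break
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(labels) if labels else 0.0
    return precision, recall


detections = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
labels = [(0, 0, 10, 11), (20, 20, 30, 30)]
precision, recall = detection_metrics(detections, labels)
```

In this example two of the three detections match a labeled box, so precision is 2/3, and both labeled boxes are found, so recall is 1.0.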
...
...
@@ -71,11 +78,11 @@ Optimizer:
<a name="2-data-and-vertical-scenes"></a>
# 2. Data and vertical scenes
## 3. Data and vertical scenes
<a name="21-training-data"></a>
## 2.1 Training data
### 3.1 Training data
The data sets used by the current open-source models, and their sizes, are as follows:
...
...
@@ -92,14 +99,14 @@ Among them, the public data sets are all open source, users can search and downl
<a name="22-vertical-scene"></a>
## 2.2 Vertical scene
### 3.2 Vertical scene
PaddleOCR mainly focuses on general OCR. If you have vertical (domain-specific) requirements, you can train your own model with PaddleOCR and vertical data.
If you lack labeled data, or do not want to invest in research and development, we recommend calling the open API directly; it covers some of the more common vertical categories.
<a name="23-build-your-own-data-set"></a>
## 2.3 Build your own data set
### 3.3 Build your own data set
The following practical tips can serve as a reference when building a data set:
...
...
@@ -117,7 +124,7 @@ There are several experiences for reference when constructing the data set:
<a name="3-faq"></a>
# 3. FAQ
## 4. FAQ
**Q**: How do I choose a suitable network input shape when training a CRNN recognition model?