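(1) Regression-based methods are divided into box regression and pixel-value regression. a. Box-regression methods mainly include CTPN, the Textbox series, and EAST. These algorithms work well on regularly shaped text but cannot accurately detect irregularly shaped text. b. Pixel-value regression methods mainly include CRAFT and SA-Text. These algorithms can detect curved text and perform very well on small text, but their real-time performance is insufficient.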
If you want to train on your own data, organize it as described below.
...
@@ -84,7 +84,7 @@ Similar to the training set, the test set also needs to be provided a folder con
```
```
<a name="Dataset_download"></a>
#### 1.2 Dataset download
### 1.2 Dataset download
- ICDAR2015
...
@@ -121,7 +121,7 @@ The multi-language model training method is the same as the Chinese model. The t
<a name="Dictionary"></a>
#### 1.3 Dictionary
### 1.3 Dictionary
Finally, provide a dictionary file (`{word_dict_name}.txt`) so that, during training, every character that appears can be mapped to a dictionary index.
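The mapping from characters to dictionary indices can be sketched as follows. This is a minimal illustration, not PaddleOCR's own loader: it assumes the dictionary file holds one character per line, and the helper names `load_char_dict` and `encode` and the file `demo_dict.txt` are hypothetical.

```python
from pathlib import Path
import tempfile


def load_char_dict(dict_path):
    """Read one character per line (assumed format) and map it to its index."""
    chars = []
    with open(dict_path, "r", encoding="utf-8") as f:
        for line in f:
            ch = line.rstrip("\r\n")
            if ch:  # skip blank lines
                chars.append(ch)
    return {ch: idx for idx, ch in enumerate(chars)}


def encode(text, char_to_idx):
    """Convert text to indices, skipping characters missing from the dictionary."""
    return [char_to_idx[ch] for ch in text if ch in char_to_idx]


# Usage with a tiny throwaway dictionary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_dict.txt"
    demo_path.write_text("a\nb\nc\n", encoding="utf-8")
    demo_dict = load_char_dict(demo_path)
    demo_indices = encode("cab!", demo_dict)
```

Characters that are not in the dictionary (here `!`) are simply dropped in this sketch, which is why the dictionary needs to cover every character in the training labels.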
...
@@ -166,17 +166,17 @@ To customize the dict file, please modify the `character_dict_path` field in `co
To use a custom dictionary file, add the `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and point it to your dictionary path, then set `character_type` to `ch`.
<a name="Add_space_category"></a>
#### 1.4 Add space category
### 1.4 Add space category
To recognize the `space` category, set the `use_space_char` field in the yml file to `True`.
**Note: `use_space_char` only takes effect when `character_type=ch`.**
<a name="TRAINING"></a>
### 2 TRAINING
## 2 TRAINING
<a name="Data_Augmentation"></a>
#### 2.1 Data Augmentation
### 2.1 Data Augmentation
PaddleOCR provides a variety of data augmentation methods, all of which are enabled by default.
During training, each augmentation method is applied with a probability of 40%. For the implementation, see [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py).
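As a rough sketch of what "each method is selected with a 40% probability" can look like in code, the snippet below applies every augmentation independently with probability `prob`. It is not the code in `rec_img_aug.py`: the independence between methods is an assumption, and `apply_augmentations`, `reverse_pixels`, and `invert_pixels` are hypothetical stand-ins that work on a plain list of pixel values.

```python
import random


def apply_augmentations(sample, augmentations, prob=0.4, rng=random):
    """Apply each augmentation independently with probability `prob`."""
    applied = []
    for name, fn in augmentations:
        if rng.random() < prob:
            sample = fn(sample)
            applied.append(name)
    return sample, applied


# Hypothetical stand-in augmentations on a list of pixel values.
def reverse_pixels(pixels):
    return pixels[::-1]


def invert_pixels(pixels):
    return [255 - p for p in pixels]


augs = [("reverse", reverse_pixels), ("invert", invert_pixels)]

# Empirically check how often one method fires over many samples.
rng = random.Random(0)
trials = 10000
hits = sum(
    "invert" in apply_augmentations([0, 128, 255], augs, rng=rng)[1]
    for _ in range(trials)
)
invert_rate = hits / trials
```

Raising `prob` makes the training data more varied, which is one of the levers against overfitting.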
<a name="Training"></a>
#### 2.2 General Training
### 2.2 General Training
PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as an example:
...
@@ -304,7 +304,7 @@ Eval:
**Note: the configuration file used for prediction and evaluation must be the same as the one used for training.**
<a name="Multi_language"></a>
#### 2.3 Multi-language Training
### 2.3 Multi-language Training
Currently, the multi-language algorithms supported by PaddleOCR are:
...
@@ -361,7 +361,7 @@ Eval:
```
```
<a name="EVALUATION"></a>
### 3 EVALUATION
## 3 EVALUATION
The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.
- [3. Data and vertical scenes](#3-data-and-vertical-scenes)
  * [3.1 Training data](#31-training-data)
  * [3.2 Vertical scene](#32-vertical-scene)
  * [3.3 Build your own data set](#33-build-your-own-data-set)
This article introduces the basic concepts you need for model training and the tuning methods used during training.
It also briefly describes the composition of the PaddleOCR training data and how to prepare data to fine-tune models for vertical scenes.
### 1. Basic concepts
<a name="1-basic-concepts"></a>
# 1. Basic concepts
OCR (Optical Character Recognition) refers to the process of analyzing and recognizing images to obtain text and layout information. It is a typical computer vision task.
It usually consists of two subtasks: text detection and text recognition.
Pay attention to the following parameters when tuning the model:
#### 1.1 Learning rate
<a name="11-learning-rate"></a>
## 1.1 Learning rate
The learning rate is one of the most important hyperparameters for training neural networks. It determines the step size taken along the gradient toward the optimum of the loss function in each iteration.
PaddleOCR provides a variety of learning rate update strategies, which can be selected through configuration files, for example:
...
@@ -31,7 +44,8 @@ and the learning rate is the same in each stage.
`warmup_epoch` sets the warmup length: here, during the first 5 epochs the learning rate increases gradually from 0 to `base_lr`. For all strategies, see [learning_rate.py](../../ppocr/optimizer/learning_rate.py).
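The warmup behaviour can be illustrated with a short sketch. This is not the code in `learning_rate.py`: it works at epoch granularity for simplicity, the cosine decay after warmup is an assumed example strategy, and `lr_at_epoch`, `base_lr=0.001`, and `total_epochs=100` are hypothetical values chosen for illustration.

```python
import math


def lr_at_epoch(epoch, base_lr=0.001, warmup_epoch=5, total_epochs=100):
    """Linear warmup from 0 to base_lr, then cosine decay (assumed strategy)."""
    if epoch < warmup_epoch:
        # Warmup: the learning rate grows linearly from 0 toward base_lr.
        return base_lr * epoch / warmup_epoch
    # After warmup: decay from base_lr toward 0 along a half cosine.
    progress = (epoch - warmup_epoch) / max(1, total_epochs - warmup_epoch)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))


schedule = [lr_at_epoch(e) for e in range(100)]
```

Starting from a small learning rate keeps the first updates small while the weights are still far from a good region, which tends to make early training more stable.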
#### 1.2 Regularization
<a name="12-regularization"></a>
## 1.2 Regularization
Regularization effectively reduces overfitting. PaddleOCR provides L1 and L2 regularization,
which are the most commonly used regularization methods.
...
@@ -46,8 +60,8 @@ Optimizer:
name: L2
factor: 2.0e-05
```
```
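To make the effect of `factor` concrete, the sketch below adds an L1 or L2 penalty to a data loss. It is an illustration under assumptions, not PaddleOCR's optimizer code: the `0.5 * factor * sum(w^2)` form is one common convention for L2, and the helper names are hypothetical.

```python
def l2_penalty(weights, factor=2.0e-05):
    """L2 term: 0.5 * factor * sum of squared weights (one common convention)."""
    return 0.5 * factor * sum(w * w for w in weights)


def l1_penalty(weights, factor=2.0e-05):
    """L1 term: factor * sum of absolute weights."""
    return factor * sum(abs(w) for w in weights)


def regularized_loss(data_loss, weights, factor=2.0e-05, kind="L2"):
    """Add the chosen penalty to the data loss."""
    if kind == "L2":
        penalty = l2_penalty(weights, factor)
    else:
        penalty = l1_penalty(weights, factor)
    return data_loss + penalty


demo_weights = [3.0, -4.0]
demo_l2 = l2_penalty(demo_weights)  # 0.5 * 2e-5 * (9 + 16)
demo_l1 = l1_penalty(demo_weights)  # 2e-5 * (3 + 4)
```

A larger `factor` penalizes large weights more strongly, which trades some training accuracy for better generalization.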
<a name="13-evaluation-indicators-"></a>
#### 1.3 Evaluation indicators:
## 1.3 Evaluation indicators
(1) Detection stage: Detection is evaluated by the IOU between each detection box and the labeled box. If the IOU is greater than a given threshold, the detection is judged correct. Unlike the boxes used in general object detection, the detection and labeled boxes here are represented as polygons. Detection accuracy (precision): the percentage of correct detection boxes among all detection boxes, which mainly indicates false detections. Detection recall: the percentage of correct detection boxes among all labeled boxes, which mainly indicates missed detections.
...
@@ -55,25 +69,8 @@ Optimizer:
(3) End-to-end statistics: End-to-end recall: the proportion of text lines that are both accurately detected and correctly recognized among all labeled text lines. End-to-end accuracy: the proportion of text lines that are both accurately detected and correctly recognized among all detected text lines. A detection is accurate when the IOU between the detection box and the labeled box is greater than a given threshold, and a recognition is correct when the text in the detection box is identical to the labeled text.
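The detection-stage indicators can be sketched in a few lines. This is a simplified illustration, not PaddleOCR's evaluation code: it uses axis-aligned rectangles instead of polygons, matches each detection greedily to the first unmatched label, and the threshold of 0.5 and the helper names are assumptions.

```python
def box_iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(detections, labels, iou_threshold=0.5):
    """Return (precision, recall) using greedy one-to-one matching."""
    matched_labels = set()
    correct = 0
    for det in detections:
        for idx, gt in enumerate(labels):
            if idx not in matched_labels and box_iou(det, gt) > iou_threshold:
                matched_labels.add(idx)
                correct += 1
                break
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(labels) if labels else 0.0
    return precision, recall


labels = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
detections = [(0, 0, 10, 9), (100, 100, 110, 110)]
precision, recall = detection_metrics(detections, labels)
```

In this example one of two detections is correct and one of three labeled boxes is found, so a false detection lowers precision while the two missed boxes lower recall. The end-to-end indicators follow the same pattern, with the extra condition that the recognized text must equal the labeled text.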
<a name="2-faq"></a>
### 2. FAQ
# 2. FAQ
**Q**: What are the text detection methods based on deep learning? What are the advantages and disadvantages of each?
A: Commonly used deep learning-based text detection methods generally fall into two categories, regression-based and segmentation-based, and some methods combine the two.
(1) Regression-based methods are divided into box regression and pixel-value regression. a. Box-regression methods mainly include CTPN, the Textbox series, and EAST. These algorithms work well on regularly shaped text but cannot accurately detect irregularly shaped text. b. Pixel-value regression methods mainly include CRAFT and SA-Text. These algorithms can detect curved text and perform very well on small text, but their real-time performance is insufficient.
(2) Segmentation-based algorithms, such as PSENet, are not limited by the shape of the text and achieve good results on text of various shapes, but their post-processing is often complicated and therefore slow. Some algorithms specifically address this problem, such as DB, which approximates the binarization step so that it becomes differentiable and can be integrated into training. This yields more accurate boundaries and greatly reduces the post-processing time.
**Q**: For Chinese line text recognition, which is better, CTC or Attention?
A:
(1) In terms of accuracy, CTC recognizes better than Attention in general OCR scenes. The recognition dictionary is large (there are more than 3,000 commonly used Chinese characters), so when training samples are insufficient it is harder to learn the sequence relationships between these characters, and the advantages of the Attention model do not show in Chinese scenes. Moreover, Attention suits short-sentence recognition and performs relatively poorly on long sentences.
(2) In terms of training and prediction speed, Attention's serial decoding structure limits prediction speed, while the CTC network structure is more efficient and predicts faster.
**Q**: How to choose a suitable network input shape when training CRNN recognition?
...
@@ -83,11 +80,23 @@ Optimizer:
(2) Count the number of characters per text in the training samples. Choose the maximum character count so that it covers 80% of the training samples. Then, taking the aspect ratio of a Chinese character as approximately 1 and that of an English character as 3:1, estimate the maximum width.
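Step (2) can be sketched as a small calculation. This is only an illustration under assumptions: `estimate_max_width` is a hypothetical helper, the image height of 32 is an example value, and `char_aspect_ratio` is read here as character width divided by height, which is one interpretation of the ratios mentioned above.

```python
import math


def estimate_max_width(text_lengths, img_height=32, char_aspect_ratio=1.0,
                       coverage_percent=80):
    """Estimate the input width from the length covering most samples."""
    ordered = sorted(text_lengths)
    # Nearest-rank percentile: smallest length that covers the given share.
    rank = max(1, (coverage_percent * len(ordered) + 99) // 100)
    max_chars = ordered[rank - 1]
    # Each character is assumed to take img_height * char_aspect_ratio pixels.
    width = math.ceil(max_chars * img_height * char_aspect_ratio)
    return max_chars, width


# Hypothetical character counts from a Chinese training set.
lengths = [3, 5, 8, 10, 12, 15, 18, 20, 25, 40]
max_chars, width = estimate_max_width(lengths)
```

Covering 80% of the samples rather than all of them keeps a few very long texts (here the one with 40 characters) from forcing an unnecessarily wide network input.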
**Q**: During recognition training, the accuracy on the training set has reached 90, but the accuracy on the validation set stays at 70. What should I do?
A: If the training accuracy is 90 while the validation accuracy stays around 70, the model is most likely overfitting. There are two methods to try:
(1) Add more augmentation methods or increase the augmentation [probability](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/rec_img_aug.py#L341). The default is 0.4.
(2) Increase the [L2 decay value](https://github.com/PaddlePaddle/PaddleOCR/blob/a501603d54ff5513fc4fc760319472e59da25424/configs/rec/ch_ppocr_v1.1/rec_chinese_lite_train_v1.1.yml#L47).
**Q**: When training the recognition model, the loss drops normally, but acc is always 0.
A: It is normal for acc to be 0 at the beginning of recognition model training; the metric will rise after a longer training period.
### 3. Data and vertical scenes
#### 3.1 Training data
<a name="3-data-and-vertical-scenes"></a>
# 3. Data and vertical scenes
<a name="31-training-data"></a>
## 3.1 Training data
The current open-source models, their data sets, and the data set sizes are as follows:
...
@@ -102,14 +111,14 @@ The current open source models, data sets and magnitudes are as follows:
The public data sets are all open source; you can search for and download them yourself, or refer to [Chinese data set](./datasets.md). The synthetic data is not open source, but you can generate your own with open-source synthesis tools such as [text_renderer](https://github.com/Sanster/text_renderer), [SynthText](https://github.com/ankush-me/SynthText), and [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator).
<a name="32-vertical-scene"></a>
#### 3.2 Vertical scene
## 3.2 Vertical scene
PaddleOCR mainly focuses on general OCR. If you have vertical-scene requirements, you can train your own model with PaddleOCR plus vertical data.
If you lack labeled data or do not want to invest in research and development, it is recommended to call the open API directly, which covers some of the more common vertical categories.
<a name="33-build-your-own-data-set"></a>
#### 3.3 Build your own data set
## 3.3 Build your own data set
The following practical tips can serve as a reference when building a data set: