Unverified commit 97ef80e3, authored by xiaoting, committed by GitHub

update finetune doc (#8774)

* update finetune doc

* update finetune doc
Parent c1e19914
...@@ -103,6 +103,66 @@ The configuration files provided by PaddleOCR are for 8-GPU training (equivalent to a total batch size of `8*
For more PP-OCR series models, please refer to the [PP-OCR Series Model Library](./models_list.md)
The PP-OCRv3 model uses the GTC strategy, in which the SAR branch has a large number of parameters. When the training data consists of simple scenes, the model tends to overfit, leading to poor fine-tuning results. It is recommended to remove the GTC strategy; the model-structure part of the configuration file is modified as follows:
```yaml
Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Neck:
    name: SequenceEncoder
    encoder_type: svtr
    dims: 64
    depth: 2
    hidden_dims: 120
    use_guide: False
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

Train:
  dataset:
    ......
    transforms:
      # remove the RecConAug augmentation
      # - RecConAug:
      #     prob: 0.5
      #     ext_data_num: 2
      #     image_shape: [48, 320, 3]
      #     max_text_length: *max_text_length
      - RecAug:
      # change the Encode method
      - CTCLabelEncode:
      - KeepKeys:
          keep_keys:
            - image
            - label
            - length
  ...

Eval:
  dataset:
    ...
    transforms:
      ...
      - CTCLabelEncode:
      - KeepKeys:
          keep_keys:
            - image
            - label
            - length
  ...
```
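As a quick sanity check before launching fine-tuning, you can load the trimmed config and confirm the GTC/SAR branch is gone. A minimal sketch, assuming PyYAML is installed, the file is saved under a hypothetical name `my_rec_finetune.yml`, and the elided `......` sections have been filled in with real dataset settings:

```python
import yaml

# Hypothetical path to the modified recognition config shown above.
CONFIG_PATH = "my_rec_finetune.yml"

with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

arch = cfg["Architecture"]
# After removing the GTC strategy there should be a single CTC head
# (instead of the default PP-OCRv3 multi-head with a SAR branch).
assert arch["Head"]["name"] == "CTCHead", arch["Head"]["name"]
assert cfg["Loss"]["name"] == "CTCLoss"
print("Architecture OK:", arch["Backbone"]["name"], "->", arch["Head"]["name"])
```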
### 3.3 Training hyperparameter selection
...@@ -163,6 +223,9 @@ Train:
ratio_list: [1.0, 0.1]
```
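To make the semantics of `ratio_list` concrete: with two label files and `ratio_list: [1.0, 0.1]`, each epoch uses every line of the first file but only a random 10% of the second. The sketch below imitates that sampling outside of PaddleOCR (file names are hypothetical; the real logic lives in PaddleOCR's dataset classes):

```python
import random

def sample_epoch(label_files, ratio_list, seed=None):
    """Keep roughly ratio * len(lines) lines per label file (assumes ratio <= 1.0)."""
    rng = random.Random(seed)
    epoch_lines = []
    for path, ratio in zip(label_files, ratio_list):
        with open(path, "r", encoding="utf-8") as f:
            lines = f.readlines()
        epoch_lines.extend(rng.sample(lines, round(len(lines) * ratio)))
    rng.shuffle(epoch_lines)
    return epoch_lines

# Hypothetical label files: all real data plus 10% of the synthetic data per epoch.
lines = sample_epoch(["train_list_real.txt", "train_list_synth.txt"], [1.0, 0.1])
```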
### 3.4 Training optimization
The training process does not happen overnight. After completing a stage of training and evaluation, it is recommended to collect and analyze the current model's bad cases in real scenarios, adjust the proportion of training data in a targeted manner, or further add synthetic data. The model is continuously improved through multiple training iterations.
If you modified the custom dictionary for training, the parameters of the final FC layer cannot be loaded, so acc=0 in the early iterations is normal; don't worry, loading the pre-trained model still speeds up convergence.
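The reason the final FC layer cannot be reused is that its output width is tied to the dictionary size: for CTC it is the number of dictionary characters, plus one if `use_space_char` is enabled, plus one blank class. A small sketch that computes the expected width from a custom dictionary file (the file name is hypothetical; one character per line):

```python
# Sketch: expected output width of the CTC head's FC layer for a custom dictionary.
DICT_PATH = "my_custom_dict.txt"   # hypothetical: one character per line
USE_SPACE_CHAR = True              # mirrors Global.use_space_char in the config

with open(DICT_PATH, "r", encoding="utf-8") as f:
    num_chars = sum(1 for line in f if line.strip("\n"))

num_classes = num_chars + (1 if USE_SPACE_CHAR else 0) + 1  # +1 for the CTC blank
print("Expected FC output width:", num_classes)
```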
...@@ -103,6 +103,66 @@ It is recommended to choose the PP-OCRv3 model (configuration file: [ch_PP-OCRv3
For more PP-OCR series models, please refer to the [PP-OCR Series Model Library](./models_list_en.md)
The PP-OCRv3 model uses the GTC strategy, in which the SAR branch has a large number of parameters. When the training data consists of simple scenes, the model tends to overfit, resulting in a poor fine-tuning effect. It is recommended to remove the GTC strategy; the model-structure part of the configuration file is modified as follows:
```yaml
Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Neck:
    name: SequenceEncoder
    encoder_type: svtr
    dims: 64
    depth: 2
    hidden_dims: 120
    use_guide: False
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

Train:
  dataset:
    ......
    transforms:
      # remove RecConAug
      # - RecConAug:
      #     prob: 0.5
      #     ext_data_num: 2
      #     image_shape: [48, 320, 3]
      #     max_text_length: *max_text_length
      - RecAug:
      # modify Encode
      - CTCLabelEncode:
      - KeepKeys:
          keep_keys:
            - image
            - label
            - length
  ...

Eval:
  dataset:
    ...
    transforms:
      ...
      - CTCLabelEncode:
      - KeepKeys:
          keep_keys:
            - image
            - label
            - length
  ...
```
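To get a feel for how much of the pretrained model the SAR branch accounts for, you can count its parameters in the downloaded training weights. A rough sketch, assuming PaddlePaddle is installed, a hypothetical local path to the PP-OCRv3 recognition training weights, and that the SAR branch's parameter names contain `sar` (naming may differ between versions):

```python
import numpy as np
import paddle

# Hypothetical path to the pretrained PP-OCRv3 recognition training weights.
state = paddle.load("ch_PP-OCRv3_rec_train/best_accuracy.pdparams")

def n_params(tensors):
    return sum(int(np.prod(v.shape)) for v in tensors.values())

sar = {k: v for k, v in state.items() if "sar" in k.lower()}
total, sar_total = n_params(state), n_params(sar)
print(f"total parameters: {total:,}")
print(f"SAR-branch parameters: {sar_total:,} ({sar_total / total:.1%})")
```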
### 3.3 Training hyperparameter
...@@ -165,3 +225,5 @@ Train:
### 3.4 Training optimization
The training process does not happen overnight. After completing a stage of training and evaluation, it is recommended to collect and analyze the current model's bad cases in real scenarios, adjust the proportion of training data in a targeted manner, or further add synthetic data. The model is continuously improved through multiple training iterations.
If you modified the custom dictionary for training, the parameters of the last FC layer cannot be loaded, so it is normal for acc to be 0 at the beginning of training; don't worry, loading the pre-trained model still speeds up convergence.
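If you prefer to keep everything except the mismatched FC when the dictionary changes, a common workaround is to filter the pretrained state dict by shape before loading. This is only a hedged sketch of that idea, not PaddleOCR's built-in loading code (`rec_model` is assumed to be an already-built recognition model):

```python
import paddle

def load_compatible(model, pretrained_path):
    """Load only tensors whose names and shapes match the current model."""
    pretrained = paddle.load(pretrained_path)
    current = model.state_dict()
    kept = {k: v for k, v in pretrained.items()
            if k in current and list(v.shape) == list(current[k].shape)}
    model.set_state_dict(kept)
    print(f"loaded {len(kept)} tensors, skipped {len(pretrained) - len(kept)} "
          f"(e.g. the final FC after a dictionary change)")

# Usage (hypothetical): load_compatible(rec_model, "best_accuracy.pdparams")
```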