From 97ef80e33337c5df2562bb21e7f9819ac550cd19 Mon Sep 17 00:00:00 2001
From: xiaoting <31891223+tink2123@users.noreply.github.com>
Date: Wed, 4 Jan 2023 18:41:16 +0800
Subject: [PATCH] update finetune doc (#8774)

* update finetune doc

* update finetune doc
---
 doc/doc_ch/finetune.md    | 63 +++++++++++++++++++++++++++++++++++++++
 doc/doc_en/finetune_en.md | 62 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/doc/doc_ch/finetune.md b/doc/doc_ch/finetune.md
index 97ec99f2..ec4bd065 100644
--- a/doc/doc_ch/finetune.md
+++ b/doc/doc_ch/finetune.md
@@ -103,6 +103,66 @@ PaddleOCR提供的配置文件是在8卡训练(相当于总的batch size是`8*
 
 更多PP-OCR系列模型,请参考[PP-OCR 系列模型库](./models_list.md)。
 
+PP-OCRv3 模型使用了GTC策略,其中SAR分支参数量大,当训练数据为简单场景时模型容易过拟合,导致微调效果不佳,建议去除GTC策略,模型结构部分配置文件修改如下:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [1, 2]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: False
+  Head:
+    name: CTCHead
+    fc_decay: 0.00001
+Loss:
+  name: CTCLoss
+
+Train:
+  dataset:
+  ......
+    transforms:
+    # 去除 RecConAug 增广
+    # - RecConAug:
+    #     prob: 0.5
+    #     ext_data_num: 2
+    #     image_shape: [48, 320, 3]
+    #     max_text_length: *max_text_length
+    - RecAug:
+    # 修改 Encode 方式
+    - CTCLabelEncode:
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label
+        - length
+...
+
+Eval:
+  dataset:
+  ...
+    transforms:
+    ...
+    - CTCLabelEncode:
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label
+        - length
+...
+
+
+```
 ### 3.3 训练超参选择
 
@@ -163,6 +223,9 @@ Train:
     ratio_list: [1.0, 0.1]
 ```
 
+
 ### 3.4 训练调优
 
 训练过程并非一蹴而就的,完成一个阶段的训练评估后,建议收集分析当前模型在真实场景中的 badcase,有针对性的调整训练数据比例,或者进一步新增合成数据。通过多次迭代训练,不断优化模型效果。
+
+如果在训练时修改了自定义字典,由于无法加载最后一层FC的参数,在迭代初期acc=0是正常的情况,不必担心,加载预训练模型依然可以加快模型收敛。
diff --git a/doc/doc_en/finetune_en.md b/doc/doc_en/finetune_en.md
index 54be93f4..e76eb1e2 100644
--- a/doc/doc_en/finetune_en.md
+++ b/doc/doc_en/finetune_en.md
@@ -103,6 +103,66 @@ It is recommended to choose the PP-OCRv3 model (configuration file: [ch_PP-OCRv3
 
 For more PP-OCR series models, please refer to [PP-OCR Series Model Library](./models_list_en.md)。
 
+The PP-OCRv3 model uses the GTC strategy, in which the SAR branch has a large number of parameters. When the training data covers only simple scenes, the model easily overfits, leading to poor fine-tuning results. It is recommended to remove the GTC strategy; modify the model-structure section of the configuration file as follows:
+
+```yaml
+Architecture:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [1, 2]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: False
+  Head:
+    name: CTCHead
+    fc_decay: 0.00001
+Loss:
+  name: CTCLoss
+
+Train:
+  dataset:
+  ......
+    transforms:
+    # remove RecConAug
+    # - RecConAug:
+    #     prob: 0.5
+    #     ext_data_num: 2
+    #     image_shape: [48, 320, 3]
+    #     max_text_length: *max_text_length
+    - RecAug:
+    # modify Encode
+    - CTCLabelEncode:
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label
+        - length
+...
+
+Eval:
+  dataset:
+  ...
+    transforms:
+    ...
+    - CTCLabelEncode:
+    - KeepKeys:
+        keep_keys:
+        - image
+        - label
+        - length
+...
+
+
+```
 ### 3.3 Training hyperparameter
 
@@ -165,3 +225,5 @@ Train:
     ratio_list: [1.0, 0.1]
 ```
 
 ### 3.4 training optimization
 
 The training process does not happen overnight. After completing a stage of training evaluation, it is recommended to collect and analyze the badcase of the current model in the real scene, adjust the proportion of training data in a targeted manner, or further add synthetic data. Through multiple iterations of training, the model effect is continuously optimized.
+
+If you modify the custom dictionary during training, the parameters of the last FC layer cannot be loaded, so acc=0 at the beginning of training is normal; there is no need to worry, since loading the pre-trained model still speeds up convergence.
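The acc=0 note added by this patch can be illustrated with a minimal, library-free sketch. This is not PaddleOCR's actual loading code, and the parameter names and dictionary sizes below are hypothetical: the idea is simply that pretrained tensors whose shapes still match the new model are reused, while the final FC layer (whose width equals the new dictionary length) must start from random initialization, which is why early accuracy is near zero even though the backbone converges faster.

```python
# Hedged sketch (hypothetical parameter names, not real PaddleOCR state-dict
# keys): after changing the character dictionary, keep only pretrained
# tensors whose shapes still match; the mismatched FC head is skipped.

def filter_loadable_params(pretrained, model_shapes):
    """Return (loadable, skipped): pretrained entries reusable by the model.

    pretrained   : dict name -> (tensor, shape)
    model_shapes : dict name -> shape expected by the new model
    """
    loadable, skipped = {}, []
    for name, shape in model_shapes.items():
        if name in pretrained and pretrained[name][1] == shape:
            loadable[name] = pretrained[name]  # shape matches: reuse weights
        else:
            skipped.append(name)  # missing or mismatched: random init
    return loadable, skipped


# Illustrative shapes only: the old dictionary had 6625 characters,
# the new custom dictionary has 100 (both numbers are made up).
pretrained = {
    "backbone.conv1": ("w0", (32, 3, 3, 3)),
    "head.fc.weight": ("w1", (64, 6625)),
}
model_shapes = {
    "backbone.conv1": (32, 3, 3, 3),
    "head.fc.weight": (64, 100),  # resized to the new dictionary length
}

loadable, skipped = filter_loadable_params(pretrained, model_shapes)
print(sorted(loadable))  # ['backbone.conv1'] -> backbone weights are reused
print(skipped)           # ['head.fc.weight'] -> head restarts, so acc=0 early
```

The same shape-matching logic is why loading the pretrained model still helps after a dictionary change: everything except the head transfers, and only the small FC layer has to be learned from scratch.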