Unverified commit 523d2ce0. Author: MissPenguin. Committer: GitHub.

Merge pull request #6097 from littletomatodonkey/dyg/add_kd_rec_doc

upgrade kd doc
...@@ -71,7 +71,7 @@ PostProcess: ...@@ -71,7 +71,7 @@ PostProcess:
Metric: Metric:
name: RecMetric name: RecMetric
main_indicator: acc main_indicator: acc
ignore_space: True ignore_space: False
Train: Train:
dataset: dataset:
......
...@@ -129,7 +129,7 @@ Loss: ...@@ -129,7 +129,7 @@ Loss:
key: head_out key: head_out
multi_head: True multi_head: True
- DistillationSARLoss: - DistillationSARLoss:
weight: 1.0 weight: 0.5
model_name_list: ["Student", "Teacher"] model_name_list: ["Student", "Teacher"]
key: head_out key: head_out
multi_head: True multi_head: True
...@@ -145,7 +145,7 @@ Metric: ...@@ -145,7 +145,7 @@ Metric:
base_metric_name: RecMetric base_metric_name: RecMetric
main_indicator: acc main_indicator: acc
key: "Student" key: "Student"
ignore_space: True ignore_space: False
Train: Train:
dataset: dataset:
......
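...@@ -60,7 +60,7 @@ PaddleOCR integrates knowledge distillation algorithms; specifically, it has the following main ...@@ -60,7 +60,7 @@ PaddleOCR integrates knowledge distillation algorithms; specifically, it has the following main
<a name="21"></a> <a name="21"></a>
### 2.1 Recognition Configuration File Analysis ### 2.1 Recognition Configuration File Analysis
The configuration file is [ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml) The configuration file is [ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml)
<a name="211"></a> <a name="211"></a>
#### 2.1.1 Model Structure #### 2.1.1 Model Structure
...@@ -69,7 +69,7 @@ PaddleOCR integrates knowledge distillation algorithms; specifically, it has the following main ...@@ -69,7 +69,7 @@ PaddleOCR integrates knowledge distillation algorithms; specifically, it has the following main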
```yaml ```yaml
Architecture: Architecture:
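model_type: &model_type "rec" # Model category (rec, det, etc.); the model category of every sub-network matches model_type: &model_type "rec" # Model category (rec, det, etc.); every sub-network uses the same model category
name: DistillationModel # Structure name; for distillation tasks it is DistillationModel, used to build the corresponding structure name: DistillationModel # Structure name; for distillation tasks it is DistillationModel, used to build the corresponding structure
algorithm: Distillation # Algorithm name algorithm: Distillation # Algorithm name
Models: # Models; contains the configuration of each sub-network Models: # Models; contains the configuration of each sub-network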
...@@ -78,37 +78,55 @@ Architecture: ...@@ -78,37 +78,55 @@ Architecture:
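freeze_params: false # Whether to freeze the parameters freeze_params: false # Whether to freeze the parameters
return_all_feats: true # Sub-network parameter: whether to return all features; if False, only the final output is returned return_all_feats: true # Sub-network parameter: whether to return all features; if False, only the final output is returned
model_type: *model_type # Model category model_type: *model_type # Model category
algorithm: CRNN # Algorithm name of the sub-network; its remaining fields are constructor parameters, the same as in an ordinary training configuration algorithm: SVTR # Algorithm name of the sub-network; its remaining fields are constructor parameters, the same as in an ordinary training configuration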
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
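Student: # Another sub-network; this is a DML distillation example, so the two sub-networks share the same structure and both need to learn parameters Neck:
pretrained: # The network parameters below are the same as above name: svtr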
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student:
pretrained:
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
``` ```
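If you want to add more sub-networks for training, you can add the corresponding fields to the configuration file in the same way as `Student` and `Teacher`. For example, if you want three models to supervise each other and train jointly, `Architecture` can be written in the following format. If you want to add more sub-networks for training, you can add the corresponding fields to the configuration file in the same way as `Student` and `Teacher`. For example, if you want three models to supervise each other and train jointly, `Architecture` can be written in the following format.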
...@@ -124,55 +142,82 @@ Architecture: ...@@ -124,55 +142,82 @@ Architecture:
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student: Student:
pretrained: pretrained:
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Student2: # New sub-network introduced for the knowledge distillation task; the rest is the same as the configuration above Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student2:
pretrained: pretrained:
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
``` ```
When the model is finally trained, it contains 3 sub-networks: `Teacher`, `Student`, `Student2`. When the model is finally trained, it contains 3 sub-networks: `Teacher`, `Student`, `Student2`.
...@@ -205,34 +250,56 @@ Architecture: ...@@ -205,34 +250,56 @@ Architecture:
```yaml ```yaml
Loss: Loss:
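name: CombinedLoss # Loss function name; the loss class is built based on this name name: CombinedLoss
loss_config_list: # List of loss function configurations; required by CombinedLoss loss_config_list:
- DistillationCTCLoss: # Distillation-based CTC loss function, inherited from the standard CTC loss
weight: 1.0 # Weight of the loss function; every loss configuration in loss_config_list must include this field
model_name_list: ["Student", "Teacher"] # From the distillation model's predictions, extract the outputs of these two sub-networks and compute the CTC loss against gt
key: head_out # Take the tensor stored under this key in the sub-network output dict
- DistillationDMLLoss: # Distillation DML loss function, inherited from the standard DMLLoss - DistillationDMLLoss: # Distillation DML loss function, inherited from the standard DMLLoss
weight: 1.0 # Weight weight: 1.0 # Weight
act: "softmax" # Activation function applied to the input; can be softmax, sigmoid or None, default None act: "softmax" # Activation function applied to the input; can be softmax, sigmoid or None, default None
use_log: true # Compute the log of the input; if the function has already
model_name_pairs: # Pairs of sub-network names used to compute the DML loss; to compute the DML loss for other sub-networks, keep adding pairs to the list below model_name_pairs: # Pairs of sub-network names used to compute the DML loss; to compute the DML loss for other sub-networks, keep adding pairs to the list below
- ["Student", "Teacher"] - ["Student", "Teacher"]
key: head_out # Take the tensor stored under this key in the sub-network output dict key: head_out # Take the tensor stored under this key in the sub-network output dict
multi_head: True # Whether the structure is multi-head
dis_head: ctc # Head used to compute the loss
name: dml_ctc # Prefix name of the distillation loss, to avoid naming conflicts between different losses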
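- DistillationDMLLoss: # Distillation DML loss function, inherited from the standard DMLLoss
weight: 0.5 # Weight
act: "softmax" # Activation function applied to the input; can be softmax, sigmoid or None, default None
use_log: true # Compute the log of the input; if the function has already
model_name_pairs: # Pairs of sub-network names used to compute the DML loss; to compute the DML loss for other sub-networks, keep adding pairs to the list below
- ["Student", "Teacher"]
key: head_out # Take the tensor stored under this key in the sub-network output dict
multi_head: True # Whether the structure is multi-head
dis_head: sar # Head used to compute the loss
name: dml_sar # Prefix name of the distillation loss, to avoid naming conflicts between different losses
- DistillationDistanceLoss: # Distillation distance loss function - DistillationDistanceLoss: # Distillation distance loss function
weight: 1.0 # Weight weight: 1.0 # Weight
mode: "l2" # Distance metric; currently supports l1, l2, smooth_l1 mode: "l2" # Distance metric; currently supports l1, l2, smooth_l1
model_name_pairs: # Pairs of sub-network names used to compute the distance loss model_name_pairs: # Pairs of sub-network names used to compute the distance loss
- ["Student", "Teacher"] - ["Student", "Teacher"]
key: backbone_out # Take the tensor stored under this key in the sub-network output dict key: backbone_out # Take the tensor stored under this key in the sub-network output dict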
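- DistillationCTCLoss: # Distillation-based CTC loss function, inherited from the standard CTC loss
weight: 1.0 # Weight of the loss function; every loss configuration in loss_config_list must include this field
model_name_list: ["Student", "Teacher"] # From the distillation model's predictions, extract the outputs of these two sub-networks and compute the CTC loss against gt
key: head_out # Take the tensor stored under this key in the sub-network output dict
- DistillationSARLoss: # Distillation-based SAR loss function, inherited from the standard SARLoss
weight: 1.0 # Weight of the loss function; every loss configuration in loss_config_list must include this field
model_name_list: ["Student", "Teacher"] # From the distillation model's predictions, extract the outputs of these two sub-networks and compute the SAR loss against gt
key: head_out # Take the tensor stored under this key in the sub-network output dict
multi_head: True # Whether the structure is multi-head; if true, the SAR branch is taken to compute the loss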
``` ```
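Among the loss functions above, all distillation loss functions inherit from the standard loss function classes. Their main job is to parse the output of the distillation model, find the intermediate node (tensor) used to compute the loss, and then compute it with the standard loss function class. Among the loss functions above, all distillation loss functions inherit from the standard loss function classes. Their main job is to parse the output of the distillation model, find the intermediate node (tensor) used to compute the loss, and then compute it with the standard loss function class.
Taking the configuration above as an example, the final distillation training loss consists of the following 3 parts. Taking the configuration above as an example, the final distillation training loss consists of the following 5 parts.
- CTC loss between the final output (`head_out`) of `Student` and `Teacher` and gt, with weight 1. Because both sub-networks need to update their parameters, both need to compute the loss against gt. - CTC loss between the CTC branch of the final output (`head_out`) of `Student` and `Teacher` and gt, with weight 1. Because both sub-networks need to update their parameters, both need to compute the loss against gt.
- DML loss between the final outputs (`head_out`) of `Student` and `Teacher`, with weight 1. - SAR loss between the SAR branch of the final output (`head_out`) of `Student` and `Teacher` and gt, with weight 1.0. Because both sub-networks need to update their parameters, both need to compute the loss against gt.
- DML loss between the CTC branches of the final outputs (`head_out`) of `Student` and `Teacher`, with weight 1.
- DML loss between the SAR branches of the final outputs (`head_out`) of `Student` and `Teacher`, with weight 0.5.
- l2 loss between the backbone outputs (`backbone_out`) of `Student` and `Teacher`, with weight 1. - l2 loss between the backbone outputs (`backbone_out`) of `Student` and `Teacher`, with weight 1.
For a more detailed implementation of `CombinedLoss`, see [combined_loss.py](../../ppocr/losses/combined_loss.py#L23). For the distillation loss functions such as `DistillationCTCLoss`, see [distillation_loss.py](../../ppocr/losses/distillation_loss.py) For a more detailed implementation of `CombinedLoss`, see [combined_loss.py](../../ppocr/losses/combined_loss.py#L23). For the distillation loss functions such as `DistillationCTCLoss`, see [distillation_loss.py](../../ppocr/losses/distillation_loss.py)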
<a name="213"></a> <a name="213"></a>
...@@ -245,6 +312,7 @@ PostProcess: ...@@ -245,6 +312,7 @@ PostProcess:
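name: DistillationCTCLabelDecode # CTC decoding post-processing for distillation tasks, inherited from the standard CTCLabelDecode class name: DistillationCTCLabelDecode # CTC decoding post-processing for distillation tasks, inherited from the standard CTCLabelDecode class
model_name: ["Student", "Teacher"] # From the distillation model's predictions, extract the outputs of these two sub-networks and decode them model_name: ["Student", "Teacher"] # From the distillation model's predictions, extract the outputs of these two sub-networks and decode them
key: head_out # Take the tensor stored under this key in the sub-network output dict key: head_out # Take the tensor stored under this key in the sub-network output dict
multi_head: True # For a multi-head structure, the CTC branch is taken for the computation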
``` ```
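Taking the configuration above as an example, the CTC decoding outputs of both the `Student` and `Teacher` sub-networks are computed, and a `dict` is returned in which `key` is the name of the sub-network used for processing and `value` is the list of sub-networks used for processing. Taking the configuration above as an example, the CTC decoding outputs of both the `Student` and `Teacher` sub-networks are computed, and a `dict` is returned in which `key` is the name of the sub-network used for processing and `value` is the list of sub-networks used for processing.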
...@@ -262,6 +330,7 @@ Metric: ...@@ -262,6 +330,7 @@ Metric:
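base_metric_name: RecMetric # Base class for metric computation; metrics for the model outputs are computed with this class base_metric_name: RecMetric # Base class for metric computation; metrics for the model outputs are computed with this class
main_indicator: acc # Name of the indicator main_indicator: acc # Name of the indicator
key: "Student" # Use this sub-network's main_indicator as the criterion for saving the best model key: "Student" # Use this sub-network's main_indicator as the criterion for saving the best model
ignore_space: False # Whether to ignore spaces during evaluation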
``` ```
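Taking the configuration above as an example, the acc metric of the `Student` sub-network is used as the criterion for saving the best model; the log also prints the acc metric of every sub-network. Taking the configuration above as an example, the acc metric of the `Student` sub-network is used as the criterion for saving the best model; the log also prints the acc metric of every sub-network.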
...@@ -273,15 +342,15 @@ Metric: ...@@ -273,15 +342,15 @@ Metric:
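There are 2 ways to fine-tune the recognition model obtained by distillation. There are 2 ways to fine-tune the recognition model obtained by distillation.
(1) Fine-tuning with knowledge distillation: this case is straightforward. Download the pre-trained model, set the pre-trained model path and your own data path in [ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml), and start fine-tuning. (1) Fine-tuning with knowledge distillation: this case is straightforward. Download the pre-trained model, set the pre-trained model path and your own data path in [ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml), and start fine-tuning.
(2) Fine-tuning without knowledge distillation: in this case, first extract the student model parameters from the pre-trained model, as follows. (2) Fine-tuning without knowledge distillation: in this case, first extract the student model parameters from the pre-trained model, as follows.
* First download the pre-trained model and extract it. * First download the pre-trained model and extract it.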
```shell ```shell
# Download the pre-trained model and extract it # Download the pre-trained model and extract it
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
tar -xf ch_PP-OCRv2_rec_train.tar tar -xf ch_PP-OCRv3_rec_train.tar
``` ```
* Then use Python to extract the student model parameters * Then use Python to extract the student model parameters
...@@ -289,7 +358,7 @@ tar -xf ch_PP-OCRv2_rec_train.tar ...@@ -289,7 +358,7 @@ tar -xf ch_PP-OCRv2_rec_train.tar
```python ```python
import paddle import paddle
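# Load the pre-trained model # Load the pre-trained model
all_params = paddle.load("ch_PP-OCRv2_rec_train/best_accuracy.pdparams") all_params = paddle.load("ch_PP-OCRv3_rec_train/best_accuracy.pdparams")
# View the keys of the weight parameters # View the keys of the weight parameters
print(all_params.keys()) print(all_params.keys())
# Extract the student model weights # Extract the student model weights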
...@@ -297,10 +366,10 @@ s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Stu ...@@ -297,10 +366,10 @@ s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Stu
# View the keys of the student model weights # View the keys of the student model weights
print(s_params.keys()) print(s_params.keys())
# Save # Save
paddle.save(s_params, "ch_PP-OCRv2_rec_train/student.pdparams") paddle.save(s_params, "ch_PP-OCRv3_rec_train/student.pdparams")
``` ```
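After the conversion, use [ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml), set the pre-trained model path (the path of the exported `student.pdparams`) and your own data path, and fine-tune the model. After the conversion, use [ch_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml), set the pre-trained model path (the path of the exported `student.pdparams`) and your own data path, and fine-tune the model.
<a name="22"></a> <a name="22"></a>
### 2.2 Detection Configuration File Analysis ### 2.2 Detection Configuration File Analysis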
......
...@@ -74,6 +74,7 @@ The configuration file is in [ch_PP-OCRv2_rec_distillation.yml](../../configs/re ...@@ -74,6 +74,7 @@ The configuration file is in [ch_PP-OCRv2_rec_distillation.yml](../../configs/re
#### 2.1.1 Model Structure #### 2.1.1 Model Structure
In the knowledge distillation task, the model structure configuration is as follows. In the knowledge distillation task, the model structure configuration is as follows.
```yaml ```yaml
Architecture: Architecture:
model_type: &model_type "rec" # Model category, recognition, detection, etc. model_type: &model_type "rec" # Model category, recognition, detection, etc.
...@@ -85,37 +86,55 @@ Architecture: ...@@ -85,37 +86,55 @@ Architecture:
freeze_params: false # Whether to freeze the parameters freeze_params: false # Whether to freeze the parameters
return_all_feats: true # Whether to return all features; if False, only the final output is returned return_all_feats: true # Whether to return all features; if False, only the final output is returned
model_type: *model_type # Model category model_type: *model_type # Model category
algorithm: CRNN # The algorithm name of the sub-network. The remaining parameters of the sub-network are consistent with the general model training configuration algorithm: SVTR # The algorithm name of the sub-network. The remaining parameters of the sub-network are consistent with the general model training configuration
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student: # Another sub-network, here is a distillation example of DML, the two sub-networks have the same structure, and both need to learn parameters Student: # Another sub-network, here is a distillation example of DML, the two sub-networks have the same structure, and both need to learn parameters
pretrained: # The following parameters are the same as above pretrained: # The following parameters are the same as above
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
``` ```
If you want to add more sub-networks for training, you can also add the corresponding fields in the configuration file according to the way of adding `Student` and `Teacher`. If you want to add more sub-networks for training, you can also add the corresponding fields in the configuration file according to the way of adding `Student` and `Teacher`.
...@@ -132,55 +151,83 @@ Architecture: ...@@ -132,55 +151,83 @@ Architecture:
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student: Student:
pretrained: pretrained:
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Student2: # The new sub-network introduced in the knowledge distillation task, the configuration is the same as above Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
Student2:
pretrained: pretrained:
freeze_params: false freeze_params: false
return_all_feats: true return_all_feats: true
model_type: *model_type model_type: *model_type
algorithm: CRNN algorithm: SVTR
Transform: Transform:
Backbone: Backbone:
name: MobileNetV1Enhance name: MobileNetV1Enhance
scale: 0.5 scale: 0.5
Neck: last_conv_stride: [1, 2]
name: SequenceEncoder last_pool_type: avg
encoder_type: rnn
hidden_size: 64
Head: Head:
name: CTCHead name: MultiHead
mid_channels: 96 head_list:
fc_decay: 0.00002 - CTCHead:
Neck:
name: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
fc_decay: 0.00001
- SARHead:
enc_dim: 512
max_text_length: *max_text_length
```
``` ```
When the model is finally trained, it contains 3 sub-networks: `Teacher`, `Student`, `Student2`. When the model is finally trained, it contains 3 sub-networks: `Teacher`, `Student`, `Student2`.
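With three sub-networks, a mutual-learning setup would typically list every pair of sub-networks under `model_name_pairs`. The following is a minimal sketch of enumerating those pairs, assuming the sub-network names `Teacher`, `Student` and `Student2` from the configuration above. The helper name `all_model_name_pairs` is hypothetical and is not part of PaddleOCR.

```python
from itertools import combinations


def all_model_name_pairs(model_names):
    """Return every unordered pair of sub-network names as 2-item lists."""
    return [list(pair) for pair in combinations(model_names, 2)]


# Sub-network names taken from the three-model Architecture example.
pairs = all_model_name_pairs(["Teacher", "Student", "Student2"])
for pair in pairs:
    # Each printed pair could be pasted as one entry under model_name_pairs.
    print(pair)
```

Listing the pairs explicitly keeps the YAML readable; whether every pair is actually needed depends on the distillation strategy.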
...@@ -224,23 +271,42 @@ Loss: ...@@ -224,23 +271,42 @@ Loss:
act: "softmax" # Activation function, use it to process the input, can be softmax, sigmoid or None, the default is None act: "softmax" # Activation function, use it to process the input, can be softmax, sigmoid or None, the default is None
model_name_pairs: # The subnet name pair used to calculate DML loss. If you want to calculate the DML loss of other subnets, you can continue to add it below the list model_name_pairs: # The subnet name pair used to calculate DML loss. If you want to calculate the DML loss of other subnets, you can continue to add it below the list
- ["Student", "Teacher"] - ["Student", "Teacher"]
key: head_out key: head_out
multi_head: True # whether the head is a multi-head structure
dis_head: ctc # assign the head name to calculate loss
name: dml_ctc # prefix name of the loss
- DistillationDMLLoss: # DML loss function, inherited from the standard DMLLoss
weight: 0.5
act: "softmax" # Activation function, use it to process the input, can be softmax, sigmoid or None, the default is None
model_name_pairs: # The subnet name pair used to calculate DML loss. If you want to calculate the DML loss of other subnets, you can continue to add it below the list
- ["Student", "Teacher"]
key: head_out
multi_head: True # whether the head is a multi-head structure
dis_head: sar # assign the head name to calculate loss
name: dml_sar # prefix name of the loss
- DistillationDistanceLoss: # Distilled distance loss function - DistillationDistanceLoss: # Distilled distance loss function
weight: 1.0 weight: 1.0
mode: "l2" # Support l1, l2 or smooth_l1 mode: "l2" # Support l1, l2 or smooth_l1
model_name_pairs: # Calculate the distance loss of the subnet name pair model_name_pairs: # Calculate the distance loss of the subnet name pair
- ["Student", "Teacher"] - ["Student", "Teacher"]
key: backbone_out key: backbone_out
- DistillationSARLoss: # SAR loss function based on distillation, inherited from standard SAR loss
weight: 1.0 # The weight of the loss function. In loss_config_list, each loss function must include this field
model_name_list: ["Student", "Teacher"] # For the prediction results of the distillation model, extract the output of these two sub-networks and calculate the SAR loss with gt
key: head_out # In the sub-network output dict, take the corresponding tensor
multi_head: True # whether it is multi-head or not, if true, SAR branch is used to calculate the loss
``` ```
Among the above loss functions, all distillation loss functions are inherited from the standard loss function class. Among the above loss functions, all distillation loss functions are inherited from the standard loss function class.
The main functions are: Analyze the output of the distillation model, find the intermediate node (tensor) used to calculate the loss, The main functions are: Analyze the output of the distillation model, find the intermediate node (tensor) used to calculate the loss,
and then use the standard loss function class to calculate. and then use the standard loss function class to calculate.
Taking the above configuration as an example, the final distillation training loss function contains the following three parts. Taking the above configuration as an example, the final distillation training loss function contains the following five parts.
- The final output `head_out` of `Student` and `Teacher` calculates the CTC loss with gt (loss weight equals 1.0). Here, because both sub-networks need to update the parameters, both of them need to calculate the loss with gt. - CTC branch of the final output `head_out` for `Student` and `Teacher` calculates the CTC loss with gt (loss weight equals 1.0). Here, because both sub-networks need to update the parameters, both of them need to calculate the loss with gt.
- DML loss between `Student` and `Teacher`'s final output `head_out` (loss weight equals 1.0). - SAR branch of the final output `head_out` for `Student` and `Teacher` calculates the SAR loss with gt (loss weight equals 1.0). Here, because both sub-networks need to update the parameters, both of them need to calculate the loss with gt.
- DML loss between CTC branch of `Student` and `Teacher`'s final output `head_out` (loss weight equals 1.0).
- DML loss between SAR branch of `Student` and `Teacher`'s final output `head_out` (loss weight equals 0.5).
- L2 loss between `Student` and `Teacher`'s backbone network output `backbone_out` (loss weight equals 1.0). - L2 loss between `Student` and `Teacher`'s backbone network output `backbone_out` (loss weight equals 1.0).
For more specific implementation of `CombinedLoss`, please refer to: [combined_loss.py](../../ppocr/losses/combined_loss.py#L23). For more specific implementation of `CombinedLoss`, please refer to: [combined_loss.py](../../ppocr/losses/combined_loss.py#L23).
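The five parts listed above can be pictured as one weighted sum. The sketch below assumes that the terms are combined by multiplying each one by its `weight` and adding the results, which is what the per-term `weight` fields suggest. The term names and the numeric loss values are made up for illustration, and `combine_losses` is a hypothetical helper, not the `CombinedLoss` class.

```python
# Weights copied from the five-part list above.
weights = {
    "ctc": 1.0,          # CTC branch vs. gt
    "sar": 1.0,          # SAR branch vs. gt
    "dml_ctc": 1.0,      # DML between CTC branches
    "dml_sar": 0.5,      # DML between SAR branches
    "l2_backbone": 1.0,  # l2 between backbone outputs
}

# Hypothetical per-term loss values for a single training step.
loss_terms = {
    "ctc": 2.0,
    "sar": 1.0,
    "dml_ctc": 0.4,
    "dml_sar": 0.2,
    "l2_backbone": 0.1,
}


def combine_losses(terms, term_weights):
    """Weighted sum of loss terms (assumed combination rule)."""
    return sum(term_weights[name] * value for name, value in terms.items())


total = combine_losses(loss_terms, weights)
print(f"total loss: {total:.3f}")
```

Halving the `dml_sar` weight, as in the configuration, halves the contribution of that term to the total.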
...@@ -257,6 +323,7 @@ PostProcess: ...@@ -257,6 +323,7 @@ PostProcess:
name: DistillationCTCLabelDecode # CTC decoding post-processing of distillation tasks, inherited from the standard CTCLabelDecode class name: DistillationCTCLabelDecode # CTC decoding post-processing of distillation tasks, inherited from the standard CTCLabelDecode class
model_name: ["Student", "Teacher"] # For the prediction results of the distillation model, extract the outputs of these two sub-networks and decode them model_name: ["Student", "Teacher"] # For the prediction results of the distillation model, extract the outputs of these two sub-networks and decode them
key: head_out # Take the corresponding tensor in the subnet output dict key: head_out # Take the corresponding tensor in the subnet output dict
multi_head: True # whether it is multi-head or not, if true, the CTC branch is used for decoding
``` ```
Taking the above configuration as an example, the CTC decoding output of the two sub-networks `Student` and `Teacher` will be calculated at the same time. Taking the above configuration as an example, the CTC decoding output of the two sub-networks `Student` and `Teacher` will be calculated at the same time.
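To illustrate decoding each sub-network separately, here is a minimal sketch of greedy CTC decoding applied to a dict keyed by sub-network name. It is not the `DistillationCTCLabelDecode` implementation: the inputs are assumed to be per-time-step argmax indices, index 0 is assumed to be the blank, and the character set and sequences are made up.

```python
def ctc_greedy_decode(indices, charset, blank=0):
    """Collapse consecutive repeats, then drop blanks (greedy CTC decoding)."""
    chars = []
    prev = None
    for idx in indices:
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


# Hypothetical character set; index 0 is reserved for the CTC blank.
charset = ["<blank>", "a", "b", "c"]

# Hypothetical argmax outputs of the CTC branch for each sub-network.
outputs = {
    "Student": [1, 1, 0, 2, 2, 0, 2],
    "Teacher": [1, 0, 0, 2, 3, 3],
}

# One decoded string per sub-network, keyed by sub-network name.
decoded = {name: ctc_greedy_decode(seq, charset) for name, seq in outputs.items()}
print(decoded)
```

Keeping the result keyed by sub-network name lets a later metric step evaluate each sub-network on its own.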
...@@ -276,6 +343,7 @@ Metric: ...@@ -276,6 +343,7 @@ Metric:
base_metric_name: RecMetric # The base class of indicator calculation. For the output of the model, the indicator will be calculated based on this class base_metric_name: RecMetric # The base class of indicator calculation. For the output of the model, the indicator will be calculated based on this class
main_indicator: acc # The name of the indicator main_indicator: acc # The name of the indicator
key: "Student" # Select the main_indicator of this subnet as the criterion for saving the best model key: "Student" # Select the main_indicator of this subnet as the criterion for saving the best model
ignore_space: False # whether to ignore spaces during evaluation
``` ```
Taking the above configuration as an example, the accuracy metric of the `Student` subnet will be used as the judgment metric for saving the best model. Taking the above configuration as an example, the accuracy metric of the `Student` subnet will be used as the judgment metric for saving the best model.
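The role of `key` and `main_indicator` can be sketched as follows. The nested dict layout of `metrics` and the helper `is_new_best` are assumptions made for illustration; the actual output format of `DistillationMetric` may differ.

```python
def is_new_best(metrics, best_so_far, key="Student", main_indicator="acc"):
    """Return True when the selected sub-network beats the best value so far."""
    return metrics[key][main_indicator] > best_so_far


# Hypothetical evaluation results for one epoch.
metrics = {"Student": {"acc": 0.71}, "Teacher": {"acc": 0.74}}

# Only the Student accuracy decides whether a checkpoint is saved as best,
# even though the Teacher accuracy is higher here.
print(is_new_best(metrics, best_so_far=0.70))
print(is_new_best(metrics, best_so_far=0.72))
```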
...@@ -289,13 +357,13 @@ For more specific implementation of `DistillationMetric`, please refer to: [dist ...@@ -289,13 +357,13 @@ For more specific implementation of `DistillationMetric`, please refer to: [dist
There are two ways to fine-tune the recognition distillation task. There are two ways to fine-tune the recognition distillation task.
1. Fine-tuning based on knowledge distillation: this case is relatively simple. Download the pre-trained model, then configure the pre-trained model path and your own data path in [ch_PP-OCRv2_rec_distillation.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml) to fine-tune the model. 1. Fine-tuning based on knowledge distillation: this case is relatively simple. Download the pre-trained model, then configure the pre-trained model path and your own data path in [ch_PP-OCRv3_rec_distillation.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) to fine-tune the model.
2. Do not use knowledge distillation in fine-tuning: In this case, you need to first extract the student model parameters from the pre-training model. The specific steps are as follows. 2. Do not use knowledge distillation in fine-tuning: In this case, you need to first extract the student model parameters from the pre-training model. The specific steps are as follows.
- First download the pre-trained model and unzip it. - First download the pre-trained model and unzip it.
```shell ```shell
wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
tar -xf ch_PP-OCRv2_rec_train.tar tar -xf ch_PP-OCRv3_rec_train.tar
``` ```
- Then use python to extract the student model parameters - Then use python to extract the student model parameters
...@@ -303,7 +371,7 @@ tar -xf ch_PP-OCRv2_rec_train.tar ...@@ -303,7 +371,7 @@ tar -xf ch_PP-OCRv2_rec_train.tar
```python ```python
import paddle import paddle
# Load the pre-trained model # Load the pre-trained model
all_params = paddle.load("ch_PP-OCRv2_rec_train/best_accuracy.pdparams") all_params = paddle.load("ch_PP-OCRv3_rec_train/best_accuracy.pdparams")
# View the keys of the weight parameter # View the keys of the weight parameter
print(all_params.keys()) print(all_params.keys())
# Weight extraction of student model # Weight extraction of student model
...@@ -311,10 +379,10 @@ s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Stu ...@@ -311,10 +379,10 @@ s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Stu
# View the keys of the weight parameters of the student model # View the keys of the weight parameters of the student model
print(s_params.keys()) print(s_params.keys())
# Save weight parameters # Save weight parameters
paddle.save(s_params, "ch_PP-OCRv2_rec_train/student.pdparams") paddle.save(s_params, "ch_PP-OCRv3_rec_train/student.pdparams")
``` ```
After the extraction is complete, use [ch_PP-OCRv2_rec.yml](../../configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml) to modify the path of the pre-trained model (the path of the exported `student.pdparams` model) and your own data path to fine-tune the model. After the extraction is complete, use [ch_PP-OCRv3_rec.yml](../../configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml) to modify the path of the pre-trained model (the path of the exported `student.pdparams` model) and your own data path to fine-tune the model.
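The key-renaming step in the extraction can be tried without Paddle on a plain dict. This is a minimal sketch under stated assumptions: the parameter names and values are placeholders rather than real tensors, `extract_submodel_params` is a hypothetical helper, and it matches the prefix with `startswith` rather than a substring test.

```python
def extract_submodel_params(all_params, prefix="Student."):
    """Keep only keys that start with the prefix and strip it from the key."""
    return {
        key[len(prefix):]: value
        for key, value in all_params.items()
        if key.startswith(prefix)
    }


# Placeholder state dict; real checkpoints map names to tensors.
all_params = {
    "Student.backbone.conv1.weight": 1,
    "Student2.head.fc.weight": 2,
    "Teacher.backbone.conv1.weight": 3,
}

s_params = extract_submodel_params(all_params)
print(sorted(s_params.keys()))
```

Anchoring the match at the start of the key avoids picking up a parameter whose name merely contains the prefix somewhere in the middle.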
<a name="22"></a> <a name="22"></a>
### 2.2 Detection Model Configuration File Analysis ### 2.2 Detection Model Configuration File Analysis
......
...@@ -49,18 +49,23 @@ def get_check_global_params(mode): ...@@ -49,18 +49,23 @@ def get_check_global_params(mode):
return check_params return check_params
def _check_image_file(path):
img_end = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff', 'gif'}
return any([path.lower().endswith(e) for e in img_end])
def get_image_file_list(img_file): def get_image_file_list(img_file):
imgs_lists = [] imgs_lists = []
if img_file is None or not os.path.exists(img_file): if img_file is None or not os.path.exists(img_file):
raise Exception("not found any img file in {}".format(img_file)) raise Exception("not found any img file in {}".format(img_file))
img_end = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff', 'gif', 'GIF'} img_end = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff', 'gif'}
if os.path.isfile(img_file) and imghdr.what(img_file) in img_end: if os.path.isfile(img_file) and _check_image_file(img_file):
imgs_lists.append(img_file) imgs_lists.append(img_file)
elif os.path.isdir(img_file): elif os.path.isdir(img_file):
for single_file in os.listdir(img_file): for single_file in os.listdir(img_file):
file_path = os.path.join(img_file, single_file) file_path = os.path.join(img_file, single_file)
if os.path.isfile(file_path) and imghdr.what(file_path) in img_end: if os.path.isfile(file_path) and _check_image_file(file_path):
imgs_lists.append(file_path) imgs_lists.append(file_path)
if len(imgs_lists) == 0: if len(imgs_lists) == 0:
raise Exception("not found any img file in {}".format(img_file)) raise Exception("not found any img file in {}".format(img_file))
......
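For reference, a self-contained sketch of the extension-based check introduced in this hunk. It is not the committed code: it compares the extension returned by `os.path.splitext` instead of using `endswith`, so a file named `notajpg` is not accepted, and the names `is_image_path` and `list_image_files` are hypothetical.

```python
import os
import tempfile

# Same extensions as the diff, written with the leading dot.
IMG_EXTENSIONS = {".jpg", ".bmp", ".png", ".jpeg", ".rgb", ".tif", ".tiff", ".gif"}


def is_image_path(path):
    """Case-insensitive check of the file extension."""
    return os.path.splitext(path)[1].lower() in IMG_EXTENSIONS


def list_image_files(root):
    """Return image paths for a single file or for the files directly in a directory."""
    if root is None or not os.path.exists(root):
        raise FileNotFoundError("not found any img file in {}".format(root))
    if os.path.isfile(root):
        candidates = [root]
    else:
        candidates = [os.path.join(root, name) for name in sorted(os.listdir(root))]
    images = [p for p in candidates if os.path.isfile(p) and is_image_path(p)]
    if not images:
        raise FileNotFoundError("not found any img file in {}".format(root))
    return images


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.JPG", "b.txt", "notajpg"):
        open(os.path.join(tmp_dir, name), "w").close()
    found = [os.path.basename(p) for p in list_image_files(tmp_dir)]
    print(found)
```

Checking the extension is cheaper than inspecting file contents, at the cost of trusting the file name.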