model_name_pairs:[["Student","Teacher"]]# calculate mgdloss for Student and Teacher
name:"loss_mgd"
base_loss_name:MGDLoss# MGD loss, the following are parameters of 'MGD loss'
s_keys:["blocks[7]"]# feature map used to calculate MGD loss in student model
t_keys:["blocks[15]"]# feature map used to calculate MGD loss in teacher model
s_key:"blocks[7]"# feature map used to calculate MGD loss in student model
t_key:"blocks[15]"# feature map used to calculate MGD loss in teacher model
        student_channels: 512  # channel num for the student feature map
        teacher_channels: 512  # channel num for the teacher feature map
  Eval:
    - CELoss:
        weight: 1.0
```
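
To make the role of these parameters concrete, below is a minimal sketch of the Masked Generative Distillation idea in PaddlePaddle: the student feature map is randomly masked, and a small generation block must reconstruct the teacher's feature map from it. The class name `MGDSketch`, the `mask_ratio` default, and the exact generation block are illustrative assumptions, not PaddleClas's actual `MGDLoss` implementation.

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class MGDSketch(nn.Layer):
    """Illustrative sketch of Masked Generative Distillation (not PaddleClas's MGDLoss)."""

    def __init__(self, student_channels, teacher_channels, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        # 1x1 conv aligning student channels to teacher channels
        # (identity-like when they already match, as in the 512/512 config above)
        self.align = nn.Conv2D(student_channels, teacher_channels, kernel_size=1)
        # generation block that reconstructs teacher features from masked student features
        self.generation = nn.Sequential(
            nn.Conv2D(teacher_channels, teacher_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2D(teacher_channels, teacher_channels, kernel_size=3, padding=1),
        )

    def forward(self, s_feat, t_feat):
        n, _, h, w = s_feat.shape
        s_feat = self.align(s_feat)
        # randomly zero out spatial positions of the student feature map,
        # then force the generation block to reconstruct the full teacher map
        mask = (paddle.rand([n, 1, h, w]) > self.mask_ratio).astype(s_feat.dtype)
        rec = self.generation(s_feat * mask)
        return F.mse_loss(rec, t_feat)

# example with 512-channel student/teacher maps as in the config above (assumed shapes)
s = paddle.randn([8, 512, 7, 7])
t = paddle.randn([8, 512, 7, 7])
loss = MGDSketch(512, 512)(s, t)
```

In this sketch, `student_channels`/`teacher_channels` feed the alignment convolution, while `s_keys`/`t_keys` in the config above decide which intermediate feature maps are extracted from each model.
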
<aname='1.2.10'></a>
#### 1.2.10 PEFD
##### 1.2.10.1 Introduction to PEFD
Paper:
> [Improved Feature Distillation via Projector Ensemble](https://arxiv.org/pdf/2210.15274.pdf)
>
> Yudong Chen, Sen Wang, Jiajun Liu, Xuwei Xu, Frank de Hoog, Zi Huang
>
> NeurIPS 2022

PEFD uses an ensemble of multiple projectors to transform the student's features before applying the feature distillation loss. This prevents the student from overfitting to the teacher's features and further improves the performance of feature distillation.
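
To get a feel for the projector-ensemble idea, here is a minimal sketch in PaddlePaddle. It assumes globally pooled, flattened features; the class name `ProjectorEnsembleSketch` and the `num_projectors` default are illustrative, and this is not PaddleClas's actual PEFD loss implementation.

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ProjectorEnsembleSketch(nn.Layer):
    """Illustrative sketch of projector-ensemble feature distillation (not PaddleClas's code)."""

    def __init__(self, student_channels, teacher_channels, num_projectors=3):
        super().__init__()
        # several independently initialized projectors mapping the student's
        # feature space into the teacher's feature space
        self.projectors = nn.LayerList([
            nn.Linear(student_channels, teacher_channels)
            for _ in range(num_projectors)
        ])

    def forward(self, s_feat, t_feat):
        # ensemble: average the projected student features before the loss
        # (one reading of the method; averaging per-projector losses is the
        # other natural variant)
        s_proj = sum(proj(s_feat) for proj in self.projectors) / len(self.projectors)
        # mean squared error between L2-normalized features, i.e. a
        # cosine-style direction-matching objective up to constants
        return F.mse_loss(F.normalize(s_proj, axis=1), F.normalize(t_feat, axis=1))

# example with assumed feature dimensions
s = paddle.randn([8, 1280])
t = paddle.randn([8, 2048])
loss = ProjectorEnsembleSketch(1280, 2048)(s, t)
```
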
The PEFD configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. The parameters of the teacher model are fixed, and its pretrained weights are loaded. In the `Loss` field, you need to define `DistillationPairLoss` (the PEFD loss between the student and the teacher) and `DistillationGTCELoss` (the CE loss against ground-truth labels) as the training losses.
```yaml
# model architecture
Arch:
name:"DistillationModel"
class_num:&class_num1000
# if not null, its lengths should be same as models
pretrained_list:
# if not null, its lengths should be same as models