Commit 962a94b7 authored by H HydrogenSulfate

Merge branch 'fix_amp_train_new' into fix_amp_train
# PaddleClas FAQ Summary - 2022 Season 1
## Before You Read
- We have collected the frequently asked questions from issues and user groups since PaddleClas was open-sourced and provide brief answers here, aiming to give users a reference and spare you some detours.
- Image classification, recognition, and retrieval are fields with many experts, and models and papers are updated quickly; the answers here rely mainly on our limited project experience and cannot cover every aspect. We sincerely hope that knowledgeable readers will help supplement and correct the content. Many thanks.
## Catalogue
- [1. Theory](#1-theory)
- [2. Practice](#2-practice)
- [2.1 Common problems of training and evaluation](#21-common-problems-of-training-and-evaluation)
- [Q2.1.1 How to freeze the parameters of some layers during training?](#q211-how-to-freeze-the-parameters-of-some-layers-during-training)
<a name="1"></a>
## 1. Theory
<a name="2"></a>
## 2. Practice
<a name="2.1"></a>
### 2.1 Common problems of training and evaluation
#### Q2.1.1 How to freeze the parameters of some layers during training?
**A**: There are currently three methods available:
1. Manually modify the model code: use `paddle.ParamAttr(learning_rate=0.0)` to set the learning rate of the frozen layer to 0.0. For details, see the [paddle.ParamAttr documentation](https://www.paddlepaddle.org.cn/documentation/docs/en/develop/api/paddle/ParamAttr_en.html#paramattr). The following code sets the learning rate of the weight parameter of the `self.conv` layer to 0.0.
```python
self.conv = Conv2D(
    in_channels=num_channels,
    out_channels=num_filters,
    kernel_size=filter_size,
    stride=stride,
    padding=(filter_size - 1) // 2,
    groups=groups,
    weight_attr=ParamAttr(learning_rate=0.0),  # <-- set here
    bias_attr=False,
    data_format=data_format)
```
2. Manually set `stop_gradient=True` for the part to be frozen; see [this link](https://github.com/RainFrost1/PaddleClas/blob/24e968b8d9f7d9e2309e713cbf2afe8fda9deacd/ppcls/engine/train/train_idml.py#L40-L66) for a reference implementation. With this method, once back-propagation reaches a tensor whose `stop_gradient` is `True`, gradient propagation stops there, so the weights of that layer and of all preceding layers stay fixed. A minimal sketch is given below.
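For illustration, here is a minimal, self-contained sketch of this approach; the `TinyNet`, `backbone`, and `head` names are hypothetical and not taken from PaddleClas. Setting `stop_gradient=True` on the intermediate feature tensor cuts the backward pass at that point, so everything before it stays frozen while the layers after it keep training.
```python
import paddle
import paddle.nn as nn

# Hypothetical two-part model, only for demonstrating stop_gradient.
class TinyNet(nn.Layer):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)  # part to be frozen
        self.head = nn.Linear(8, 2)      # part that keeps training

    def forward(self, x):
        feat = self.backbone(x)
        # Cut the backward pass here: no gradient flows back into
        # self.backbone, so its weights (and those of any earlier layers)
        # stay fixed.
        feat.stop_gradient = True
        return self.head(feat)

model = TinyNet()
loss = model(paddle.randn([4, 8])).sum()
loss.backward()
print(model.backbone.weight.grad)  # None: the backbone received no gradient
print(model.head.weight.grad)      # a tensor: the head is still trainable
```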
3. After `loss.backward()` and before `optimizer.step()`, call the `clear_gradients()` method of `nn.Layer` (or `clear_grad()` on an individual parameter) for the part to be frozen; this does not affect the backward pass of the loss. The following code clears the gradients of an entire layer, or of a single parameter of that layer.
```python
import paddle
linear = paddle.nn.Linear(3, 4)
x = paddle.randn([4, 3])
y = linear(x)
loss = y.sum()
loss.backward()
print(linear.weight.grad)
print(linear.bias.grad)
linear.clear_gradients() # clear the gradients of the entire layer
# linear.weight.clear_grad() # Only clear the gradient of the weight parameter of the Linear layer
print(linear.weight.grad)
print(linear.bias.grad)
```
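To make the ordering explicit, below is a minimal training-step sketch; the `frozen` and `head` layer names are hypothetical. The frozen layer's gradients are cleared after `loss.backward()` and before `optimizer.step()`, so the optimizer update leaves that layer unchanged while the other layers keep training.
```python
import paddle

frozen = paddle.nn.Linear(3, 4)  # layer whose weights should stay fixed
head = paddle.nn.Linear(4, 2)    # layer that keeps training
opt = paddle.optimizer.SGD(
    learning_rate=0.1,
    parameters=frozen.parameters() + head.parameters())

x = paddle.randn([8, 3])
loss = head(frozen(x)).sum()
loss.backward()            # gradients are computed for both layers
frozen.clear_gradients()   # zero the frozen layer's gradients ...
opt.step()                 # ... so this update does not change it
opt.clear_grad()
```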
The commit also changes two shell scripts. In the data-preparation step, `train_list.txt` is now built by concatenating `val_list.txt` four times instead of symlinking it:
```diff
@@ -7,5 +7,7 @@ wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/data/ImageNet1k/ILSVR
 tar xf ILSVRC2012_val.tar
 ln -s ILSVRC2012_val ILSVRC2012
 cd ILSVRC2012
-ln -s val_list.txt train_list.txt
+for ((i=1; i<=4; i++));do
+    cat val_list.txt >> train_list.txt
+done
 cd ../../
```
In the `_train()` function of the training script, the command timeout is raised from 5 to 10 minutes:
```diff
@@ -63,7 +63,7 @@ function _train(){
     esac
     echo "train_cmd: ${train_cmd} log_file: ${log_file}"
-    timeout 5m ${train_cmd} > ${log_file} 2>&1
+    timeout 10m ${train_cmd} > ${log_file} 2>&1
     if [ $? -ne 0 ];then
         echo -e "${model_name}, FAIL"
     else
```