BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle; in sync with the fork source)
Commit 436144e9 (unverified)
Authored Jan 19, 2021 by WangXi
Committed Jan 19, 2021 by GitHub
fix adamw lr_to_coeff is fixed when dygraph (#30526) (#30559)
Parent: 832032c2
2 changed files with 29 additions and 8 deletions (+29 -8)
python/paddle/fluid/tests/unittests/test_adamw_op.py (+18 -7)
python/paddle/optimizer/adamw.py (+11 -1)
python/paddle/fluid/tests/unittests/test_adamw_op.py @ 436144e9
@@ -98,15 +98,26 @@ class TestAdamWOp(unittest.TestCase):
         value = np.arange(26).reshape(2, 13).astype("float32")
         a = paddle.to_tensor(value)
         linear = paddle.nn.Linear(13, 5)
+        lr = paddle.optimizer.lr.NoamDecay(d_model=0.01, warmup_steps=10)
+        wd = 0.1
         adam = paddle.optimizer.AdamW(
-            learning_rate=paddle.optimizer.lr.NoamDecay(
-                d_model=512, warmup_steps=4000),
+            learning_rate=lr,
             parameters=linear.parameters(),
             apply_decay_param_fun=lambda name: True,
-            weight_decay=0.01)
+            weight_decay=wd)
+        for _ in range(2):
             out = linear(a)
             out.backward()
+            lr_to_coeff = adam._lr_to_coeff
             adam.step()
+            for i, value in enumerate(lr_to_coeff.values()):
+                self.assertAlmostEqual(value.numpy()[0], 1.0 - lr() * wd)
+            self.assertEqual(len(adam._lr_to_coeff), 0)
+            lr.step()
             adam.clear_gradients()
python/paddle/optimizer/adamw.py @ 436144e9
@@ -173,7 +173,10 @@ class AdamW(Adam):
             [param, grad]), framework.name_scope('weight decay'):
             self._params_name.add(param.name)

-            # If it has been calculated, the result will be reused
+            # If it has been calculated, the result will be reused.
+            # NOTE(wangxi): In dygraph mode, apply_gradient will be executed
+            # every step, so need clear _lr_to_coeff every step,
+            # we do this in _create_optimization_pass
             decay_coeff = self._lr_to_coeff.get(learning_rate, None)
             if decay_coeff is None:
                 decay_coeff = 1.0 - learning_rate * self._coeff
@@ -186,5 +189,12 @@ class AdamW(Adam):
         self._append_decoupled_weight_decay(block, param_and_grad)
         return super(AdamW, self)._append_optimize_op(block, param_and_grad)

+    def _create_optimization_pass(self, parameters_and_grads):
+        optimize_ops = super(
+            AdamW, self)._create_optimization_pass(parameters_and_grads)
+        # In dygraph mode, clear _lr_to_coeff after applied gradient
+        self._lr_to_coeff = dict()
+        return optimize_ops
+
     def __str__(self):
         return " ".join(["Weight Decay, params:", ",".join(self._params_name)])
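
The caching pattern this commit touches can be illustrated with a small toy model in plain Python. This is only a sketch and not Paddle's implementation: `LRHolder`, `ToyDecay`, and `run` are hypothetical names. It also assumes, as one plausible reading of the commit title and the NOTE comment, that in dygraph mode the learning rate is a single object whose value changes between steps, so a cache keyed by that object keeps returning the first coefficient.

```python
class LRHolder:
    """Hypothetical stand-in for a learning-rate variable updated in place."""

    def __init__(self, value):
        self.value = value


class ToyDecay:
    """Toy model of the _lr_to_coeff cache; not Paddle's AdamW."""

    def __init__(self, coeff, clear_each_step):
        self._coeff = coeff
        self._lr_to_coeff = dict()
        self._clear_each_step = clear_each_step

    def _decay_coeff(self, lr):
        # Reuse the cached coefficient if one exists for this LR object.
        cached = self._lr_to_coeff.get(lr, None)
        if cached is None:
            cached = 1.0 - lr.value * self._coeff
            self._lr_to_coeff[lr] = cached
        return cached

    def step(self, lr):
        used = self._decay_coeff(lr)
        if self._clear_each_step:
            # Mirrors the fix: reset the cache after the step is applied.
            self._lr_to_coeff = dict()
        return used


def run(clear_each_step):
    """Take two steps, changing the learning rate in place between them."""
    lr = LRHolder(0.5)
    opt = ToyDecay(coeff=0.1, clear_each_step=clear_each_step)
    first = opt.step(lr)
    lr.value = 0.25  # a scheduler would update the rate here
    second = opt.step(lr)
    return first, second


if __name__ == "__main__":
    print("without clearing:", run(clear_each_step=False))
    print("with clearing:   ", run(clear_each_step=True))
```

Without clearing, both steps in this toy use the coefficient computed from the first learning rate. With clearing, the second step recomputes `1.0 - lr * coeff` from the updated rate, which corresponds to the value the new test asserts after each `adam.step()`.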