Commit 042d1e7e, authored by gaotingquan, committed by cuicheng01

fix layer key name for dynamic lr in adamwdl optimizer

Parent 80ae9079
@@ -411,7 +411,10 @@ class AdamWDL(object):
     idx = static_name.find("blocks.")
     layer = int(static_name[idx:].split(".")[1])
     ratio = decay_rate**(n_layers - layer)
-elif "embed" in static_name:
+elif any([
+        key in static_name
+        for key in ["embed", "token", "conv1", "ln_pre"]
+]):
     ratio = decay_rate**(n_layers + 1)
 # param.optimize_attr["learning_rate"] *= ratio
 return ratio
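
The effect of the change can be illustrated with a minimal standalone sketch. The function name `layerwise_lr_ratio`, the `"blocks." in static_name` condition on the first branch, the default ratio of 1.0, and the parameter names in the loop are all assumptions made for illustration; the hunk above shows only the branch bodies and the new `elif`.

```python
def layerwise_lr_ratio(static_name, decay_rate, n_layers):
    """Hypothetical standalone version of the ratio logic in the hunk above."""
    # Assumption: parameters matching no branch keep a ratio of 1.0.
    ratio = 1.0
    # Assumption: the first branch is guarded by a "blocks." check.
    if "blocks." in static_name:
        idx = static_name.find("blocks.")
        # "blocks.<layer>.<...>" -> take the integer after "blocks."
        layer = int(static_name[idx:].split(".")[1])
        ratio = decay_rate ** (n_layers - layer)
    # After this commit, stem parameters are matched by any of four keys
    # instead of "embed" alone.
    elif any(key in static_name
             for key in ["embed", "token", "conv1", "ln_pre"]):
        ratio = decay_rate ** (n_layers + 1)
    return ratio


# Hypothetical parameter names, chosen only to exercise each branch.
for name in ["blocks.3.attn.qkv.weight", "conv1.weight",
             "cls_token", "head.weight"]:
    print(name, layerwise_lr_ratio(name, decay_rate=0.5, n_layers=12))
```

Under these assumptions, a name such as `conv1.weight` receives the `decay_rate**(n_layers + 1)` ratio after the change, whereas the old `"embed"`-only check would not have matched it.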