Fix the bug where regularization does not take effect in Adam
Created by: lcy-seso
Adam works well in practice and compares favorably to other adaptive learning-rate methods. It has become a popular optimizer (almost the default for many tasks) for deep neural networks.
But the current implementation of Adam in the V2 API ignores the weight_decay_rate parameter. This means L2 regularization does not work for Adam and Adamax at all, even if the user sets it.
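For illustration, here is a minimal NumPy sketch of what a user expects to happen when they set weight_decay_rate: the L2 term is folded into the gradient before Adam's moment updates. The function name and signature are hypothetical and simplified, not Paddle's actual kernel or API.

```python
import numpy as np

def adam_step_with_l2(param, grad, m, v, t,
                      lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                      weight_decay_rate=0.0):
    """One Adam step with classic (coupled) L2 regularization (illustrative only)."""
    # L2 regularization: add the gradient of 0.5 * weight_decay_rate * ||param||^2
    grad = grad + weight_decay_rate * param

    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)

    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```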
A recent paper, Fixing Weight Decay Regularization in Adam, points out that decoupling weight decay from the optimization step achieves better learning performance.
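As a contrast to the sketch above, the decoupled variant (often called AdamW) applies the decay directly to the parameters after the adaptive update, so it never passes through the moment estimates. Again, this is only a simplified sketch; the function name and signature are hypothetical, not Paddle's API.

```python
import numpy as np

def adamw_step(param, grad, m, v, t,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
               weight_decay_rate=0.0):
    """One Adam step with decoupled weight decay (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)

    # Adaptive Adam update plus a separate, decoupled decay of the weights.
    param = (param
             - lr * m_hat / (np.sqrt(v_hat) + eps)
             - lr * weight_decay_rate * param)
    return param, m, v
```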
Correctly implementing regularization is important for a learning task.
We have this PR https://github.com/PaddlePaddle/Paddle/pull/2097, but it does not correctly implement L2 regularization in Adam and Adamax.
A related issue reported by one of our users: https://github.com/PaddlePaddle/Paddle/issues/4162
I will fix this.