Created by: jinxing94
AMSGrad is a variant of the Adam optimizer, proposed in the ICLR 2018 paper "On the Convergence of Adam and Beyond".
The paper argues that the exponential moving average of squared gradients used in Adam can cause the model to fail to converge to an optimal solution, even in some simple convex optimization settings. AMSGrad addresses this issue by endowing the algorithm with a "long-term memory" of past gradients.
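As a rough illustration of how that long-term memory works, here is a minimal NumPy sketch of a single AMSGrad parameter update. The function and variable names are ours, and the bias correction follows the common Adam-style implementation (the original paper omits it); the key difference from Adam is the running maximum `v_hat`, which prevents the denominator from shrinking when recent gradients are small.

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat, t,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update (illustrative sketch, not a library implementation)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate, as in Adam
    v_hat = np.maximum(v_hat, v)              # "long-term memory": never let the denominator shrink
    m_hat = m / (1 - beta1 ** t)              # bias correction (omitted in the original paper)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v, v_hat
```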
Both TensorFlow and PyTorch have added AMSGrad to their official APIs; see the amsgrad option of the Adam optimizer in each framework (tf.keras.optimizers.Adam and torch.optim.Adam) for reference.
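For example, enabling the variant in either framework is a one-line change (the model and learning rate below are placeholders):

```python
import torch
import tensorflow as tf

# PyTorch: the amsgrad flag on torch.optim.Adam switches to the AMSGrad update.
optimizer_pt = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

# TensorFlow / Keras: tf.keras.optimizers.Adam exposes the same flag.
optimizer_tf = tf.keras.optimizers.Adam(learning_rate=1e-3, amsgrad=True)
```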