Created by: jinxing94
AMSGrad is a variant of the Adam optimizer, proposed in the ICLR 2018 paper "On the Convergence of Adam and Beyond".
The paper argues that the exponential moving average of squared gradients used in Adam can cause the model to fail to converge to an optimal solution, even in some simple convex optimization settings. AMSGrad addresses this issue by endowing the algorithm with a "long-term memory" of past gradients.
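As a rough illustration of how that long-term memory works, here is a minimal NumPy sketch of a single AMSGrad parameter update. The function and variable names are ours, and the bias correction follows the common Adam-style implementation (the original paper omits it); the key difference from Adam is the running maximum `v_hat`, which prevents the denominator from shrinking when recent gradients are small.

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat, t,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update (illustrative sketch, not a library implementation)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate, as in Adam
    v_hat = np.maximum(v_hat, v)              # "long-term memory": never let the denominator shrink
    m_hat = m / (1 - beta1 ** t)              # bias correction (omitted in the original paper)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v, v_hat
```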
Both TensorFlow and PyTorch have added AMSGrad to their official APIs; see the amsgrad option of the Adam optimizer in each framework (tf.keras.optimizers.Adam and torch.optim.Adam) for reference.
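For example, enabling the variant in either framework is a one-line change (the model and learning rate below are placeholders):

```python
import torch
import tensorflow as tf

# PyTorch: the amsgrad flag on torch.optim.Adam switches to the AMSGrad update.
optimizer_pt = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

# TensorFlow / Keras: tf.keras.optimizers.Adam exposes the same flag.
optimizer_tf = tf.keras.optimizers.Adam(learning_rate=1e-3, amsgrad=True)
```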