Adding the Adam Optimizer operator (#4733)
* add adam op

  ```
  moment1_out = beta1 * moment1 + (1 - beta1) * grad
  moment2_out = beta2 * moment2 + (1 - beta2) * grad * grad
  moment1_hat = moment1_out / (1 - beta1^t)
  moment2_hat = moment2_out / (1 - beta2^t)
  param_out   = param - learning_rate * moment1_hat / (sqrt(moment2_hat) + epsilon)
  ```

* fix moment 2
* Adding the Adam optimization operator
* Adding more tests for Adam op
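For reference, a minimal standalone sketch of the update rule above in plain C++ (not the actual PaddlePaddle kernel; the function name `AdamUpdate` and the vector-based state are illustrative assumptions):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One Adam step applied in place, mirroring the equations in the commit
// message: biased first/second moment updates, bias correction, and the
// final parameter update. `t` is the time step, starting at 1.
void AdamUpdate(std::vector<float>& param, const std::vector<float>& grad,
                std::vector<float>& moment1, std::vector<float>& moment2,
                float learning_rate, float beta1, float beta2, float epsilon,
                int t) {
  const float bias1 = 1.0f - std::pow(beta1, t);  // 1 - beta1^t
  const float bias2 = 1.0f - std::pow(beta2, t);  // 1 - beta2^t
  for (std::size_t i = 0; i < param.size(); ++i) {
    moment1[i] = beta1 * moment1[i] + (1.0f - beta1) * grad[i];
    moment2[i] = beta2 * moment2[i] + (1.0f - beta2) * grad[i] * grad[i];
    const float moment1_hat = moment1[i] / bias1;
    const float moment2_hat = moment2[i] / bias2;
    param[i] -= learning_rate * moment1_hat / (std::sqrt(moment2_hat) + epsilon);
  }
}
```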
Files added:

* paddle/operators/adam_op.cc (new file, mode 100644)
* paddle/operators/adam_op.cu (new file, mode 100644)
* paddle/operators/adam_op.h (new file, mode 100644)