    Adding the Adam Optimizer operator (#4733) · 11680037
    Committed by Abhinav Arora
    * add adam op
    
    moment1_out = beta1 * moment1 + (1 - beta1) * grad
    moment2_out = beta2 * moment2 + (1 - beta2) * grad * grad
    moment1_hat = moment1_out / (1 - beta1^t)
    moment2_hat = moment2_out / (1 - beta2^t)
    param_out = param - learning_rate * moment1_hat / (sqrt(moment2_hat) + epsilon)
    
    * fix moment 2
    
    * Adding the Adam optimization operator
    
    * Adding more tests for Adam op
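    A minimal plain-Python sketch of one update step, following the formulas in the commit message term by term. The function name `adam_step`, the list-of-floats inputs, and the default hyperparameters (learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, the common defaults from the Adam paper) are illustrative assumptions, not the operator's actual interface in adam_op.cu. Here `t` is taken to be the 1-based step count used in the bias-correction terms.

```python
import math


def adam_step(param, grad, moment1, moment2, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Hypothetical helper: one Adam update over lists of floats.

    Hyperparameter defaults are assumed, not taken from the operator.
    Returns (param_out, moment1_out, moment2_out).
    """
    param_out, moment1_out, moment2_out = [], [], []
    for p, g, m1, m2 in zip(param, grad, moment1, moment2):
        # Exponential moving averages of the gradient and squared gradient.
        m1_new = beta1 * m1 + (1 - beta1) * g
        m2_new = beta2 * m2 + (1 - beta2) * g * g
        # Bias correction for the zero-initialized moments at step t.
        m1_hat = m1_new / (1 - beta1 ** t)
        m2_hat = m2_new / (1 - beta2 ** t)
        # Parameter update; epsilon guards against division by zero.
        param_out.append(p - learning_rate * m1_hat / (math.sqrt(m2_hat) + epsilon))
        moment1_out.append(m1_new)
        moment2_out.append(m2_new)
    return param_out, moment1_out, moment2_out


# First step from zero moments: bias correction gives moment1_hat = grad and
# moment2_hat = grad^2, so each parameter moves by about learning_rate * sign(grad).
p, m1, m2 = adam_step([1.0, -2.0], [0.5, -4.0], [0.0, 0.0], [0.0, 0.0],
                      t=1, learning_rate=0.1)
print(p)  # approximately [0.9, -1.9]
```

    The usage line illustrates why the bias-correction terms matter: without dividing by (1 - beta1^t) and (1 - beta2^t), the first steps would be scaled down by the zero-initialized moments.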
adam_op.cu 819 bytes