Add optimizers with decoupled weight decay
Created by: tianxin1860
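The paper "Fixing Weight Decay Regularization" points out that for optimizers such as Adam and SGDM (SGD with momentum), weight_decay and an L2 regularizer are not equivalent. Combining Adam directly with L2Decay produces an effect that departs from what weight decay is meant to do. Please add weight-decay versions of Adam, SGDM, and similar optimizers as soon as possible.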
TensorFlow reference implementation: https://github.com/tensorflow/tensorflow/pull/17438
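To illustrate the non-equivalence, here is a minimal pure-Python sketch on a single scalar parameter. The function names `adam_step` and `sgdm_step` are hypothetical and are not Paddle's or TensorFlow's API. With L2 regularization the term `weight_decay * w` is folded into the gradient. Adam then divides that term by `sqrt(v_hat)`, so weights with large historical gradients are decayed less. With momentum, the term also accumulates in the momentum buffer. Decoupled weight decay instead shrinks the weight directly. The sketch assumes the decay is scaled by the learning rate, as PyTorch's `AdamW` does. The referenced TensorFlow PR may parameterize it differently.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam step on a scalar weight (t is the 1-based step count).

    decoupled=False: L2 regularization; the decay term is added to the
        gradient and is therefore rescaled by the adaptive denominator.
    decoupled=True: AdamW-style; the decay is applied to the weight directly.
    """
    if not decoupled:
        grad = grad + weight_decay * w           # L2: decay enters the moments
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    step = lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        step += lr * weight_decay * w            # decay uses the pre-update weight
    return w - step, m, v


def sgdm_step(w, grad, buf, lr=0.1, momentum=0.9, weight_decay=0.0,
              decoupled=False):
    """One SGD-with-momentum step on a scalar weight (L2 vs. decoupled decay)."""
    if not decoupled:
        grad = grad + weight_decay * w           # L2: decay enters the momentum buffer
    buf = momentum * buf + grad
    step = lr * buf
    if decoupled:
        step += lr * weight_decay * w            # SGDW: decay bypasses the buffer
    return w - step, buf
```

For example, consider a weight of 1.0 with zero loss gradient and `weight_decay=0.01`. The L2 variant of `adam_step` moves it to about 0.999, because Adam normalizes the tiny decay gradient to roughly unit size. The decoupled variant moves it to 0.99999. For plain SGD (`momentum=0.0`) the two forms coincide. Once momentum is used, they diverge after the first step.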