Created by: pengli09
It seems that Adam and Adamax do not take L2 decay into account when updating parameters.
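
To illustrate what is expected, here is a minimal sketch of single-parameter Adam and Adamax steps with L2 decay folded into the gradient. It is not the library's actual code: the function names `adam_step` and `adamax_step`, the `l2_decay` argument, and the default hyperparameters are hypothetical. It also assumes that "taking L2 decay into account" means adding `l2_decay * param` to the gradient before the moment estimates are updated. The `eps` term in the Adamax denominator is likewise an assumption, added only to avoid division by zero.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, l2_decay=0.0):
    """One Adam update for a scalar parameter (illustrative sketch only)."""
    # Assumption: L2 decay is applied by adding l2_decay * param to the gradient.
    g = grad + l2_decay * param
    m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def adamax_step(param, grad, m, u, t, lr=0.001, beta1=0.9, beta2=0.999,
                eps=1e-8, l2_decay=0.0):
    """One Adamax update for a scalar parameter (illustrative sketch only)."""
    # Same assumption as above: decay enters through the gradient.
    g = grad + l2_decay * param
    m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
    u = max(beta2 * u, abs(g))                 # infinity-norm estimate
    # eps is an assumption here, guarding against u == 0.
    param = param - (lr / (1.0 - beta1 ** t)) * m / (u + eps)
    return param, m, u


# With a zero loss gradient, only the decay term can move the parameter.
with_decay, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2_decay=0.01)
without_decay, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2_decay=0.0)
```

In this sketch, `with_decay` ends up slightly below the starting value while `without_decay` stays exactly where it started. The second case matches the behavior reported above: if the decay term never reaches the update, a parameter with zero loss gradient is never shrunk.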