Forked from PaddlePaddle / Paddle
* sparse_momentum_op is used to save the memory of w@GRAD (the gradient of the parameter w) for gather_op when gathering from a large parameter
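
The idea behind the note above can be illustrated with a minimal pure-Python sketch. It is not Paddle's kernel or API: the function name `sparse_momentum_step`, its arguments, and the plain (non-Nesterov) update rule `velocity = mu * velocity + grad; param -= lr * velocity` are assumptions chosen for illustration. The point it shows is that the optimizer step can consume the gradient of the gathered rows together with their indices, so a dense gradient with the full shape of the large parameter never has to be allocated.

```python
def sparse_momentum_step(param, velocity, index, grad_rows, lr=0.1, mu=0.9):
    """Hypothetical sketch: momentum update applied only to gathered rows.

    param, velocity: lists of rows (lists of floats), updated in place.
    index:           row ids that were gathered in the forward pass.
    grad_rows:       gradient of the gathered output, one row per index entry.
    """
    # Merge gradients for rows that were gathered more than once, so each
    # touched row receives a single combined update.
    merged = {}
    for row_id, g in zip(index, grad_rows):
        if row_id in merged:
            merged[row_id] = [a + b for a, b in zip(merged[row_id], g)]
        else:
            merged[row_id] = list(g)

    # Only the touched rows are read and written; no dense gradient with the
    # shape of `param` is ever built.
    for row_id, g in merged.items():
        velocity[row_id] = [mu * v + gi for v, gi in zip(velocity[row_id], g)]
        param[row_id] = [p - lr * v for p, v in zip(param[row_id], velocity[row_id])]
    return param, velocity


# Usage: a 4-row parameter from which rows 2, 0 and 2 were gathered.
param = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
velocity = [[0.0, 0.0] for _ in param]
sparse_momentum_step(
    param,
    velocity,
    index=[2, 0, 2],
    grad_rows=[[1.0, 1.0], [0.5, 0.5], [1.0, 1.0]],
)
```

Rows 1 and 3 are left untouched in this sketch. A dense momentum step would still decay and apply their velocity, so the two are not numerically identical in general; whether the real operator behaves the same way for untouched rows is not stated in the note above.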