* sparse_momentum_op saves w@GRAD memory for gather_op when gathering from a large parameter
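
The idea can be illustrated with a minimal NumPy sketch. It is not the actual kernel: the function name `sparse_momentum_step`, its arguments, and the choice to skip untouched rows entirely are assumptions made for illustration. It shows a momentum update that consumes only the gradient of the gathered rows (shape K x D), so a dense w@GRAD of the full parameter shape (N x D) never has to be materialized.

```python
import numpy as np

def sparse_momentum_step(param, velocity, index, grad_rows, lr=0.1, mu=0.9):
    """Hypothetical sketch: update in place only the rows selected by `index`.

    param, velocity: (N, D) arrays; index: (K,) row ids used by the gather;
    grad_rows: (K, D) gradient with respect to the gathered rows.
    """
    # Merge gradients of duplicate indices so each row is updated once.
    uniq, inverse = np.unique(index, return_inverse=True)
    merged = np.zeros((uniq.size, param.shape[1]), dtype=param.dtype)
    np.add.at(merged, inverse, grad_rows)  # unbuffered scatter-add

    # Plain momentum on the touched rows only (assumption: other rows are skipped).
    velocity[uniq] = mu * velocity[uniq] + merged
    param[uniq] -= lr * velocity[uniq]
    return param, velocity

# Usage: a 5-row parameter where only rows 1 and 3 were gathered.
w = np.ones((5, 2))
v = np.zeros_like(w)
sparse_momentum_step(w, v, np.array([1, 3, 1]), np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

Note that this differs from a dense momentum step, where rows with zero gradient would still have their velocity decayed and applied; the sketch leaves those rows unchanged.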