sparse_momentum_op is used to save w@GRAD memory for gather_op (#34942)
* sparse_momentum_op is used to save w@GRAD memory for gather_op when gathering from a large parameter
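
To illustrate where the memory saving described above can come from, here is a minimal pure-Python sketch of a momentum step that touches only the gathered rows, so a dense gradient the size of the whole parameter (w@GRAD) never has to be allocated. It is not the kernel from this commit: the function name `sparse_momentum_step`, its arguments, the plain (non-Nesterov) update rule, and the gather along axis 0 are all assumptions made for illustration.

```python
def sparse_momentum_step(param, velocity, index, out_grad, lr=0.1, mu=0.9):
    """Hypothetical sketch: momentum update restricted to gathered rows.

    param, velocity: lists of rows (lists of floats), updated in place.
    index: row ids used by the gather along axis 0; duplicates allowed.
    out_grad: gradient w.r.t. the gather output, one row per entry of index.
    """
    # Merge gradients that belong to the same row, as a dense
    # scatter-add into w@GRAD would, but keyed only by touched rows.
    merged = {}
    for row_id, grad_row in zip(index, out_grad):
        acc = merged.setdefault(row_id, [0.0] * len(grad_row))
        for j, g in enumerate(grad_row):
            acc[j] += g

    # Apply the momentum rule only to the touched rows.
    for row_id, grad_row in merged.items():
        for j, g in enumerate(grad_row):
            velocity[row_id][j] = mu * velocity[row_id][j] + g
            param[row_id][j] -= lr * velocity[row_id][j]
    return param, velocity


# Usage: a 3-row parameter where the gather read rows 2, 0 and 2 again.
w = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
v = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
sparse_momentum_step(w, v, index=[2, 0, 2],
                     out_grad=[[1.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
```

The extra memory in this sketch scales with the number of distinct gathered rows rather than with the full parameter. Note that rows outside `index` are left untouched here, including their velocity, which differs from a dense momentum step whenever those rows carry non-zero velocity; the commit message does not say how the actual op treats them.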