[NPU] flatten params and grads, fuse grad_clip and optimizer op (#33461)
* enable npu alignment
* support flatten_params/grads
* support clip by global norm
* remove memset in coalesce_tensor_op
* fix npu kernel of sum op when input is one tensor
* add ut for flatten_param_grads+regularizer
* fix ut
* fix typo
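
As a rough illustration of why flattening helps with clip by global norm: once all gradients sit in one contiguous buffer, the global norm and the rescale each become a single pass over that buffer. The sketch below is plain Python over lists, not the Paddle NPU kernels from this PR; the helper names `flatten`, `unflatten`, and `clip_by_global_norm` are hypothetical and chosen only for this example.

```python
import math


def flatten(tensors):
    """Concatenate per-parameter gradients into one buffer (hypothetical helper).

    Returns the flat buffer plus (start, length) pairs so that the
    original per-parameter views can be recovered later.
    """
    flat, offsets = [], []
    for t in tensors:
        offsets.append((len(flat), len(t)))
        flat.extend(t)
    return flat, offsets


def unflatten(flat, offsets):
    """Slice the flat buffer back into per-parameter gradients."""
    return [flat[start:start + length] for start, length in offsets]


def clip_by_global_norm(flat_grad, clip_norm):
    """Scale the whole buffer so its L2 norm does not exceed clip_norm."""
    global_norm = math.sqrt(sum(g * g for g in flat_grad))
    # If global_norm <= clip_norm the scale is 1.0 and nothing changes.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in flat_grad], global_norm


# Two small "gradients"; their combined L2 norm is 5.0.
grads = [[3.0, 0.0], [0.0, 4.0]]
flat, offsets = flatten(grads)
clipped_flat, norm = clip_by_global_norm(flat, clip_norm=1.0)
clipped = unflatten(clipped_flat, offsets)
```

Here one norm computation and one scale cover every parameter, whereas an unflattened version would need a separate pass per gradient tensor.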