Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972)
* vectorize LAMB kernel
* remove flags, add unit test
* remove unused code
* refine code, add param order
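Vectorizing an elementwise optimizer kernel such as LAMB typically means that each thread loads and stores several contiguous elements at once through an aligned vector type, then falls back to scalar accesses for the leftover tail. The C++ sketch below shows that loop structure on the CPU, assuming the standard LAMB moment update. It is not the code from this commit. `AlignedVec`, `LambElem` and `LambUpdateDirection` are hypothetical names, not Paddle's API. Each `memcpy` only stands in for a single wide GPU load or store, and the trust-ratio scaling step is omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical aligned vector type, standing in for the wide (e.g. 16-byte)
// vector types CUDA kernels use for loads and stores.
template <typename T, int N>
struct alignas(sizeof(T) * N) AlignedVec {
  static_assert((N & (N - 1)) == 0, "N must be a power of two");
  T val[N];
};

// Per-element LAMB moment update. Returns the update direction r before the
// trust ratio is applied. Illustrative only.
inline float LambElem(float p, float g, float& m, float& v, float beta1,
                      float beta2, float beta1_pow, float beta2_pow,
                      float eps, float weight_decay) {
  m = beta1 * m + (1.0f - beta1) * g;
  v = beta2 * v + (1.0f - beta2) * g * g;
  float m_hat = m / (1.0f - beta1_pow);  // bias-corrected first moment
  float v_hat = v / (1.0f - beta2_pow);  // bias-corrected second moment
  return m_hat / (std::sqrt(v_hat) + eps) + weight_decay * p;
}

// Main loop handles VecSize elements per iteration with one "wide" access per
// array; a scalar tail loop handles the remaining n % VecSize elements.
template <int VecSize>
void LambUpdateDirection(const float* p, const float* g, float* m, float* v,
                         float* r, std::size_t n, float beta1, float beta2,
                         float beta1_pow, float beta2_pow, float eps,
                         float weight_decay) {
  using Vec = AlignedVec<float, VecSize>;
  const std::size_t main_end = n - n % VecSize;
  for (std::size_t i = 0; i < main_end; i += VecSize) {
    Vec pv, gv, mv, vv, rv;
    // Each memcpy stands in for one wide load on the GPU.
    std::memcpy(&pv, p + i, sizeof(Vec));
    std::memcpy(&gv, g + i, sizeof(Vec));
    std::memcpy(&mv, m + i, sizeof(Vec));
    std::memcpy(&vv, v + i, sizeof(Vec));
    for (int k = 0; k < VecSize; ++k) {
      rv.val[k] = LambElem(pv.val[k], gv.val[k], mv.val[k], vv.val[k], beta1,
                           beta2, beta1_pow, beta2_pow, eps, weight_decay);
    }
    // Wide stores of the updated moments and the update direction.
    std::memcpy(m + i, &mv, sizeof(Vec));
    std::memcpy(v + i, &vv, sizeof(Vec));
    std::memcpy(r + i, &rv, sizeof(Vec));
  }
  // Scalar tail for elements that do not fill a whole vector.
  for (std::size_t i = main_end; i < n; ++i) {
    r[i] = LambElem(p[i], g[i], m[i], v[i], beta1, beta2, beta1_pow,
                    beta2_pow, eps, weight_decay);
  }
}
```

Calling `LambUpdateDirection<4>` and `LambUpdateDirection<1>` on the same inputs should give identical results; only the memory access granularity differs, which is the point of this kind of optimization.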