[Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (#46180)
* remove unnecessary gradient allreduce communication under Sharding-DP
* bugfix
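
One plausible reading of this change, stated as an assumption rather than taken from the diff: under sharding, each rank owns only a slice of the parameters, so only the gradients of the parameters a rank owns need to be allreduced across the data-parallel group, and the rest can be skipped. The sketch below illustrates that filtering idea in plain Python. The helper `grads_needing_dp_allreduce`, the `param_owner` mapping, and the parameter names are all hypothetical and are not Paddle APIs.

```python
# Illustrative sketch only; this is not Paddle's implementation.
def grads_needing_dp_allreduce(param_owner, sharding_rank):
    """Return the parameters whose gradients this rank would still
    allreduce across the data-parallel group.

    param_owner: dict mapping parameter name -> owning sharding rank
    sharding_rank: this process's rank inside its sharding group
    """
    return [name for name, owner in param_owner.items() if owner == sharding_rank]


# Hypothetical partition of four parameters over a sharding group of size 2.
param_owner = {"w0": 0, "b0": 0, "w1": 1, "b1": 1}

kept = grads_needing_dp_allreduce(param_owner, sharding_rank=0)
skipped = [name for name in param_owner if name not in kept]
print("allreduce over DP group:", kept)
print("skipped on this rank:", skipped)
```

In this toy partition, each rank would issue half as many data-parallel allreduce calls as a scheme that allreduces every gradient on every rank.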