support sharding stage1 (#54069)
* support sharding stage1
* fix unittest
* format
* pass sharded params_and_grads to inner_opt apply_optimize
* change sharding gradient allreduce to reduce
* support saving state_dict adaptively and support sharding with mp
* fix sharding test
* test set_state_dict
* add more unit tests
* fix global norm of mp case
* polish
* hack to calculate global norm in order to remove the diff between the global norm values computed in HybridParallelClipGrad and those computed with dp
* remove print
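As a framework-agnostic illustration of the sharded global-norm computation referred to above (this is not Paddle's `HybridParallelClipGrad` code), the sketch below simulates sharded gradients in plain Python. Each hypothetical rank holds one shard, computes the squared L2 norm of that shard, and the per-rank sums are added (the step a real distributed run would perform with an all-reduce) before taking the square root. The helper names `local_sq_norm`, `global_grad_norm`, and `clip_by_global_norm` are hypothetical.

```python
import math
from typing import List, Tuple

Shard = List[float]


def local_sq_norm(shard: Shard) -> float:
    # Squared L2 norm of the gradient values owned by one (simulated) rank.
    return sum(g * g for g in shard)


def global_grad_norm(shards: List[Shard]) -> float:
    # In a real run each rank computes its local squared norm and the values
    # are summed with an all-reduce; here the sum is done in-process.
    total = sum(local_sq_norm(s) for s in shards)
    return math.sqrt(total)


def clip_by_global_norm(shards: List[Shard], max_norm: float) -> Tuple[List[Shard], float]:
    # Scale every shard by the same factor so the global norm is at most max_norm.
    norm = global_grad_norm(shards)
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [[g * scale for g in s] for s in shards], norm


# Usage: the full gradient [3, 4, 12] split across two simulated ranks.
shards = [[3.0], [4.0, 12.0]]
clipped, norm = clip_by_global_norm(shards, max_norm=6.5)
print(norm)  # 13.0, the same as the norm of the unsharded gradient
```

Because the squared norm is additive over disjoint shards, the result matches the unsharded norm; double-counting any gradient (for example, one replicated on several ranks) would break that equality.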