    support sharding stage1 (#54069) · 974676bc
    pangengzheng committed
    * support sharding stage1
    
    * fix unittest
    
    * format
    
    * pass the sharded params_and_grads to inner_opt apply_optimize
    
    * change sharding gradient allreduce to reduce
    
    * support saving state_dict adaptively and support sharding with mp
    
    * fix sharding test
    
    * test set_state_dict
    
    * add more unit tests
    
    * fix global norm of mp case
    
    * polish
    
    * hack the global norm calculation to remove the difference between the global norm values computed in HybridParallelClipGrad and those computed in the dp case
    
    * remove print
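Sharding stage 1 is commonly described as partitioning optimizer state across data-parallel ranks: each rank owns a subset of the parameters, keeps optimizer state only for those, and updates only those. The minimal sketch below simulates this in plain Python (no Paddle) to illustrate two of the bullets above: why a gradient only needs a reduce to its owning rank rather than an allreduce, and how a global gradient norm can be assembled from per-shard partial sums. It is not Paddle's implementation. All function names are hypothetical, the greedy largest-first partitioning and the momentum-SGD optimizer are assumptions, and a real job would use collective communication instead of Python lists.

```python
import math
from typing import Dict, List

Grads = Dict[str, List[float]]  # parameter name -> flattened values (hypothetical layout)


def assign_params_to_ranks(param_sizes: Dict[str, int], world_size: int) -> Dict[str, int]:
    """Assumed greedy largest-first partition: each parameter goes to the least-loaded rank."""
    loads = [0] * world_size
    owner: Dict[str, int] = {}
    for name, size in sorted(param_sizes.items(), key=lambda kv: -kv[1]):
        rank = loads.index(min(loads))
        owner[name] = rank
        loads[rank] += size
    return owner


def reduce_grads_to_owners(per_rank_grads: List[Grads], owner: Dict[str, int]) -> List[Grads]:
    """Simulated reduce (not allreduce): the summed gradient of each parameter
    is delivered only to the rank that owns that parameter."""
    reduced: List[Grads] = [{} for _ in per_rank_grads]
    for name, rank in owner.items():
        # zip(*...) pairs element i of this parameter's gradient from every rank
        reduced[rank][name] = [sum(vals) for vals in zip(*(g[name] for g in per_rank_grads))]
    return reduced


def global_grad_norm(reduced: List[Grads]) -> float:
    """Each rank sums squared gradients only for the parameters it owns; adding
    these partial sums (an allreduce-sum in a real job) counts each parameter once."""
    local_sq = [sum(x * x for grad in shard.values() for x in grad) for shard in reduced]
    return math.sqrt(sum(local_sq))


def sharded_momentum_step(params: Grads, reduced: List[Grads], states: List[Grads],
                          lr: float, momentum: float = 0.9) -> Grads:
    """Each rank keeps momentum buffers (optimizer state) only for the parameters
    it owns and updates only those; merging the results into one dict stands in
    for broadcasting the updated shards back to every rank."""
    updated = dict(params)
    for rank, shard in enumerate(reduced):
        for name, grad in shard.items():
            buf = states[rank].setdefault(name, [0.0] * len(grad))
            for i, g in enumerate(grad):
                buf[i] = momentum * buf[i] + g
            updated[name] = [p - lr * b for p, b in zip(params[name], buf)]
    return updated


# Usage: two simulated data-parallel ranks, three parameters.
params: Grads = {"w1": [0.0] * 4, "w2": [0.0] * 2, "b1": [0.0]}
owner = assign_params_to_ranks({n: len(v) for n, v in params.items()}, world_size=2)
per_rank_grads = [
    {"w1": [1.0] * 4, "w2": [1.0, 0.0], "b1": [2.0]},
    {"w1": [1.0] * 4, "w2": [0.0, 1.0], "b1": [0.0]},
]
reduced = reduce_grads_to_owners(per_rank_grads, owner)
norm = global_grad_norm(reduced)
states: List[Grads] = [{} for _ in range(2)]
new_params = sharded_momentum_step(params, reduced, states, lr=0.5)
print(owner, norm)
```

Because every parameter is owned by exactly one rank, summing the per-rank squared norms reproduces the norm of the full summed gradient (here the square root of 22); counting a replicated parameter on several ranks would inflate that value.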
topology.py 13.7 KB