    Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61
    Committed by sneaxiy
    * add DistributedFusedLamb op
    
    * polish code
    
    * fix compile error
    
    * make compatible with pten changes
    
    * fix rocm compile error
    
    * improve coverage
    
    * update upstream/develop
    
    * fix cast_with_ptr.h
    
    * add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1
    
    * fix clip before allreduce
    
    * add use_master_param_norm
    
    * code polish
    
    * fix bug
    
    * fix ROCM ci
distributed_fused_lamb.py 12.0 KB