    Add mp_all_reduce asynchronous overlap. (#55662) · 6b1dfb5f
    Committed by Ghost Screaming
    * [WIP] Add mp_all_reduce asynchronous overlap.
    
    * Fix some problems.
    
    * Fix dw computation bug and use a temporary workaround to achieve overlap.
    
    * Use fused_linear_param_grad_add to compute dw.
    
    * Reformat ColumnParallel _overlap_linear. Use environment flags to
    control the following behaviors:
    1. export Flags_mp_aysnc_allreduce=True to turn on mp async all_reduce.
    2. export Flags_skip_mp_c_identity=True to skip the two c_identity operators
       in dygraph mode.
    3. export Flags_fused_linear_param_grad_add=True to enable fused_linear_param_grad_add
       in ColumnParallel backward with mp async all_reduce.
    
    * Polish code.
    
    * Remove useless communication API.
    
    * Fix some problems in mp_async_all_reduce and skip_c_identity.
    
    * Add test cases.
    
    * Remove environment variable Flags_fused_linear_param_grad_add in test case.
    
    * Reset error threshold.
    
    * Reset threshold in test case.
    
    * Add useful log. Remove useless test cases.
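
The three environment flags listed above could be read along the lines of the sketch below. This is an illustration only, not the code in mp_ops.py: the helper names `env_flag_enabled` and `read_mp_flags` are hypothetical, and the accepted truthy values ("True", "true", "1") are an assumption. The flag names are copied verbatim from the commit message, including the "aysnc" spelling.

```python
import os

# Flag names exactly as written in the commit message.
MP_FLAGS = (
    "Flags_mp_aysnc_allreduce",
    "Flags_skip_mp_c_identity",
    "Flags_fused_linear_param_grad_add",
)


def env_flag_enabled(name, environ=None):
    """Return True if the flag is set to a truthy value (hypothetical helper).

    Assumption: "True", "true" and "1" count as enabled; anything else,
    including an unset variable, counts as disabled.
    """
    environ = os.environ if environ is None else environ
    return environ.get(name, "False").strip().lower() in ("true", "1")


def read_mp_flags(environ=None):
    """Collect the three flags into a dict of booleans (hypothetical helper)."""
    return {name: env_flag_enabled(name, environ) for name in MP_FLAGS}


if __name__ == "__main__":
    # Show the current state of each flag in this process.
    for flag, enabled in read_mp_flags().items():
        print(f"{flag} = {enabled}")
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in a test case without touching the real process environment.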
mp_ops.py 30.6 KB