    Fix CUDA 12 timeout problems (#54615) · a90d9088
    Committed by Ghost Screaming
    * Fix a bug in the reduce_sum op: when input.numel() > INT32_MAX, the
    result is wrong.
    
    * Remove the climits include.
    
    * Fix pickle and NCCL_P2P_DISABLE problems in distributed test cases
    under CUDA 12.
    
    * Fix timeouts of distributed test cases under CUDA 12.
    
    * Remove useless modification.
    
    * Remove useless modification.
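The commit message does not show the kernel change behind the reduce_sum fix. The Python sketch below only illustrates the general failure mode, assuming the element count was held in a signed 32-bit integer. `wrap_int32` and `elements_visited` are hypothetical helpers, not Paddle APIs. Summing a tensor of ones should return `numel`, so the number of elements a loop visits is exactly the reduced result.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap an integer to signed 32-bit two's complement, as a C int32 would on overflow."""
    return (value + 2**31) % 2**32 - 2**31


def elements_visited(numel: int, use_int64: bool) -> int:
    """Hypothetical model of `for (i = 0; i < n; ++i)` where n has type int32 or int64.

    For an all-ones input, reduce_sum equals the number of elements visited,
    so this value is the reduced result.
    """
    n = numel if use_int64 else wrap_int32(numel)
    # A loop with a negative bound never executes.
    return max(n, 0)


numel = INT32_MAX + 10
print(elements_visited(numel, use_int64=False))  # int32 count wraps negative, sum is wrong
print(elements_visited(numel, use_int64=True))   # int64 count keeps every element
```

Widening the count and index arithmetic to 64 bits is the usual remedy for this class of bug. Whether the commit did exactly that is not stated here.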
hybrid_parallel_pp_bf16.py 5.6 KB