    Fix NCCLBcast hang up bug in Parallel Executor (#11377) · 046bb5c8
    Committed by Qiyang Min
    * 1. Create a buddy allocator in each place before calling ncclBcast on the variables
    2. Check the memory usage of all GPUs rather than only the first one
    
    * 1. Make NCCLGroupGuard guard only the ncclBcast part, so that ncclGroupEnd does not block exception throwing
    2. Note the usage of NCCLGroupGuard
    
    * Remove the GPU memory usage check
    
    * Fix code style
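
The guard-scope change described in the commit message can be illustrated with a minimal sketch. It does not use NCCL or Paddle's actual code: `GroupGuard`, `Prepare`, `BroadcastNarrow`, `BroadcastWide`, and `Run` are hypothetical stand-ins, and the guard's constructor and destructor only log where `ncclGroupStart` and `ncclGroupEnd` would presumably be called. The point it tries to show is that when an RAII guard wraps code that can throw, its destructor runs during stack unwinding, whereas a guard scoped to the broadcast alone is never constructed if the earlier step fails.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for an RAII group guard (not Paddle's NCCLGroupGuard).
// The constructor marks where ncclGroupStart would go, the destructor where
// ncclGroupEnd would go.
class GroupGuard {
 public:
  explicit GroupGuard(Log* log) : log_(log) { log_->push_back("group_start"); }
  ~GroupGuard() { log_->push_back("group_end"); }
  GroupGuard(const GroupGuard&) = delete;
  GroupGuard& operator=(const GroupGuard&) = delete;

 private:
  Log* log_;
};

// Hypothetical preparation step that may throw (for example, an allocation).
void Prepare(bool fail, Log* log) {
  log->push_back("prepare");
  if (fail) throw std::runtime_error("prepare failed");
}

// Narrow scope: the guard wraps only the broadcast step, so an exception
// from Prepare propagates without the guard's destructor ever running.
void BroadcastNarrow(bool fail, Log* log) {
  Prepare(fail, log);
  {
    GroupGuard guard(log);
    log->push_back("bcast");
  }
}

// Wide scope: the guard wraps everything, so its destructor (the
// "group end" call) runs while the exception unwinds the stack.
void BroadcastWide(bool fail, Log* log) {
  GroupGuard guard(log);
  Prepare(fail, log);
  log->push_back("bcast");
}

// Runs one variant, swallows the exception, and returns the recorded steps.
Log Run(void (*fn)(bool, Log*), bool fail) {
  Log log;
  try {
    fn(fail, &log);
  } catch (const std::runtime_error&) {
    // The exception is expected when fail is true; only the log matters here.
  }
  return log;
}
```

In this sketch, `Run(BroadcastWide, true)` records a `group_end` step during unwinding, while `Run(BroadcastNarrow, true)` records only `prepare`. If the real group-end call can block, the narrow scope keeps it off the exception path.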
parallel_executor.cc 8.4 KB