    supports collective training with programs (#18392) · a873fa84
    Committed by Yi Liu
    1. Since the allreduce op has four reduce types, we split it into four separate ops, one per reduce type (see the first sketch after this list)
    2. We also refined the collective op code: for example, we separated the collective op kernel into a CPUKernel and a CUDAKernel, and removed the device-specific DeviceContext template parameter, since each kernel already knows its target DeviceContext (see the second sketch after this list)
    3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis
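
    A minimal C++ sketch of item 1, assuming the four reduce types are the standard sum/prod/max/min; the names ReduceType, CAllReduceKernel, and the aliases below are illustrative stand-ins, not the actual Paddle source:

        // Illustrative sketch: one shared kernel template whose reduce type is
        // fixed at compile time, so each instantiation becomes its own op.
        #include <algorithm>
        #include <cassert>
        #include <vector>

        enum class ReduceType { kSum, kProd, kMax, kMin };

        template <ReduceType R, typename T>
        struct CAllReduceKernel {
          // Stand-in for the real device communication: reduce one local
          // buffer into another with the compile-time-selected operation.
          void Compute(const std::vector<T>& in, std::vector<T>* out) const {
            assert(in.size() == out->size());
            for (size_t i = 0; i < in.size(); ++i) {
              T& dst = (*out)[i];
              switch (R) {
                case ReduceType::kSum:  dst += in[i]; break;
                case ReduceType::kProd: dst *= in[i]; break;
                case ReduceType::kMax:  dst = std::max(dst, in[i]); break;
                case ReduceType::kMin:  dst = std::min(dst, in[i]); break;
              }
            }
          }
        };

        // Four thin kernels, one per reduce type, instead of a single op
        // carrying a "reduce type" attribute.
        using CAllReduceSumKernel  = CAllReduceKernel<ReduceType::kSum, float>;
        using CAllReduceProdKernel = CAllReduceKernel<ReduceType::kProd, float>;
        using CAllReduceMaxKernel  = CAllReduceKernel<ReduceType::kMax, float>;
        using CAllReduceMinKernel  = CAllReduceKernel<ReduceType::kMin, float>;

    With this split, no reduce-type attribute has to be inspected at runtime, which keeps each op simple to register and to reason about during graph analysis.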
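
    A second sketch for item 2, using hypothetical CPUDeviceContext/CUDADeviceContext stand-ins rather than the framework's real classes: once each device has its own kernel class, the DeviceContext template parameter becomes unnecessary because the context type is fixed inside the kernel.

        // Illustrative sketch only.
        // Before: template <typename DeviceContext, typename T> class CollectiveKernel
        // After:  one kernel class per device, context type known up front.
        #include <iostream>

        struct CPUDeviceContext {};   // stand-ins for the framework's contexts
        struct CUDADeviceContext {};

        template <typename T>
        class CollectiveOpCPUKernel {
         public:
          void Compute(const CPUDeviceContext& ctx, const T* data, int n) const {
            // CPU path; a host-side reduction would go here.
            std::cout << "CPU collective on " << n << " elements\n";
          }
        };

        template <typename T>
        class CollectiveOpCUDAKernel {
         public:
          void Compute(const CUDADeviceContext& ctx, const T* data, int n) const {
            // CUDA path; an NCCL call on the context's stream would go here.
            std::cout << "CUDA collective on " << n << " elements\n";
          }
        };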
c_gen_nccl_id_op.cc 5.1 KB