-
由 Yi Liu 提交于
1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops 2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext 3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis
a873fa84