Multi-node training with All-Reduce for CPU
Created by: hshen14
Hello,
I see that multi-node training currently supports the parameter server (PS) and NCCL modes, and that NCCL provides all-reduce on GPU. Is there a plan to support all-reduce on CPU as well? MPI-based or MLSL-based solutions are possible options. Intel provides a mature solution based on MLSL, and we would be glad to discuss this and provide the necessary support in that direction. Thanks.
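To make the request concrete, here is a minimal single-process sketch of what a sum all-reduce over CPU buffers computes, written as a ring all-reduce (a reduce-scatter phase followed by an all-gather phase). It is only an illustration of the semantics: the function name `ring_allreduce` and the list-of-lists representation of worker buffers are hypothetical, and it is not the API of MPI, MLSL, NCCL, or this framework. A real implementation would exchange the chunks over the network instead of copying between Python lists.

```python
def ring_allreduce(buffers):
    """Simulate a sum all-reduce over a ring of workers in one process.

    `buffers` is a list with one equal-length list of numbers per worker
    (hypothetical representation, for illustration only).
    Returns a new list where every worker holds the element-wise sum.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "buffers must match in length"

    # Split each buffer into n contiguous chunks (some may be empty).
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(b) for b in buffers]  # work on copies

    # Phase 1: reduce-scatter. After n - 1 steps, worker r holds the
    # fully reduced chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing messages first so sends are "simultaneous".
        msgs = []
        for r in range(n):
            c = (r - step) % n
            lo, hi = bounds[c]
            msgs.append((c, data[r][lo:hi]))
        for r in range(n):
            c, payload = msgs[r]
            dst = (r + 1) % n
            lo, _ = bounds[c]
            for i, v in enumerate(payload):
                data[dst][lo + i] += v  # accumulate into the neighbour

    # Phase 2: all-gather. Each fully reduced chunk travels around the ring.
    for step in range(n - 1):
        msgs = []
        for r in range(n):
            c = (r + 1 - step) % n
            lo, hi = bounds[c]
            msgs.append((c, data[r][lo:hi]))
        for r in range(n):
            c, payload = msgs[r]
            dst = (r + 1) % n
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload  # overwrite with the reduced chunk

    return data


if __name__ == "__main__":
    # Three simulated CPU workers, each holding a local gradient vector.
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, reduced in enumerate(ring_allreduce(grads)):
        print(rank, reduced)
```

In this scheme each worker sends and receives roughly 2 * (n - 1) / n of the buffer size in total, regardless of the number of workers, which is why ring-style all-reduce is a common choice for gradient synchronization.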
@Yancey1989 @helinwang @Superjomn @panyx0718