Created by: chengduoZH
ncclAllReduce can be replaced by ncclReduce in parallel_do_grad.
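To show the difference between the two collectives, here is a minimal pure-Python simulation of their semantics. It does not call NCCL. The helper names `all_reduce_sum` and `reduce_sum` and the sample gradients are hypothetical. It also assumes that only one root device needs the summed gradient in parallel_do_grad, which the issue implies but does not state.

```python
from typing import List, Optional


def all_reduce_sum(per_rank: List[List[float]]) -> List[List[float]]:
    """Simulate AllReduce(sum): every rank ends up with the element-wise sum."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def reduce_sum(per_rank: List[List[float]], root: int = 0) -> List[Optional[List[float]]]:
    """Simulate Reduce(sum): only the root rank receives the element-wise sum."""
    total = [sum(vals) for vals in zip(*per_rank)]
    # Non-root ranks get no result (None here stands for "buffer not written").
    return [list(total) if rank == root else None for rank in range(len(per_rank))]


# Hypothetical per-device gradients for one parameter (3 devices, 2 elements).
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(all_reduce_sum(grads))
print(reduce_sum(grads, root=0))
```

If the summed gradient really is consumed on a single device, the Reduce form delivers the same value there without writing a result to every other device.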