Forked from PaddlePaddle / Paddle
Implement dygraph.parallel.DataParallel to hook the reduce op.
Add NCCLParallelContext for parallel dygraph.
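
For readers unfamiliar with what a data-parallel wrapper's reduce step does, here is a minimal pure-Python sketch of the idea. It is not the code from this change and uses no Paddle or NCCL calls. The names `all_reduce_sum` and `data_parallel_step` are hypothetical, the "workers" are plain dictionaries in one process, and scaling by the number of ranks before summing is an assumption about how gradients are averaged.

```python
from typing import Dict, List

# A worker's gradients: parameter name -> flat list of values (illustrative only).
Grads = Dict[str, List[float]]


def all_reduce_sum(per_worker: List[Grads]) -> Grads:
    """Element-wise sum of each named gradient across all workers.

    Stands in for the collective reduce that a real backend would run
    across devices; here it is a plain loop in one process.
    """
    total: Grads = {}
    for grads in per_worker:
        for name, values in grads.items():
            if name not in total:
                total[name] = [0.0] * len(values)
            total[name] = [a + b for a, b in zip(total[name], values)]
    return total


def data_parallel_step(per_worker: List[Grads]) -> List[Grads]:
    """Hypothetical reduce hook: scale by 1/nranks, sum, hand every worker the result."""
    nranks = len(per_worker)
    # Scale first so that the summed result is the average gradient.
    scaled = [
        {name: [v / nranks for v in values] for name, values in grads.items()}
        for grads in per_worker
    ]
    reduced = all_reduce_sum(scaled)
    # Every worker ends up with an identical copy of the reduced gradients.
    return [{name: list(values) for name, values in reduced.items()} for _ in range(nranks)]


if __name__ == "__main__":
    workers = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
    print(data_parallel_step(workers))
```

After the step, all simulated workers hold the same averaged gradients, which is what keeps model replicas in sync when each one then applies its own optimizer update.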