How to implement DataParallelEngine
Created by: QiJune
We should support running a Net on multiple GPUs. Users should only need to define a Net and set the GPU ids; parallel execution across the GPUs then happens automatically.
In Caffe2, NCCL and Gloo are used to support multiple GPUs across multiple servers, and the operations of both NCCL and Gloo are represented as `Operator`s.
In Paddle today, we have implemented MultiGradientMachine and pserver. In the new version we might use NCCL to merge gradients across multiple GPUs. Should we also represent NCCL operations as `Operator`s?
If an NCCL operation is an `Operator`, then one Net may correspond to multiple GPUs. Alternatively, if we treat NCCL operations as plain functions, then one Net corresponds to one GPU.
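The two options above can be sketched as follows. This is a hypothetical sketch, not real Paddle or Caffe2 API: `Net`, `Op`, and the operator names (`fc_grad`, `nccl_all_reduce`) are illustrative stand-ins chosen for this example.

```python
class Op:
    """A generic operator in a Net (illustrative, not the real Paddle Op)."""
    def __init__(self, type_, inputs, outputs, device=None):
        self.type = type_          # operator type, e.g. "nccl_all_reduce"
        self.inputs = inputs
        self.outputs = outputs
        self.device = device       # GPU id, or None for a cross-device op

class Net:
    """A plain list of operators (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.ops = []

    def append(self, op):
        self.ops.append(op)

def build_single_net(num_gpus):
    # Option 1: the NCCL all-reduce is itself an Op, so a single Net
    # holds the per-GPU compute ops plus one cross-device merge op.
    net = Net("train")
    for g in range(num_gpus):
        net.append(Op("fc_grad", ["x"], [f"w_grad_gpu{g}"], device=g))
    grads = [f"w_grad_gpu{g}" for g in range(num_gpus)]
    net.append(Op("nccl_all_reduce", grads, grads))  # spans all GPUs
    return net

def build_per_gpu_nets(num_gpus):
    # Option 2: the all-reduce is a plain engine function, so each GPU
    # gets its own Net containing only compute ops.
    nets = []
    for g in range(num_gpus):
        net = Net(f"train_gpu{g}")
        net.append(Op("fc_grad", ["x"], ["w_grad"], device=g))
        nets.append(net)
    return nets

def engine_merge_gradients(nets):
    # Stands in for a direct ncclAllReduce() call made by the engine
    # between net runs, outside of any Net definition.
    return [op.outputs for net in nets for op in net.ops]
```

In option 1 the dependency between compute and communication is visible to the Net's scheduler; in option 2 the engine must sequence the per-GPU Nets and the merge call itself.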