How to implement DataParallelEngine
Created by: QiJune
We should support running a Net on multiple GPUs. Users should only need to define a Net and set the GPU ids; parallel execution across the GPUs then happens automatically.
In Caffe2, NCCL and Gloo are used to support multiple GPUs across multiple servers, and the operations of both NCCL and Gloo are represented as `Operator`s.
In Paddle today, we have implemented MultiGradientMachine and pserver. In the new version we might use NCCL to merge gradients across multiple GPUs. Should we also represent NCCL operations as `Operator`s?
If an NCCL operation is an `Operator`, then one Net may correspond to multiple GPUs. Alternatively, if we treat NCCL operations as plain functions, then one Net corresponds to one GPU.
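The two options above can be sketched as follows. This is a hypothetical sketch, not real Paddle or Caffe2 API: `Net`, `Op`, and the operator names (`fc_grad`, `nccl_all_reduce`) are illustrative stand-ins chosen for this example.

```python
class Op:
    """A generic operator in a Net (illustrative, not the real Paddle Op)."""
    def __init__(self, type_, inputs, outputs, device=None):
        self.type = type_          # operator type, e.g. "nccl_all_reduce"
        self.inputs = inputs
        self.outputs = outputs
        self.device = device       # GPU id, or None for a cross-device op

class Net:
    """A plain list of operators (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.ops = []

    def append(self, op):
        self.ops.append(op)

def build_single_net(num_gpus):
    # Option 1: the NCCL all-reduce is itself an Op, so a single Net
    # holds the per-GPU compute ops plus one cross-device merge op.
    net = Net("train")
    for g in range(num_gpus):
        net.append(Op("fc_grad", ["x"], [f"w_grad_gpu{g}"], device=g))
    grads = [f"w_grad_gpu{g}" for g in range(num_gpus)]
    net.append(Op("nccl_all_reduce", grads, grads))  # spans all GPUs
    return net

def build_per_gpu_nets(num_gpus):
    # Option 2: the all-reduce is a plain engine function, so each GPU
    # gets its own Net containing only compute ops.
    nets = []
    for g in range(num_gpus):
        net = Net(f"train_gpu{g}")
        net.append(Op("fc_grad", ["x"], ["w_grad"], device=g))
        nets.append(net)
    return nets

def engine_merge_gradients(nets):
    # Stands in for a direct ncclAllReduce() call made by the engine
    # between net runs, outside of any Net definition.
    return [op.outputs for net in nets for op in net.ops]
```

In option 1 the dependency between compute and communication is visible to the Net's scheduler; in option 2 the engine must sequence the per-GPU Nets and the merge call itself.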