From 7389ea98eaae7ca79442811991cba37f908e5a74 Mon Sep 17 00:00:00 2001
From: dzhwinter
Date: Mon, 11 Dec 2017 03:24:25 -0800
Subject: [PATCH] "add NCCL multi-GPU design doc"

---
 .../design}/images/multigpu_allreduce.graffle      | Bin
 .../design}/images/multigpu_allreduce.png          | Bin
 .../design}/images/multigpu_before_convert.graffle | Bin
 .../design}/images/multigpu_before_convert.png     | Bin
 .../multigpu.md => doc/design/paddle_nccl.md       |  14 ++++++--------
 5 files changed, 6 insertions(+), 8 deletions(-)
 rename {paddle/framework => doc/design}/images/multigpu_allreduce.graffle (100%)
 rename {paddle/framework => doc/design}/images/multigpu_allreduce.png (100%)
 rename {paddle/framework => doc/design}/images/multigpu_before_convert.graffle (100%)
 rename {paddle/framework => doc/design}/images/multigpu_before_convert.png (100%)
 rename paddle/framework/multigpu.md => doc/design/paddle_nccl.md (83%)

diff --git a/paddle/framework/images/multigpu_allreduce.graffle b/doc/design/images/multigpu_allreduce.graffle
similarity index 100%
rename from paddle/framework/images/multigpu_allreduce.graffle
rename to doc/design/images/multigpu_allreduce.graffle
diff --git a/paddle/framework/images/multigpu_allreduce.png b/doc/design/images/multigpu_allreduce.png
similarity index 100%
rename from paddle/framework/images/multigpu_allreduce.png
rename to doc/design/images/multigpu_allreduce.png
diff --git a/paddle/framework/images/multigpu_before_convert.graffle b/doc/design/images/multigpu_before_convert.graffle
similarity index 100%
rename from paddle/framework/images/multigpu_before_convert.graffle
rename to doc/design/images/multigpu_before_convert.graffle
diff --git a/paddle/framework/images/multigpu_before_convert.png b/doc/design/images/multigpu_before_convert.png
similarity index 100%
rename from paddle/framework/images/multigpu_before_convert.png
rename to doc/design/images/multigpu_before_convert.png
diff --git a/paddle/framework/multigpu.md b/doc/design/paddle_nccl.md
similarity index 83%
rename from paddle/framework/multigpu.md
rename to doc/design/paddle_nccl.md
index 1c843326ee..7c889fdd7f 100644
--- a/paddle/framework/multigpu.md
+++ b/doc/design/paddle_nccl.md
@@ -1,15 +1,17 @@
-# Design Doc: Multi-GPU support in Operation Graph
+# Design Doc: NCCL support in Paddle Fluid
 
 ## Abstract
 
-This Design Doc refers to the multi-GPU feature in paddle. We propose an approach to support multi-GPU both on a single machine and multiple machines. Every device only run sub-graphs which our framework issued. We use `Broadcast`, `Allreduce` operators to join different device sub-graph to the whole graph.
-
+This design doc describes the NCCL feature in Paddle. We propose an approach that supports the NCCL library on both a single machine and on multiple machines. We wrap the NCCL primitives `Broadcast`, `Allreduce`, and `Reduce` as operators so that multiple GPUs can be utilized from a single script.
 
 ## Motivation
 
-Paddle supports training with multiple CPUs and GPUs, refer to different physical devices. We need to support multi-GPU training in parallel for acceleration, in detail, there are two aspects.
+[NCCL](https://developer.nvidia.com/nccl) is an NVIDIA library for multi-GPU communication. With the NCCL library, we can easily accelerate training in parallel.
+
+- The optimize sub-graph can easily be moved to a parameter server, so the multi-GPU feature stays compatible with the distributed training design.
+- The [NCCL2](https://developer.nvidia.com/nccl) library can be plugged in easily.
+- GPU model parallelism becomes easier to implement: we only need to assign each GPU's sub-graph a different part of the whole graph.
 
 - GPU Data Parallelism
 
   Suppose to we have `n`GPUs, every GPU has `1/n`part of training data, and store a complete model in GPU memory.
@@ -58,7 +60,3 @@ As it shown in the picture, when each GPU compute the gradient of `W`, followed
 
 In fact, in the way of every GPU optimized full batch of data, wasted (n-1) GPU compute resources. We will enhance it in the next stage.
 
-### Benefits
-
-- can easily move the optimize sub-graph to parameter server, multi-GPU feature can be compatible with distributed support design.
-- easily plug-in with [NCCL2](https://developer.nvidia.com/nccl) library.
-- GPU Model parallelism becomes easier to implement. we only need to replace different GPU's sub-graph with different part of the whole graph.
-- 
GitLab
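The data-parallel step this patch documents — each GPU computes the gradient of `W` on its `1/n` slice of the batch, then an `Allreduce` gives every GPU the combined gradient — can be sketched as follows. This is an illustrative simulation only, not Paddle's implementation or NCCL's API: `allreduce_sum` and the lists standing in for per-GPU gradient buffers are hypothetical, and on real hardware NCCL's `ncclAllReduce` performs this reduction on-device.

```python
def allreduce_sum(grads_per_gpu):
    """Simulate NCCL Allreduce: elementwise-sum the gradient buffers of all
    simulated GPUs, then hand every GPU its own copy of the reduced result."""
    n = len(grads_per_gpu)
    width = len(grads_per_gpu[0])
    total = [0.0] * width
    for grad in grads_per_gpu:          # reduce phase: elementwise sum
        for i, g in enumerate(grad):
            total[i] += g
    # broadcast phase: every simulated GPU receives the same reduced buffer
    return [list(total) for _ in range(n)]

# Data parallelism: each of the 3 simulated GPUs computed a gradient of `W`
# on its own 1/3 of the batch.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = allreduce_sum(grads)
# Every GPU now holds [9.0, 12.0], so each local optimizer applies the same
# update and the model replicas stay in sync.
```

After the allreduce, each replica runs the identical optimizer update locally; this is the redundancy the doc's "wasted (n-1) GPU compute resources" remark refers to, which moving the optimize sub-graph to a parameter server would avoid.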