From 3b3981565eec50aeb101b31916f568ac1bc94683 Mon Sep 17 00:00:00 2001
From: tangwei12
Date: Tue, 27 Mar 2018 22:07:15 +0800
Subject: [PATCH] mpi enabled design doc

---
 .../design/dist_train/mpi_enabled_design.md | 56 +++++++++++++++++++
 paddle/fluid/operators/detail/mpi_utils.cpp |  4 ++
 paddle/fluid/operators/detail/mpi_utils.h   |  8 +++
 3 files changed, 68 insertions(+)
 create mode 100644 doc/fluid/design/dist_train/mpi_enabled_design.md
 create mode 100644 paddle/fluid/operators/detail/mpi_utils.cpp
 create mode 100644 paddle/fluid/operators/detail/mpi_utils.h

diff --git a/doc/fluid/design/dist_train/mpi_enabled_design.md b/doc/fluid/design/dist_train/mpi_enabled_design.md
new file mode 100644
index 00000000000..19f4298d71c
--- /dev/null
+++ b/doc/fluid/design/dist_train/mpi_enabled_design.md
@@ -0,0 +1,56 @@
+# MPI-enabled PaddlePaddle Design Doc
+## Overview
+We will introduce the Open MPI API to PaddlePaddle, which brings two benefits:
+1. Enable RDMA with PaddlePaddle, which provides high-performance, low-latency networking.
+2. Enable GPUDirect with PaddlePaddle, which provides the highest throughput and lowest latency for GPU reads and writes.
+
+## Global Config
+Launch the script using the `mpirun` launcher, for example: `mpirun -np 3 -hosts node1,node2,node3 python train.py`. By doing this, we can number the actors (trainer/pserver/master) with 0 .. (n-1). The actor's number is the rank of the calling process in the communicator group (an integer), and MPI processes identify each other using this rank ID. We have to create a mapping between PaddlePaddle's actors and their rank IDs so that we can communicate with the correct destinations when using MPI operations (a minimal rank-discovery sketch is included at the end of this doc).
+**We have to store the rank ID and the mapping in global variables.**
+
+## Utils
+We will build MPI send/receive utility classes that provide a unified interface for MPI send and receive operations (a non-blocking send sketch is included at the end of this doc).
+```c++
+// MPI send and receive utils
+#include <map>
+#include <string>
+
+class Mpi_ISend {
+  // Wraps a non-blocking MPI send.
+};
+
+class Mpi_IRecv {
+  // Wraps a non-blocking MPI receive.
+};
+
+class MPIUtils {
+ public:
+  int GetRankID(const std::string& task_id);
+  void InitMPI();
+
+ private:
+  std::map<std::string, int> name_to_id_;
+};
+```
+```c++
+class MPIServer {
+ public:
+  void SetCond();
+  void ShutDown();
+  void WaitClientGet();
+  void reset();
+  void Push();
+  void SetScope();
+  void SetDevCtx();
+  void get();
+};
+```
+
+## New OP
+We won't replace all the gRPC requests with MPI requests: the standard gRPC library is still used for all administrative operations, while the MPI API is used to transfer tensors or SelectedRows to pservers. Based on this idea, we create two new operators to handle send and receive requests: send_mpi_op and listenandserve_mpi_op. They are similar to [send_op](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/send_op.cc) and [listen_and_serv_op](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/listen_and_serv_op.cc).
+
+### send_mpi_op
+Very similar to send_op, but the gRPC transport is replaced with the MPI send service.
+### listenandserve_mpi_op
+Very similar to listen_and_serv_op, but the gRPC transport is replaced with the MPI receive service.
+## Build args
+Because MPI and CUDA require hardware support, we will add some build arguments to control compilation.
+**The specific arguments are still under design.**
+## Execute args
+Launch the script using the `mpirun` launcher, for example: `mpirun -np 3 -hosts node1,node2,node3 python train.py`.
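+
+As a minimal sketch (not the final implementation), after `mpirun` launches the processes, each actor can discover its own rank as shown below; the helper name `GetMPIRank` and the lazy call to `MPI_Init` are illustrative assumptions:
+```c++
+#include <mpi.h>
+
+// Returns this process's rank in MPI_COMM_WORLD, initializing MPI on first use.
+// Error handling is omitted for brevity.
+int GetMPIRank(int* argc, char*** argv) {
+  int initialized = 0;
+  MPI_Initialized(&initialized);
+  if (!initialized) {
+    MPI_Init(argc, argv);
+  }
+  int rank = 0;
+  MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // rank is in [0, n-1]
+  return rank;
+}
+```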
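+
+Likewise, a minimal sketch of the non-blocking send path that `Mpi_ISend` could wrap is shown below; the helper name `ISendBytes`, the raw `MPI_CHAR` buffer, and the tag handling are assumptions, and tensor/SelectedRows serialization is out of scope here:
+```c++
+#include <mpi.h>
+
+#include <vector>
+
+// Sends a serialized byte buffer to dest_rank with a non-blocking MPI_Isend,
+// then waits for completion. Real code could overlap computation before the wait.
+void ISendBytes(const std::vector<char>& buf, int dest_rank, int tag) {
+  MPI_Request request;
+  MPI_Isend(buf.data(), static_cast<int>(buf.size()), MPI_CHAR, dest_rank, tag,
+            MPI_COMM_WORLD, &request);
+  MPI_Wait(&request, MPI_STATUS_IGNORE);
+}
+```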
\ No newline at end of file
diff --git a/paddle/fluid/operators/detail/mpi_utils.cpp b/paddle/fluid/operators/detail/mpi_utils.cpp
new file mode 100644
index 00000000000..adf4a3b9254
--- /dev/null
+++ b/paddle/fluid/operators/detail/mpi_utils.cpp
@@ -0,0 +1,4 @@
+//
+// Created by tangwei12 on 2018/3/27.
+//
+
diff --git a/paddle/fluid/operators/detail/mpi_utils.h b/paddle/fluid/operators/detail/mpi_utils.h
new file mode 100644
index 00000000000..fb2f1412461
--- /dev/null
+++ b/paddle/fluid/operators/detail/mpi_utils.h
@@ -0,0 +1,8 @@
+//
+// Created by tangwei12 on 2018/3/27.
+//
+
+#ifndef PADDLE_MPI_UTILS_H
+#define PADDLE_MPI_UTILS_H
+
+#endif  // PADDLE_MPI_UTILS_H
--
GitLab