Compile time - runtime separation / single node multiple GPU milestones (#6359) · Issue · PaddlePaddle / Paddle

Created by: helinwang

Attributes should use the Attr proto message only (is there a C++ trick that makes the change easier?): the current attribute items can't be serialized, so we can't use ProgramDesc for the executor unless we change this.

$ cd paddle/paddle

# brew install ack, or use ack-grep on linux
$ ack 'Attr<' |sed 's/Attr</|/' |awk -F'|' '{print $2}'|sed 's/>(/|/' |awk -F'|' '{print $1}'|sort|uniq
AttrType
T
bool
float
framework::BlockDescBind *
int
int32_t
size_t
std::string
std::vector<int>
std::vector<std::string>
# not a single OpDesc::Attr
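
To illustrate why the attribute types listed above matter, here is a minimal Python sketch of a serializable attribute. It is not Paddle's Attr proto message: the `Attr` class, the `ATTR_TYPES` tags, the `block_attr` helper, and the use of JSON as a stand-in for protobuf are all assumptions made for illustration. The point it tries to show is that a value such as a `framework::BlockDescBind *` pointer cannot be written out, while an index into the program's blocks can.

```python
import json
from dataclasses import dataclass
from typing import Any

# Hypothetical type tags for illustration; the real Attr proto message
# defines its own set of attribute types.
ATTR_TYPES = {"INT", "FLOAT", "STRING", "BOOLEAN", "INTS", "STRINGS", "BLOCK"}


@dataclass
class Attr:
    """A tagged attribute value that holds only plain data."""
    name: str
    type: str
    value: Any

    def __post_init__(self):
        if self.type not in ATTR_TYPES:
            raise ValueError(f"unsupported attribute type: {self.type}")


def block_attr(name, block_index):
    # A raw block pointer cannot be serialized, so this sketch stores the
    # block's index within its program instead (an assumption).
    return Attr(name, "BLOCK", block_index)


def serialize(attrs):
    """Encode a list of Attr as a JSON string (stand-in for protobuf)."""
    return json.dumps(
        [{"name": a.name, "type": a.type, "value": a.value} for a in attrs]
    )


def deserialize(data):
    """Decode the JSON string produced by serialize()."""
    return [Attr(d["name"], d["type"], d["value"]) for d in json.loads(data)]


attrs = [
    Attr("axis", "INT", 1),
    Attr("scale", "FLOAT", 0.5),
    Attr("shape", "INTS", [2, 3]),
    block_attr("sub_block", 1),
]
restored = deserialize(serialize(attrs))
```

Because every value is plain data, the round trip through `serialize` and `deserialize` yields attributes equal to the originals, which is the property the executor needs from ProgramDesc.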

C++ Executor takes ProgramDesc

Eventually we will change to ExecutionPlan. ProgramDesc is similar to ExecutionPlan (both can be serialized), but ProgramDescBind is too different from ExecutionPlan. If we develop a multi-threaded executor based on ProgramDescBind, there will be too much to change later.
ExecutionPlan design doc (https://github.com/PaddlePaddle/Paddle/pull/6078)
Change C++ Executor to take ExecutionPlan
Multi-threaded executor design doc
C++ multi-threaded Executor
Simple C++ planner

Just place everything on CPU / GPU-0
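
The simple planner is planned in C++, but its placement rule is small enough to sketch in Python. The `Op`, `Plan`, and `simple_plan` names below are hypothetical, and the choice to prefer "GPU-0" when it is available and fall back to the first listed device otherwise is an assumption; the note above only says that everything goes on one device.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    type: str
    device: str = ""  # filled in by the planner


@dataclass
class Plan:
    ops: List[Op] = field(default_factory=list)


def simple_plan(op_types, devices):
    """Place every op on a single device.

    Assumption: use "GPU-0" if it is listed, otherwise the first device.
    """
    if not devices:
        raise ValueError("no device available")
    target = "GPU-0" if "GPU-0" in devices else devices[0]
    return Plan([Op(t, target) for t in op_types])


gpu_plan = simple_plan(["mul", "add", "softmax"], ["CPU", "GPU-0", "GPU-1"])
cpu_plan = simple_plan(["mul", "add"], ["CPU"])
```

A single node multiple GPU planner would replace only the line that picks `target`, which is one reason to keep planning separate from execution.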

Modular Python Executor

current code:

# pseudo code
# implement local Python executor first, remote Python executor later.
def run(program):
  plan = planner.plan(program, local_devices)
  fetch_vars = cpp_executor.run(plan, g_scope)
  return fetch_vars
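
The pseudo code above can be expanded into a runnable sketch by stubbing out the pieces it leaves undefined. Everything here is hypothetical: `Planner`, `FakeCppExecutor`, and `PythonExecutor` are illustration-only names, a program is modelled as a list of Python callables, and the scope is a plain dict. The intent is only to show the modular shape, with the planner and the C++ executor passed in so that a remote variant could be swapped in later.

```python
class Planner:
    """Hypothetical planner: pairs every op with the first local device."""

    def plan(self, program, local_devices):
        device = local_devices[0]
        return [(op, device) for op in program]


class FakeCppExecutor:
    """Stand-in for the C++ executor: runs Python callables in order."""

    def run(self, plan, scope):
        for op, _device in plan:
            op(scope)
        # Assumption: fetched variables are stored under the "fetch" key.
        return scope.get("fetch", [])


class PythonExecutor:
    """Local executor assembled from a planner and a C++ executor."""

    def __init__(self, planner, cpp_executor, local_devices, scope):
        self.planner = planner
        self.cpp_executor = cpp_executor
        self.local_devices = local_devices
        self.scope = scope

    def run(self, program):
        plan = self.planner.plan(program, self.local_devices)
        return self.cpp_executor.run(plan, self.scope)


def set_x(scope):
    scope["x"] = 2


def double_x(scope):
    scope["fetch"] = [scope["x"] * 2]


executor = PythonExecutor(Planner(), FakeCppExecutor(), ["CPU"], {})
result = executor.run([set_x, double_x])
```

Here `result` holds the fetched value computed by the two toy ops; a real executor would hand the serialized plan to C++ instead of calling Python functions.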

single node multiple GPU planner
