Compile time - runtime separation / single node multiple GPU milestones
Created by: helinwang
-
Attributes use the Attr proto message only (is there a C++ trick that makes the change easier?): the current attribute items can't be serialized, so we can't use ProgramDesc for the executor unless we change this.
$ cd paddle/paddle
# brew install ack, or use ack-grep on linux
$ ack 'Attr<' |sed 's/Attr</|/' |awk -F'|' '{print $2}'|sed 's/>(/|/' |awk -F'|' '{print $1}'|sort|uniq
AttrType
T
bool
float
framework::BlockDescBind *
int
int32_t
size_t
std::string
std::vector<int>
std::vector<std::string>
# not a single OpDesc::Attr
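For readers without ack, the same survey can be sketched in Python. This is a rough equivalent of the shell pipeline above, not a tool from the repository: `attr_types` and the `sample` lines are hypothetical, and unlike the pipeline it skips lines that contain `Attr<` but no following `>(`.

```python
import re

# Captures the template argument in calls such as Attr<std::vector<int>>("shape").
# Like the shell pipeline, it takes the text between the first "Attr<" on a
# line and the next ">(" that follows it.
ATTR_RE = re.compile(r"Attr<(.*?)>\(")


def attr_types(lines):
    """Return the sorted, de-duplicated Attr<T> type names found in lines."""
    found = set()
    for line in lines:
        match = ATTR_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


# Hypothetical source lines, for illustration only.
sample = [
    'int axis = ctx.Attr<int>("axis");',
    'auto shape = Attr<std::vector<int>>("shape");',
    'float scale = ctx.Attr<float>("scale");',
    'int groups = ctx.Attr<int>("groups");',
]

print(attr_types(sample))  # ['float', 'int', 'std::vector<int>']
```

To run it over a real checkout, the lines could be read from the C++ sources under paddle/paddle instead of from `sample`.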
-
C++ Executor takes ProgramDesc
Eventually we will change to ExecutionPlan. ProgramDesc is similar to ExecutionPlan (both can be serialized), but ProgramDescBind is too different from ExecutionPlan. If we develop a multiple thread executor based on ProgramDescBind, there will be too much to change later.
-
ExecutionPlan design doc (https://github.com/PaddlePaddle/Paddle/pull/6078)
-
change C++ Executor to take ExecutionPlan
-
Multiple thread executor design doc
-
C++ multiple thread Executor
-
simple C++ planner
Just place everything on CPU / GPU-0
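A minimal sketch of what "place everything on one device" could look like, written in Python for brevity even though the milestone targets C++. The `Op`, `PlacedOp`, and `simple_plan` names and the `"CPU"` / `"GPU:0"` device strings are assumptions made for illustration; they are not taken from the Paddle code base or from the ExecutionPlan design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    # Hypothetical stand-in for an operator description.
    type: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class PlacedOp:
    # An operator together with the device it was assigned to.
    op: Op
    device: str


def simple_plan(ops: List[Op], use_gpu: bool = False) -> List[PlacedOp]:
    """Assign every op to a single device: GPU 0 if requested, else the CPU."""
    device = "GPU:0" if use_gpu else "CPU"
    return [PlacedOp(op, device) for op in ops]


ops = [
    Op("mul", ["x", "w"], ["xw"]),
    Op("add", ["xw", "b"], ["y"]),
]

for placed in simple_plan(ops, use_gpu=True):
    print(placed.op.type, "->", placed.device)
```

Because the planner makes no per-op decision, the executor can be developed and tested against it before any real placement logic exists.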
-
Modular Python Executor
# pseudo code
# implement local Python executor first, remote Python executor later.
def run(program):
    plan = planner.plan(program, local_devices)
    fetch_vars = cpp_executor.run(plan, g_scope)
    return fetch_vars
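The pseudo code can be made runnable with stand-ins for the planner and the C++ executor, which may help when prototyping the Python side before the real components exist. Everything below except the shape of `run` is hypothetical: `StubPlanner`, `StubExecutor`, and the dictionary-based program format are assumptions, and the planner and executor are passed in as arguments rather than read from globals.

```python
class StubPlanner:
    """Hypothetical planner: puts every op on the first local device."""

    def plan(self, program, devices):
        return {
            "steps": [(op, devices[0]) for op in program["ops"]],
            "fetch": program["fetch"],
        }


class StubExecutor:
    """Hypothetical stand-in for the C++ executor."""

    def run(self, plan, scope):
        for (out_name, fn), device in plan["steps"]:
            # A real executor would dispatch to `device`; the stub runs
            # the op inline and stores its result in the scope.
            scope[out_name] = fn(scope)
        return [scope[name] for name in plan["fetch"]]


def run(program, planner, cpp_executor, local_devices, scope):
    # Same two steps as the pseudo code: plan, then execute the plan.
    plan = planner.plan(program, local_devices)
    fetch_vars = cpp_executor.run(plan, scope)
    return fetch_vars


# Toy program: each op is (output variable name, function of the scope).
program = {
    "ops": [("x", lambda s: 2), ("y", lambda s: s["x"] * 3)],
    "fetch": ["y"],
}

result = run(program, StubPlanner(), StubExecutor(), ["CPU"], {})
print(result)  # [6]
```

Keeping the planner and executor behind small interfaces like these is one way to swap in a remote executor later without changing `run`.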
-
single node multiple GPU planner