Multi-GPU, multi-node development milestones.
Created by: helinwang
- Single node, multiple CPU threads
  - Executor supports multiple threads: @helinwang proposes to drive this.
  - Transpiling: convert the user's input `ProgramDesc` to an `ExecutionPlan` that supports CPU-only sync-SGD data parallelism: @Yancey1989. (See the transpiling sketch after this list.)
- Single node, multiple GPUs
  - Transpiling: convert the user's input `ProgramDesc` to an `ExecutionPlan` that supports GPU sync-SGD data parallelism.
- Multiple nodes
  - Operators for feeding data.
  - Transpiling: convert the user's input `ProgramDesc` to an `ExecutionPlan` that runs on multiple nodes.
    - Send / Recv OP.
    - `ExecutionPlan` partition: partition the single `ExecutionPlan` into multiple `ExecutionPlan`s, one per node; a Send / Recv OP pair is added on every edge that crosses nodes. (See the partition sketch after this list.)
  - Fault tolerance: a single node failure stops the training job and causes a job restart.
    - Every executor should save its state automatically and load it upon restart (see the checkpoint sketch after this list).
  - Elastic ML: the number of nodes can change without interrupting training (the training job will not stop).
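
Below is a minimal sketch of the transpiling step referenced above, for sync-SGD data parallelism. `ProgramDesc`, `ExecutionPlan`, `OpDesc`, and the `sum` / `sgd` / `broadcast` op names are hypothetical stand-ins for illustration, not the actual Fluid API; the point is only that each device gets a full replica of forward/backward and the gradients are aggregated into one synchronous update.

```python
# Hypothetical containers for a data-parallel transpiling sketch.
class OpDesc(object):
    def __init__(self, op_type, inputs, outputs, attrs=None):
        self.op_type = op_type
        self.inputs = inputs        # names of variables read
        self.outputs = outputs      # names of variables written
        self.attrs = attrs or {}

class ProgramDesc(object):
    """The user's program: forward + backward ops in execution order."""
    def __init__(self, ops):
        self.ops = ops

class ExecutionPlan(object):
    """Ops annotated with the place (device or node) that runs them."""
    def __init__(self):
        self.ops = []

    def add_op(self, op, place):
        op.attrs["place"] = place
        self.ops.append(op)

def transpile_data_parallel(program, param_names, devices):
    plan = ExecutionPlan()
    # 1. Every device runs a full copy of forward + backward on its own
    #    shard of the mini-batch.
    for dev in devices:
        for op in program.ops:
            plan.add_op(OpDesc(op.op_type,
                               ["%s/%s" % (dev, v) for v in op.inputs],
                               ["%s/%s" % (dev, v) for v in op.outputs],
                               dict(op.attrs)), dev)
    # 2. Aggregate per-device gradients, update once, broadcast back, so
    #    all devices see identical parameters every step (sync-SGD).
    for p in param_names:
        grads = ["%s/%s@GRAD" % (dev, p) for dev in devices]
        plan.add_op(OpDesc("sum", grads, [p + "@GRAD"]), devices[0])
        plan.add_op(OpDesc("sgd", [p, p + "@GRAD"], [p]), devices[0])
        plan.add_op(OpDesc("broadcast", [p],
                           ["%s/%s" % (dev, p) for dev in devices]), devices[0])
    return plan
```

For the CPU milestone the `devices` would be CPU threads; for the GPU milestone they would be GPU ids, with the `sum` / `broadcast` pair typically replaced by an all-reduce.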
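
The partition step can be sketched in the same spirit: assign each op to a node, and whenever an op consumes a variable produced on a different node, add a Send OP to the producer's plan and a Recv OP to the consumer's plan. This reuses the hypothetical `ExecutionPlan` / `OpDesc` stand-ins above; the `placement` callback and the `peer` attribute are assumptions.

```python
def partition_plan(plan, placement, num_nodes):
    """Split one ExecutionPlan into num_nodes per-node ExecutionPlans,
    inserting Send / Recv OPs on edges that cross node boundaries.
    `placement(op)` returns the node index an op is assigned to."""
    parts = [ExecutionPlan() for _ in range(num_nodes)]
    producer = {}                  # variable name -> node that wrote it

    for op in plan.ops:
        node = placement(op)
        for var in op.inputs:
            src = producer.get(var, node)
            if src != node:
                # The edge crosses nodes: producer sends, consumer receives.
                # (A real implementation would de-duplicate repeated transfers.)
                parts[src].add_op(OpDesc("send", [var], [], {"peer": node}), src)
                parts[node].add_op(OpDesc("recv", [], [var], {"peer": src}), node)
        parts[node].add_op(op, node)
        for var in op.outputs:
            producer[var] = node

    return parts                   # one ExecutionPlan per node
```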
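
For the fault-tolerance item, "every executor should save state automatically" could be realized with periodic, atomic checkpoints on shared storage that are reloaded when the restarted job comes back. The checkpoint directory, file layout, and pickle-based serialization below are assumptions for illustration only, not the actual checkpoint design.

```python
import os
import pickle

CKPT_DIR = "/shared/checkpoints"   # assumed shared filesystem visible to all nodes

def save_state(trainer_id, step, params):
    """Checkpoint the executor state (here: a dict of parameter tensors).
    Write to a temp file and rename so a crash mid-write never corrupts
    the last good checkpoint (rename is atomic on POSIX filesystems)."""
    os.makedirs(CKPT_DIR, exist_ok=True)
    tmp = os.path.join(CKPT_DIR, "trainer_%d.tmp" % trainer_id)
    final = os.path.join(CKPT_DIR, "trainer_%d.ckpt" % trainer_id)
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "params": params}, f)
    os.rename(tmp, final)

def load_state(trainer_id):
    """Restore (step, params) after a job restart; start fresh if this
    trainer has no checkpoint yet."""
    final = os.path.join(CKPT_DIR, "trainer_%d.ckpt" % trainer_id)
    if not os.path.exists(final):
        return 0, {}
    with open(final, "rb") as f:
        state = pickle.load(f)
    return state["step"], state["params"]
```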
Please comment if you have questions or suggestions, thanks!