Created by: tonyyang-svail
A simple implementation of parallel_do that supports multiple GPUs.
```
...
|
ParallelDo
| Split input
| Copy parameters to multiple GPUs
- Wait
|||| Forward on multiple GPUs
- Wait
| Merge output
- Wait
|
...
|
ParallelGradDo
| Split output@grad
- Wait
|||| Backward on multiple GPUs
- Wait
| AllReduce parameter gradients
- Wait
|
...
```
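The data flow in the diagram can be illustrated with a small, framework-free sketch. This is not the code in this PR: the helper names `split`, `parallel_do` and `parallel_grad_do` are hypothetical, the "GPUs" are simulated as list slots, the model is assumed to be a single scalar parameter with `y = w * x`, and the AllReduce is assumed to be a sum. Because everything runs sequentially, each "Wait" barrier is implicit.

```python
# Framework-free sketch of the ParallelDo / ParallelGradDo flow.
# Assumptions (hypothetical, not PaddlePaddle APIs): devices are simulated
# as list slots, the model is y = w * x, and AllReduce is a sum.
from typing import List, Tuple


def split(batch: List[float], num_devices: int) -> List[List[float]]:
    """Split a batch into num_devices contiguous, nearly equal chunks."""
    size, rem = divmod(len(batch), num_devices)
    chunks, start = [], 0
    for d in range(num_devices):
        end = start + size + (1 if d < rem else 0)
        chunks.append(batch[start:end])
        start = end
    return chunks


def parallel_do(x: List[float], w: float,
                num_devices: int) -> Tuple[List[float], List[List[float]]]:
    """Forward: split input, copy the parameter, run per device, merge output."""
    x_parts = split(x, num_devices)                # Split input
    w_copies = [w] * num_devices                   # Copy parameter to each device
    y_parts = [[w_copies[d] * xi for xi in x_parts[d]]   # Forward: y = w * x
               for d in range(num_devices)]
    y = [yi for part in y_parts for yi in part]    # Merge output
    return y, x_parts


def parallel_grad_do(y_grad: List[float],
                     x_parts: List[List[float]]) -> List[float]:
    """Backward: split output@grad, compute dw per device, all-reduce dw."""
    g_parts, start = [], 0
    for part in x_parts:                           # Split output@grad like the input
        g_parts.append(y_grad[start:start + len(part)])
        start += len(part)
    local_dw = [sum(xi * gi for xi, gi in zip(xp, gp))   # dL/dw on each device
                for xp, gp in zip(x_parts, g_parts)]
    total = sum(local_dw)                          # AllReduce (sum) of the gradients
    return [total] * len(local_dw)                 # every device holds the same dw


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y, parts = parallel_do(x, w=2.0, num_devices=2)
    print(y)        # [2.0, 4.0, 6.0, 8.0, 10.0]
    # With loss L = sum(y), output@grad is all ones, so dL/dw = sum(x) = 15.
    print(parallel_grad_do([1.0] * len(y), parts))   # [15.0, 15.0]
```

The point of the AllReduce step is that the summed per-device gradients equal the gradient a single device would compute on the whole batch, so every replica can apply the same parameter update.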
TODO:
- Verify correctness
- Initialize the device context: https://github.com/PaddlePaddle/Paddle/pull/7345
- getPlaceOp https://github.com/PaddlePaddle/Paddle/pull/6732